Using SPSS

1. What (who?) is SPSS?

SPSS (Statistical Package for the Social Sciences) is a powerful set of computer programs for calculating statistics (and thus for saving you innumerable hours of calculating with pencil and paper). The program is available on many personal computers (PCs) around campus, including those in 321 Snedecor Hall. (One should soon be able to find other locations at the following web site: http://www.it.iastate.edu/labsdb/) In these few pages you will learn only a few commands for running SPSS. You will learn more commands as you are given SPSS programs to run during the course of the semester.

2. Entering data into SPSS

Like any program on a PC, SPSS is started by double-clicking on the SPSS icon on the computer's desktop. This opens the "Data Editor" in SPSS. Here you can enter data manually (not advised), import data, or type in (and run) a batch program that contains your data. Below are instructions on how to enter data into SPSS using the latter two methods.

a. Importing data from an SPSS portable (with *.por extension) or SPSS systems (with *.sav extension) file.

During the semester you will be provided with SPSS portable files that contain data for you to analyze as part of your lab problems. You can import these files into SPSS by (1) selecting "Read text data" from the "File" menu, (2) changing "Files of type" to "SPSS Portable (*.por)," (3) changing "Look in" to the folder into which you have copied the file, and (4) double-clicking on the file name. (Directions are the same for a systems file, except that "Files of type" should be changed to "SPSS (*.sav).")

b. Using SPSS in "Batch Mode"

Starting with the first lab, you will be using SPSS "batch programs" in doing your lab assignments. To complete the assignments you will need to run these programs and print the output (which appears in an SPSS Viewer window).
WARNING: SPSS will print only the parts of your output that you have highlighted in the left tree-pane of the SPSS Viewer window. Pressing the button with the printer on it will not do anything if nothing has been highlighted.

Batch programs must be typed using the SPSS Syntax Editor. Open the editor by selecting "File" then "New" then "Syntax." Next, type in your batch program exactly as it appears in the lab. Then select "Edit," "Select All," and push the "Run Current" button (i.e., the button with the right-pointing black arrowhead on it). This will execute your batch program and send the output to the SPSS Viewer window (where you may also find error messages if you mistyped parts of your program).

WARNING: It is strongly recommended that you use a fixed (i.e., NOT proportional) font when typing your batch programs. Any Courier font should do the trick. Fixed fonts align your numbers in straight columns, so that it is easier to tell if you have left too many spaces between them.

3. Why do batch programs?

Maybe you have used SPSS on a PC before, or maybe you have had a chance to play around with the program a bit. In either case, you have probably discovered that once you have imported data into SPSS, you can analyze these data entirely with mouse movements and clicks. What could be easier? Batch programs just slow you down. Who needs them, right?

The answer lies in the fact that you can save (and rerun, if necessary) your batch programs, whereas it is not as easy to recreate the mouse activities that generated your output. Just imagine in a few years when you proudly take your results to your major professor and he exclaims, "These results are hard to believe! How did you get these numbers?" Now imagine going back to your computer and discovering that no matter how much you and your mouse try, you cannot recreate the numbers on your output.
On the other hand, if the numbers appeared in Table 23 and you search your PC for a file called "table23.sps," just a quick run of this batch program will (assuming that no one has manipulated your data or program file) generate an exact replica of the output that your major professor seeks. In a sentence, batch programs are simply part of a good record keeping strategy. In reviewing the literature, competent researchers always keep careful records of their sources, right? Well, when performing a statistical analysis, only the incompetent fail to keep just as meticulous records of their programs. Enough said.

4. Writing a program

a. Computer programs must be written before they can be run. The first line in an SPSS program always indicates what data are to be analyzed. In the course, you will either access an existing data set (a.k.a., a systems file) or you can enter data by hand.

1) Our class will analyze data from a survey of "Wilson Scholars" and a national survey of U.S. adults. On the Assignments page of our class website, a setup file can be accessed via the page’s ‘Recall’ link. Running this program from an SPSS syntax window will create a systems file (named, recall.sav) in the ‘temp’ directory of the ‘C:’ drive on your PC. To access the data in this file, your program must begin with the following line:

get file='c:recall.sav'.

Later in the course we shall be using another systems file (named, gss96.sav) that you will access using the following as the first line in your programs:

get file='c:gss96.sav'.

2) If you enter data into your SPSS program by hand, you can do so by placing a "data list" statement rather than a "get file" statement in its first line. What follows is an illustration of an SPSS program in which data have been included:

data list records=1 / income 1-2 degree 3-4 wt 5-6.
weight by wt.
begin data.
 1 1 8
 2 1 1
 3 1 1
 1 2 2
 2 2 7
 3 2 3
 2 3 2
 3 3 6
end data.
regression vars=income,degree/dep=income/enter.
compute ysq = income**2.
compute xsq = degree**2.
compute xy = degree * income.
frequencies vars=income ysq degree xsq xy / statistics=mean.
compute ssx = (degree - 1.933)**2.
compute yhat = .593284 + (.727612 * degree).
compute ssregres = (yhat - 2.0)**2.
compute sserror = (income - yhat)**2.
compute sstotal = (income - 2.0)**2.
frequencies vars=ssx yhat ssregres sserror sstotal / statistics=mean.

In writing batch programs, be sure that each command or statement ends with a period (lines of data do not). Also notice how the "data list" statement indicates that there is one line of data (records=1) for each unit of analysis, that data on the variable ‘income’ appear right-justified in the first two columns, that data on the variable ‘degree’ appear right-justified in columns 3 and 4, and that data on the variable ‘wt’ appear right-justified in columns 5 and 6.

When entered by hand, the data in an SPSS program are placed between a "begin data" and an "end data" statement, and may be preceded or followed by data transformation statements (e.g., compute, recode, if, etc.) and followed by commands (e.g., plot, frequencies, etc.). Note: "Commands" generate output, whereas "statements" merely create or modify variables.

The "weight" statement indicates how many respondents had a specific pair of scores on the income and degree variables. This is a compact way of listing data (usually from tables) in which many respondents have identical combinations of scores. Most commonly, each line of data in one's program corresponds to a single respondent. In such cases each line has a weight of one (1), and the weight statement is not required.

b. You will be given programs such as this when they are required in your lab assignments. The following common SPSS commands and statements are included to help you understand the various parts of these programs:

1) Occasionally you may wish to get a box plot of your data.
To obtain two box plots, one with ‘recall’ on the vertical axis and ‘birthyr’ on the horizontal axis and another with ‘recall’ on the vertical axis and ‘eventyr’ on the horizontal axis, you would use the following batch program:

get file='c:recall.sav'.
examine vars=recall by birthyr,eventyr/plot=boxplot/
  statistics=none/nototal.

2) The "compute" statement is used to create a new variable as a mathematical function of other variables. For example, imagine that you found a constant (or intercept) of .593284 and a slope of .727612 in the regression of ‘income’ on ‘degree’. You could compute a new variable, ‘incomhat’, that gives the estimated values of ‘income’ (according to this regression) for each possible combination of degree and income as defined in the previous program (i.e., the one with the data list statement). This is done as follows:

compute incomhat = .593284 + (.727612 * degree).

3) The "recode" statement is used to change values on a variable. For example, you might find that you have too few cases per year-of-birth to do a box plot with a separate box for each birth year. Consequently, you might collapse data on a corresponding variable (let’s call it ‘birthyr’) such that each of the variable’s collapsed values has at least 40 cases:

recode birthyr(28,32=30)(34,35=34.5)(36,37=36.5)(38,39=38.5)
  (40,41=40.5)(42,43=42.5)(44,45=44.5)(46,47=46.5)(48,49=48.5).

Recode statements can also be used to assign units to a variable. For example, consider a variable, ‘rincome,’ that takes a value of 1 for incomes less than $1000, 2 for incomes between $1000 and $2999, 3 for $3000 to $3999, 4 for $4000 to $4999, 5 for $5000 to $5999, 6 for $6000 to $6999, 7 for $7000 to $7999, 8 for $8000 to $9999, 9 for $10000 to $14999, 10 for $15000 to $19999, 11 for $20000 to $24999, 12 for $25000 or more, and 13 for refused to respond.
These values could be recoded (approximately) into dollar units with the following recode statement:

recode rincome (1=500)(2=2000)(3=3500)(4=4500)(5=5500)(6=6500)
  (7=7500)(8=9000)(9=12500)(10=17500)(11=22500)(12=35000)(13=99).

Note that values are changed to the midpoints (in dollars) of their corresponding intervals. The choice of $35,000 as the midpoint of the highest income category is, admittedly, somewhat arbitrary. Also, since 99 is the missing data code for ‘rincome’, the last parentheses in the recode set the incomes of refusers to missing (rather than leaving the code of 13 unchanged and in so doing treating their incomes as $13.00).

4) The "if" statement can be used to combine information from different variables. This is particularly useful when one has contingency items. For example, consider the two contingency items ‘hit’ ("Have you ever been punched or beaten by another person?") and ‘hitage’ ("Did you experience this beating (or these beatings) as a child, as an adult, or in both childhood and adulthood?"). Imagine that a score of 1 on ‘hit’ means "yes" and a score of 2 means "no" and that a score of 1 on ‘hitage’ means "as a child," of 2 means "as an adult," and of 3 means "both." If you wanted a variable that measured whether respondents were beaten as children, you might wish to change ‘hit’ such that a score of 1 would mean "hit as a child" and 2 would mean "not hit as a child." This can be done by changing scores on ‘hit’ from 1 to 2 among respondents who were beaten as an adult but not as a child, using the following "if" statement:

if ( hitage eq 2 ) hit = 2.

Note: Relational operators other than "eq" (equals) are "ne" (does not equal), "lt" (less than), "gt" (greater than), "le" (less than or equal to), and "ge" (greater than or equal to). These relations can also be combined with "and" and "or". Consider the following illustration:

if ( ( ( var1 ne 0 ) and ( var2 eq 1) ) or ( var3 ge 20 ) ) var4 = 0.
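As a minimal sketch of how the "if" statement behaves, the short batch program below applies the ‘hit’ and ‘hitage’ example to four made-up respondents. The data values, the use of 0 as a placeholder score on ‘hitage’ for the respondent who was never hit, and the new variable name ‘hitchild’ are all assumptions introduced only for illustration. The sketch also copies ‘hit’ into ‘hitchild’ before changing anything, so that the original answers remain available.

```spss
* Four hypothetical respondents (made-up scores, for illustration only).
data list free / hit hitage.
begin data.
1 1
1 2
1 3
2 0
end data.
* Copy hit into a new variable so the original answers are kept.
compute hitchild = hit.
* Respondents beaten only as adults are scored as not hit as a child.
if ( hitage eq 2 ) hitchild = 2.
list variables = hit hitage hitchild.
```

If the program runs as intended, the listing should show ‘hitchild’ equal to 1 for the first and third respondents (beaten as a child, or both) and equal to 2 for the second and fourth (beaten only as an adult, or never).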
5) The "select if" statement allows one to restrict an analysis to part of one's data set. Thus, the following statement would restrict an analysis to women only:

select if ( sex eq 2 ).

6) All statements (e.g., with "compute", "recode", "if", "select if", etc.) that follow a "temporary" statement apply only to the next command. For example, if you wished to find the mean and variance on "age" separately for males and females, this would be done as follows:

temporary.
select if ( sex eq 1 ).
frequencies general = age / statistics = mean,variance.
temporary.
select if ( sex eq 2 ).
frequencies general = age / statistics = mean,variance.

5. And in closing . . .

You are strongly encouraged to use SPSS (or R or SAS) to do your homework instead of doing your homework problems via hand calculations. If you do this, however, you must hand in your entire computer output in lieu of your hand calculations.