SPSS Syntax Tutorial
Social Science Research Lab
American University, Washington, D.C.
Web: www.american.edu/provost/ctrl/pclabs.cfm
Tel: x3862
Email: SSRL@American.edu

Course Objectives
The goal of this course is to give students who are already familiar with SPSS a basic understanding of SPSS syntax.

Course Outline
Opening datasets that are in Excel, SPSS, and other formats
Running descriptive statistics, frequencies, crosstabs, and correlations
Transforming variables using recode and compute
Splitting files
Selecting cases
Hypothesis testing: T-Test and Regression
Merging datasets

The tasks covered in this tutorial will be familiar to you if you’ve completed our other SPSS tutorials. This time, however, we will discuss how to perform them in the SPSS syntax editor. After each task, we give the corresponding syntax code.

The syntax editor can be accessed by going to File > New > Syntax.

Tip: If you’d like to add your work to a syntax file, use the Paste button in any dialog box.

SPSS’ Syntax Window
Commands can be run directly from the syntax window by clicking on the blue “play button” in the toolbar.

Opening a dataset
If you have an SPSS dataset (*.sav), you can open it in the following way:
Select File > Open > Data. A dialog box pops up.
Browse for your dataset and open it. Ex) “World 95.sav” (Let’s use this data for the rest of the tutorial.)

Syntax programming (the path below is an example; point it at your own file)
GET FILE='C:\Program Files\SPSS\Employee data.sav'.
DATASET NAME DataSet2 WINDOW=FRONT.

If you have a dataset that is not in SPSS data format, it is still easy to open it in SPSS.
Go to File > Open > Data. A dialog box pops up.
In the line that specifies “Files of Type,” change the file type from SPSS to “all files.”
Browse for your dataset and open it. Ex) “demo.xls”

Syntax programming
GET DATA
  /TYPE=XLS
  /FILE='C:\Program Files\SPSSInc\SPSS16\Samples\demo.xls'
  /SHEET=name 'demo'
  /CELLRANGE=full
  /READNAMES=on
  /ASSUMEDSTRWIDTH=32767.
DATASET NAME DataSet2 WINDOW=FRONT.
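Because the rest of the tutorial works with “World 95.sav,” it can help to open that file from syntax as well before moving on. The following is only a minimal sketch: the folder C:\data and the dataset name World95 are assumptions made for illustration, so substitute the location of your own copy.

```spss
* Hypothetical path: replace C:\data with the folder that holds your copy.
GET FILE='C:\data\World 95.sav'.
* World95 is an illustrative dataset name, not one defined by the tutorial.
DATASET NAME World95 WINDOW=FRONT.
* Make World95 the active dataset in case several datasets are open.
DATASET ACTIVATE World95.
```

Naming the dataset makes it easy to switch back to it later with DATASET ACTIVATE if you also have other files, such as demo.xls, open in the same session.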
Running descriptive statistics and frequencies

Descriptive statistics
Go to Analyze > Descriptive Statistics > Descriptives.
Select the variables for which you want the descriptives. Ex) “aids cases” and “fertility”.
Click on the Options button and check any boxes you want to apply, such as “variance,” “skewness,” or “Descending means.”
Click OK. The results will be displayed in the Output Editor.

Syntax programming
DESCRIPTIVES VARIABLES=aids fertility
  /STATISTICS=MEAN STDDEV MIN MAX.

Frequencies
Go to Analyze > Descriptive Statistics > Frequencies.
Select the variables for which you want the frequencies. Ex) “predominant climate” and “people who read”.
Click on the Statistics button, check any boxes you want to apply, such as “mode” or “mean,” and then click on the Continue button.
Click on the Charts button, check the graph type you want, such as “histogram” (let’s also add a normal distribution curve by checking the box), and then click on the Continue button.
Click on the Format button, check any options you want to apply, such as “descending values,” and then click on the Continue button.
Click OK. The results will be displayed in the Output Editor.

Syntax programming
FREQUENCIES VARIABLES=climate literacy
  /STATISTICS=MEAN MODE
  /HISTOGRAM NORMAL
  /FORMAT=DVALUE
  /ORDER=ANALYSIS.

Crosstabulation
Go to Analyze > Descriptive Statistics > Crosstabs.
Select the variables for which you want the crosstabs. Ex) “predominant climate” in the column and “predominant religion” in the row.
Click on the Statistics button, check options such as “Chi-square,” and click Continue.
Click on the Cells button, check options such as “expected” and “percentage > total,” and click Continue.
Click OK. The results will be displayed in the Output Editor.

Syntax programming
CROSSTABS
  /TABLES=religion BY climate
  /FORMAT=AVALUE TABLES
  /STATISTICS=CHISQ
  /CELLS=COUNT EXPECTED TOTAL
  /COUNT ROUND CELL.

Correlation
Go to Analyze > Correlate.
Select Bivariate if you are interested only in the relationship between two variables, or Partial if you want to measure the association between two variables while factoring out the effect of other variables.
Select the variables for which you want the correlation. Ex) “Gross domestic product” and “people who read”.
Click OK. The results will be displayed in the Output Editor.

Syntax programming
CORRELATIONS
  /VARIABLES=gdp_cap literacy
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.

Transforming variables

Recoding Data
The Recode option allows you to change the coding of variables and to create discrete categories from continuous variables.
Go to Transform > Recode into Different Variables.
Select “GDP per capita” for the “Numeric variable” box.
Click “Old and New Values.”
Select the “Range: LOWEST through” box and enter “10000”. In the new value box, enter 1. Click Add.
Select the “Range: value through HIGHEST” box and enter “10001”. In the new value box, enter 0. Click Add.
Click Continue.
In “Output variable,” give the name as “lowincome” and the label as “country has per capita income less than 10,000.” Click Change.
Click OK.

Syntax programming
RECODE gdp_cap (Lowest thru 10000=1) (10001 thru Highest=0) INTO lowincome.
VARIABLE LABELS lowincome 'country has per capita income less than 10,000'.
EXECUTE.

Transforming Data
The Compute option allows you to arithmetically combine or alter variables and place the resulting value under a new variable name.
Go to Transform > Compute Variable.
Give a new variable name, for example “lnGDP,” under Target Variable.
In the “Function group” box, select “Arithmetic” by double-clicking on it.
In the “Functions and Special Variables” box, select “Ln” by double-clicking on it.
Highlight “Gross domestic product” and put it in the expression box by clicking the arrow.
Click OK.

Syntax programming
COMPUTE lnGDP=LN(gdp_cap).
EXECUTE.

Splitting files
The Split File option allows you to compare subgroups or organize output by subgroups.
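These two uses correspond to two keywords of the SPLIT FILE command. The lines below are a minimal sketch, assuming the lowincome variable from the Recode step exists in the active dataset. Only the most recent SPLIT FILE command is in effect, so in practice you would run one of the two variants, not both.

```spss
* The file must be sorted by the grouping variable before it is split.
SORT CASES BY lowincome.

* Compare groups: results for all groups are shown together in one table.
SPLIT FILE LAYERED BY lowincome.

* Organize output by groups: each group gets its own set of tables.
SPLIT FILE SEPARATE BY lowincome.

* Return to analyzing all cases together.
SPLIT FILE OFF.
```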
Let’s organize output by subgroups.
Go to Data > Split File.
Select the button by “Organize output by groups”.
Highlight the variable you want, for example “country has per capita income less than 10000” or any categorical variable.
Click OK.
Then do some analysis (for example, run a correlation between literacy and aids rate). The output will show the results separately for each group.

Syntax programming
SORT CASES BY lowincome.
SPLIT FILE SEPARATE BY lowincome.
CORRELATIONS
  /VARIABLES=literacy aids
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.

To turn this off, go to Data > Split File and select “Analyze all cases, do not create groups.” Then click OK.

Syntax programming
SPLIT FILE OFF.
USE ALL.

Selecting cases
The Select Cases option allows you to analyze only the subgroup you are interested in.
Go to Data > Select Cases.
Check “If condition is satisfied”. Click on the If button.
Highlight the variable you want, for example “country has per capita income less than 10000” or any categorical variable, and put it in the expression box by clicking the arrow.
Then specify a subgroup by writing the expression you want. Ex) lowincome=1
Then do some analysis (for example, run a correlation between literacy and aids rate). The output will show only the results for the selected group.

To turn off the selection:
Go to Data > Select Cases.
Check “All cases”.
Click OK.

Syntax programming
COMPUTE filter_$=(lowincome=1).
VARIABLE LABEL filter_$ 'lowincome=1 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.
CORRELATIONS
  /VARIABLES=literacy aids
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.
FILTER OFF.
USE ALL.
EXECUTE.
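When the subgroup is needed for a single analysis only, an alternative to building a filter variable is TEMPORARY followed by SELECT IF. This is not what the Select Cases dialog pastes; it is a minimal sketch of a different technique, assuming the lowincome variable from the Recode step exists.

```spss
* TEMPORARY limits the selection below to the next procedure only.
TEMPORARY.
SELECT IF (lowincome = 1).
CORRELATIONS
  /VARIABLES=literacy aids
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.
* Once CORRELATIONS has run, all cases are available again automatically.
```

Be careful not to drop the TEMPORARY line: SELECT IF on its own permanently removes the unselected cases from the active dataset, whereas FILTER only excludes them from analysis.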
Hypothesis testing with T-Tests
There are three types of t-tests:
One-Sample T Test, to compare a single sample with a population value.
Independent-Samples T Test, to compare two groups’ scores on the same variable.
Paired-Samples T Test, to compare the means of two variables within a single group.

To compare the means of two variables within a single group:
Go to Analyze > Compare Means > Paired-Samples T Test.
Pick “average female life expectancy” and “average male life expectancy” for variables 1 and 2. Put the variables in the box by clicking the arrow.
Click OK.

Syntax programming
T-TEST PAIRS=lifeexpf WITH lifeexpm (PAIRED)
  /CRITERIA=CI(.9500)
  /MISSING=ANALYSIS.

Regression Analysis
To run a linear regression, go to Analyze > Regression > Linear.
Select your dependent variable (females who read) and your independent variables (average female life expectancy, infant mortality, birth rate, aids rate, Gross domestic product) from the list on the left.
Click OK.

Syntax programming
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT lit_fema
  /METHOD=ENTER lifeexpf babymort birth_rt aids_rt gdp_cap.

Match merging data files
You can merge data from two files in two different ways. You can:
Merge the active dataset with another open dataset or SPSS Statistics data file containing the same variables but different cases.
Merge the active dataset with another open dataset or SPSS Statistics data file containing the same cases but different variables.

To merge files, go to Data > Merge Files and select Add Cases or Add Variables.
Add Cases merges the active dataset with a second dataset or external SPSS Statistics data file that contains the same variables (columns) but different cases (rows).
Add Variables merges the active dataset with another open dataset or external SPSS Statistics data file that contains the same cases (rows) but different variables (columns).
For example, you might want to merge a data file that contains pre-test results with one that contains post-test results.

Cases must be sorted in the same order in both datasets. To sort cases, go to Data > Sort Cases and select one or more sorting variables. The data file is sorted based on the values of the sorting variables. If you select multiple sorting variables, cases are sorted by the values of each variable within categories of the prior variable on the Sort list.

If one or more key variables are used to match cases, the two datasets must be sorted in ascending order of the key variable(s).

Variable names in the second data file that duplicate variable names in the active dataset are excluded by default, because Add Variables assumes that these variables contain duplicate information.

Syntax programming
Using command syntax entails first sorting and saving the files to be merged, and then matching them. Below are the commands for merging:

GET FILE="dataset_A1.sav".
SORT CASES BY varname.
SAVE OUTFILE="dataset_A2.sav".
GET FILE="dataset_B1.sav".
SORT CASES BY varname.
SAVE OUTFILE="dataset_B2.sav".
MATCH FILES
  /FILE="dataset_A2.sav"
  /FILE="dataset_B2.sav"
  /BY varname.
LIST.

Using command syntax, you can merge up to 50 datasets and/or data files.
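The commands above perform an Add Variables merge. For Add Cases, the corresponding command is ADD FILES. The lines below are a minimal sketch under stated assumptions: survey_part1.sav and survey_part2.sav are hypothetical file names standing in for two files that hold the same variables but different cases.

```spss
* Hypothetical file names: same variables in both files, different cases.
ADD FILES
  /FILE='survey_part1.sav'
  /FILE='survey_part2.sav'.
EXECUTE.
```

Unlike MATCH FILES with a BY variable, this form does not require the files to be sorted first, because the cases from the second file are simply appended after those from the first.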