SPSS Syntax Tutorial - American University

SPSS Syntax Tutorial
Social Science Research Lab
American University, Washington, D.C.
Web. www.american.edu/provost/ctrl/pclabs.cfm
Tel. x3862 Email. [email protected]
Course Objectives
The goal of this course is to give students who are already familiar with SPSS a
basic understanding of SPSS syntax.
Course Outline
Opening datasets that are in Excel, SPSS, and other formats
Running descriptive statistics, frequencies, crosstabs, and correlations
Transforming variables using recode and compute
Splitting Files
Selecting Cases
Hypothesis testing: T-Test and Regression
Merging datasets
The tasks covered in this tutorial will be familiar to you if you’ve completed our
other SPSS tutorials. This time, however, we will discuss how to perform them in
the SPSS syntax editor. After each task, we show the corresponding syntax.
The Syntax Editor can be opened by going to File > New > Syntax.
Tip: To add your work to a syntax file, click the Paste button in a dialog box
instead of OK.
SPSS’ Syntax Window
Operations can be triggered directly from the syntax window by clicking on the
blue “play button” in the toolbar.
Opening a dataset
If you have an SPSS dataset (*.sav), you can open it in the following way:
Select File > Open > Data. A dialog box pops up.
Browse for your dataset and open it. Ex) “World 95.sav” (We will use this dataset
for the rest of the tutorial.)
Syntax programming
* Replace the path below with the location of your own .sav file.
GET FILE='C:\Program Files\SPSS\Employee data.sav'.
DATASET NAME DataSet2 WINDOW=FRONT.
If you have a dataset which is not in SPSS data format, it is easy to open it in
SPSS.
Go to File > Open > Data.
A dialog box pops up. In the line that specifies “Files of Type,” change the file
type from SPSS to “all files.”
Browse for your dataset and open it. Ex) “demo.xls”
Syntax programming
GET DATA
  /TYPE=XLS
  /FILE='C:\Program Files\SPSSInc\SPSS16\Samples\demo.xls'
  /SHEET=name 'demo'
  /CELLRANGE=full
  /READNAMES=on
  /ASSUMEDSTRWIDTH=32767.
DATASET NAME DataSet2 WINDOW=FRONT.
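Other formats, such as comma-separated text, can be read with the same GET DATA command. The following is a minimal sketch and is not part of the original tutorial: the file path and the variable names (age, income, gender) are hypothetical and must be replaced with those of your own file.

```spss
* Hypothetical CSV file with a header row and three columns.
GET DATA
  /TYPE=TXT
  /FILE='C:\data\demo.csv'
  /DELCASE=LINE
  /DELIMITERS=","
  /QUALIFIER='"'
  /ARRANGEMENT=DELIMITED
  /FIRSTCASE=2
  /VARIABLES=
    age F3.0
    income F8.2
    gender A1.
DATASET NAME DataSet3 WINDOW=FRONT.
```

FIRSTCASE=2 skips the header row, so the variable names and formats have to be listed explicitly under /VARIABLES.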
Running descriptive statistics and frequencies
Descriptive statistics
Go to Analyze > Descriptive Statistics > Descriptives
Select the variables for which you want the descriptives. Ex) “aids cases” and
“fertility”. Then click on the Options button and check any boxes you want to
apply such as “variance,” “skewness,” or “Descending means.”
Then click OK. The results will be displayed in the Output Editor.
Syntax programming:
DESCRIPTIVES VARIABLES=aids fertility
  /STATISTICS=MEAN STDDEV MIN MAX.
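The syntax above requests only the default statistics. As a sketch, assuming the same two variables, the options mentioned in the dialog steps (variance, skewness, and descending means) could be requested like this:

```spss
* Adds variance and skewness, and sorts the output by descending mean.
DESCRIPTIVES VARIABLES=aids fertility
  /STATISTICS=MEAN STDDEV VARIANCE SKEWNESS MIN MAX
  /SORT=MEAN (D).
```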
Frequencies
Go to Analyze > Descriptive Statistics > Frequencies
Select the variables for which you want the frequencies. Ex) “predominant
climate” and “people who read”
Click on the Statistics button, check any boxes you want to apply such as
“mode” or “mean,” and then click on the Continue button.
Click on the Charts button, check a graph type you want to apply such as
“histogram” (let’s also add a normal distribution curve by checking the box) and
then click on the Continue button.
Click on the Format button, check any options you want to apply such as
“descending values,” and then click on the Continue button.
Then click OK. The results will be displayed in the Output Editor.
Syntax programming
FREQUENCIES VARIABLES=climate literacy
  /STATISTICS=MEAN MODE
  /HISTOGRAM NORMAL
  /FORMAT=DVALUE
  /ORDER=ANALYSIS.
Crosstabulation
Go to Analyze > Descriptive Statistics > Crosstabs.
Select the variables for which you want the crosstabs. Ex) “predominant
climate” in column and “predominant religion” in row.
Click on the Statistics button, check options such as “Chi-square,” and click Continue.
Click on the Cells button, check options like “expected” and “percentage >
total,” and click Continue.
Then click OK. The results will be displayed in the Output Editor.
Syntax programming
CROSSTABS
  /TABLES=religion BY climate
  /FORMAT=AVALUE TABLES
  /STATISTICS=CHISQ
  /CELLS=COUNT EXPECTED TOTAL
  /COUNT ROUND CELL.
Correlation
Go to Analyze > Correlate.
Select Bivariate, if you are interested only in the relationships between two
variables, or Partial, if you are measuring the association between two variables
but want to factor out the effect of other variables.
Select the variables for which you want the correlation. Ex) “Gross domestic
product” and “people who read”
Then click OK. The results will be displayed in the Output Editor.
Syntax programming:
CORRELATIONS
  /VARIABLES=gdp_cap literacy
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.
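The Partial option described above has its own command, PARTIAL CORR. The sketch below is an illustration only: it assumes a control variable named urban exists in the dataset, which the tutorial does not state, so substitute whichever variable you want to factor out.

```spss
* Correlation between gdp_cap and literacy, controlling for urban.
* The control variable urban is an assumption made for illustration.
PARTIAL CORR
  /VARIABLES=gdp_cap literacy BY urban
  /SIGNIFICANCE=TWOTAIL
  /MISSING=LISTWISE.
```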
Transforming variables
Recoding Data
The Recode option allows you to change the coding of variables and create
discrete categories from continuous variables.
Go to: Transform > Recode into different variables.
Select “GDP per capita” for the “Numeric variable” box.
Click “Old and New Values.”
Select the “Range: LOWEST through” box and enter “10000”. In the new values, enter 1. Click Add.
Select the “Range: value through HIGHEST” box and enter “10001”. In the new values, enter 0. Click Add.
Click Continue.
In “Output variable,” give the name as “lowincome” and the label as “country has per capita income less
than 10,000.” Click Change.
Click OK.
Syntax programming
RECODE gdp_cap (Lowest thru 10000=1) (10001 thru Highest=0) INTO lowincome.
VARIABLE LABELS lowincome 'country has per capita income less than 10,000'.
EXECUTE.
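After recoding, it can help to label the new categories and check how many cases fall into each. This is a sketch, not part of the original tutorial, and the label wording is our own assumption:

```spss
* Label the two categories of the recoded variable (label text is illustrative).
VALUE LABELS lowincome
  0 'GDP per capita above 10,000'
  1 'GDP per capita 10,000 or less'.
* Check the result of the recode.
FREQUENCIES VARIABLES=lowincome.
```

Cases whose gdp_cap value matches neither range in the RECODE command are system-missing in lowincome and appear as missing in the frequency table.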
Transforming Data
The Compute option allows you to arithmetically combine or alter variables and
place the resulting value under a new variable name.
Go to: Transform > Compute variable.
Give a new variable name, for example, “lnGDP” under Target Variable.
In the “function group” box, select “Arithmetic” by double clicking on it.
In the “Functions and Special Variables” box, select “Ln” by double clicking on
it.
Highlight “Gross domestic product” and put it in the expression box by clicking
the arrow.
Click OK.
Syntax programming
COMPUTE lnGDP=LN(gdp_cap).
EXECUTE.
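COMPUTE can also combine two existing variables arithmetically. As a sketch, the difference between female and male life expectancy could be stored in a new variable. The name lifegap is hypothetical and chosen only for illustration:

```spss
* Difference between female and male life expectancy (new variable name is hypothetical).
COMPUTE lifegap = lifeexpf - lifeexpm.
VARIABLE LABELS lifegap 'female minus male life expectancy'.
EXECUTE.
```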
Splitting files
The Split File option allows you to compare subgroups or organize output by
subgroups. Let’s organize output by subgroups.
Go to Data > Split File.
Select “Organize output by groups”.
Highlight a variable you want, for example, “country has per capita income less
than 10000” or any categorical variable. Click OK.
Then do some analysis (for example, run a correlation between literacy and AIDS
rate). The output will show the results for each group separately.
Syntax programming
SORT CASES BY lowincome.
SPLIT FILE SEPARATE BY lowincome.
CORRELATIONS
  /VARIABLES=literacy aids
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.
To turn this off, go to Data > Split File and select “Analyze all cases, do not
create groups.”
Then select OK.
Syntax programming
SPLIT FILE OFF.
USE ALL.
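The syntax above uses SPLIT FILE SEPARATE BY, which corresponds to “Organize output by groups.” The “Compare groups” option corresponds to the LAYERED keyword, which presents the groups together in the same tables. A sketch, assuming the same variables as above:

```spss
* Compare groups: results for both lowincome groups appear in one table.
SORT CASES BY lowincome.
SPLIT FILE LAYERED BY lowincome.
CORRELATIONS
  /VARIABLES=literacy aids
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.
* Turn the split off again before running other analyses.
SPLIT FILE OFF.
```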
Selecting cases
The Select cases option allows you to analyze only one of the subgroups of your
interest.
Go to Data > Select cases.
Check “If condition is satisfied”. Click on the If button.
Highlight a variable you want, for example, “country has per capita income less
than 10000” or any categorical variable, and put it in the expression box by
clicking the arrow. Then specify a subgroup by writing an expression you want.
Ex) lowincome=1
Then do some analysis (for example, run a correlation between literacy and AIDS
rate). The output will show only the results of the selected group.
To turn off the selection:
Go to Data > Select cases.
Check “All cases”. Click OK.
Syntax programming:
COMPUTE filter_$=(lowincome=1).
VARIABLE LABEL filter_$ 'lowincome=1 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.
CORRELATIONS
  /VARIABLES=literacy aids
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.
FILTER OFF.
USE ALL.
EXECUTE.
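If the selection is needed for only one analysis, a shorter alternative to the filter variable is TEMPORARY followed by SELECT IF. This is a sketch of a different technique from the one the dialog box pastes. The selection applies only to the next procedure, so there is nothing to turn off afterwards:

```spss
* The selection below lasts only until the next procedure has run.
TEMPORARY.
SELECT IF (lowincome = 1).
CORRELATIONS
  /VARIABLES=literacy aids
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.
```

Without TEMPORARY, SELECT IF permanently drops the unselected cases from the active dataset, so the TEMPORARY line should not be omitted.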
Hypothesis testing with T-Tests
There are three types of t-tests: the One-Sample T Test, to compare a single sample
with a population value; the Independent-Samples T Test, to compare two groups’
scores on the same variable; and the Paired-Samples T Test, to compare the means of
two variables within a single group.
To compare the means of two variables within a single group,
go to Analyze > Compare Means > Paired-Samples T Test.
Pick “average female life expectancy” and “average male life expectancy” for
variable 1 and 2.
Put the variables in the box by clicking the arrow. Click OK.
Syntax programming
T-TEST PAIRS=lifeexpf WITH lifeexpm (PAIRED)
  /CRITERIA=CI(.9500)
  /MISSING=ANALYSIS.
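The other two types of t-test use the same T-TEST command. The sketches below are illustrations only: the test value of 70 and the choice of lowincome as the grouping variable are our assumptions, not part of the original tutorial.

```spss
* One-sample t-test: is mean female life expectancy different from 70?
* The test value 70 is a hypothetical population value.
T-TEST
  /TESTVAL=70
  /MISSING=ANALYSIS
  /VARIABLES=lifeexpf
  /CRITERIA=CI(.95).

* Independent-samples t-test: compare literacy between the two lowincome groups.
T-TEST GROUPS=lowincome(0 1)
  /MISSING=ANALYSIS
  /VARIABLES=literacy
  /CRITERIA=CI(.95).
```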
Regression Analysis
To run a linear regression, go to Analyze > Regression > Linear.
Select your dependent variable (females who read) and your independent
variables (average female life expectancy, infant mortality, birth rate, AIDS rate,
Gross domestic product) from the list on the left.
Then click OK.
Syntax programming
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT lit_fema
  /METHOD=ENTER lifeexpf babymort birth_rt aids_rt gdp_cap.
Match merging data files
You can merge data from two files in two different ways. You can:
Merge the active dataset with another open dataset or SPSS Statistics
data file containing the same variables but different cases.
Merge the active dataset with another open dataset or SPSS Statistics
data file containing the same cases but different variables.
To merge files, go to Data > Merge Files and select Add Cases or Add Variables.
Add Cases: merges the active dataset with a second dataset or external SPSS
Statistics data file that contains the same variables (columns) but different cases
(rows).
Add Variables: merges the active dataset with another open dataset or external
SPSS Statistics data file that contains the same cases (rows) but different
variables (columns). For example, you might want to merge a data file that
contains pre-test results with one that contains post-test results.
Cases must be sorted in the same order in both datasets.
To sort cases, go to Data > Sort Cases.
Select one or more sorting variables.
The data file is sorted based on the values of the sorting variables. If you select
multiple sorting variables, cases are sorted by values of each variable within
categories of the prior variable on the Sort list.
If one or more key variables are used to match cases, the two datasets must be
sorted by ascending order of the key variable(s).
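Sorting by more than one variable can be sketched in syntax as follows. The variable names region and country are assumptions for illustration. Cases are sorted by region first, and then by country within each region:

```spss
* (A) requests ascending order, which is what a later key-based merge requires.
* region and country are hypothetical sorting variables.
SORT CASES BY region (A) country (A).
```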
Variable names in the second data file that duplicate variable names in the
active dataset are excluded by default because Add Variables assumes that
these variables contain duplicate information.
Syntax Programming
Using command syntax entails first sorting and saving files to be merged and
then matching. Below are the commands for merging:
GET FILE="dataset_A1.sav".
SORT CASES BY varname.
SAVE OUTFILE="dataset_A2.sav".
GET FILE="dataset_B1.sav".
SORT CASES BY varname.
SAVE OUTFILE="dataset_B2.sav".
MATCH FILES FILE="dataset_A2.sav"
  /FILE="dataset_B2.sav"
  /BY varname.
LIST.
Using command syntax, you can merge up to 50 datasets and/or data files.
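The commands above correspond to Add Variables. For Add Cases, where the files contain the same variables but different cases, the ADD FILES command can be used. A minimal sketch, reusing the placeholder file names from above and assuming both files have the same variables:

```spss
* Stack the cases (rows) of the second file below those of the first.
ADD FILES
  /FILE="dataset_A1.sav"
  /FILE="dataset_B1.sav".
EXECUTE.
```

No sorting is needed here, because the cases are appended rather than matched on a key variable.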