SPSS Basics - Part 2
 Running Statistical Procedures
 Managing and Exporting Output
 SPSS Command Syntax
Tuesday, April 22, 2009
Thursday, April 24, 2009
329 Carman Hall
This draft document last updated April 17, 2009
This document and related materials are available at http://www.lehman.edu/faculty/john/spss/
Presenter:
John Dono
ITR
john@lehman.cuny.edu
(718) 960-8338
SPSS workshop series and format
see http://www.lehman.edu/docs/workshops/workshops.html
1. Overview of SPSS
1.1. Comprehensive collection of tools for data analysis, reporting, and data management
1.2. Available versions and compatibility issues
1.3. Licensing at Lehman
1.4. Locations where SPSS is installed
2. Opening an existing SPSS dataset (and review of Part 1)
2.1. Sample datasets and related files may be found in Samples folder on desktop.
General Social Survey 2008 subset*     minigss8.sav
General Social Survey 2008 complete    gss2008.sav
  Complete codebook:                   GSSCodeBook.pdf
  Frequencies for subset:              gssfreqs.pdf, gssfreqs.htm, gssfreqs.doc, gssfreqs.spv
  Variable list:                       see page 10
Height-Weight**                        htwt.sav
  Codebook:                            see page 9
  Frequencies:                         htwt.spv, htwt.doc
Health***                              health.sav
  Codebook:                            see page 9
  Frequencies:                         health.spv, health.doc
*If you plan to use the GSS for serious work outside of this workshop, please visit the
NORC website at http://www.norc.org and refer to the codebook, GSSCodeBook.pdf, for
usage guidelines, sampling techniques, question wording, coding schemes, etc.
**Hypothetical data from Cody, Ronald P. and Smith, Jeffrey K. Applied Statistics and
the SAS Programming Language. (p.15)
***Hypothetical data from Kleinbaum, David G. and Kupper, Lawrence L. Applied
Regression Analysis and Other Multivariable Methods. (p. 60)
Other sources of high-quality data in SPSS format include ICPSR at the University of
Michigan, of which CUNY is a member. Visit http://www.icpsr.umich.edu or contact
William Bosworth (william.bosworth@lehman.cuny.edu, ext. 8465) for further
information.
You can also find sample files, most of which are hypothetical and intended for
instructional purposes, in the samples folder in the SPSS installation directory.
Descriptions may be obtained by searching for the phrase “sample files” in SPSS Help.
2.2. Starting SPSS
2.3. The Data Editor window
2.4. Opening an existing SPSS-format data file (known as a “system file” in the old days) –
htwt.sav
File > Open > Data
2.5. .sav file extension for SPSS-format data files
2.6. Structure of an SPSS data file – “spreadsheet-like” rectangular array or matrix with
Rows as cases (units of analysis or observations, e.g. a respondent to a survey, a company, a
participant in an experiment)
Columns as variables (measurements, responses, treatments on the units)
Values for particular cases on particular variables in cells at row-column intersection
2.7. Compare Data View and Variable View
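The File > Open > Data step in 2.4 also has a command syntax counterpart (syntax is introduced in section 9). The lines below are a minimal sketch, assuming the Samples folder sits at the hypothetical path C:\Samples; adjust the path to wherever the folder actually lives.

```spss
* Open the height-weight data file (the path is a placeholder, not from the handout).
GET FILE='C:\Samples\htwt.sav'.

* Show the kind of information Variable View displays (names, labels, formats).
DISPLAY DICTIONARY.

* Print the first five rows (cases) with every column (variable), as in Data View.
LIST VARIABLES=ALL
  /CASES=FROM 1 TO 5.
```

Each printed row is one case and each column one variable, which is the rectangular layout described in 2.6.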
3. Frequencies Procedures
3.1. Select statistical procedures appropriate to the type of variables you are working with
and verify that your data meet the assumptions of the procedures (e.g. normal
distributions, equality of variance). Refer to reputable statistical texts and consultants if
necessary.
3.2. Use Frequencies to describe the distribution of discrete variables (limited number of
values or categories, nominal or ordinal “level of measurement”)
3.3. In Part 1 we used Frequencies for data validation purposes – identifying outliers and
illegal codes.
3.4. Setting some global options to make output more informative
3.4.1. Select Edit > Options
3.4.2. On the General sheet, select Display Names and Alphabetical under Variable
Lists and Open only one data set at a time under Windows. Click on Apply.
3.4.3. On the Output Labels sheet, select Names and Labels under Variables in item
labels and select Values and Labels under Variable values in labels. Click on
Apply.
3.4.4. On the File Locations sheet, change Specified Folders for data and other files to
point to the samples folder on your desktop. Click OK.
3.5. Run Frequencies on variables in height-weight dataset (htwt.sav)
3.5.1. Select Analyze > Descriptive Statistics > Frequencies
3.6. The Frequencies dialog box
3.6.1. Selecting variables
3.6.2. Selecting appropriate statistics
3.7. SPSS output window and the SPSS viewer
3.8. Reviewing results and navigating the SPSS viewer
3.9. Retention of dialog box settings
3.10. Saving output in native SPSS output format
3.11. .spv file extension for output
3.12. Closing output window
3.13. Optional exercise: run Frequencies on some suitable variables from minigss8.sav
(see page 10 for the categorized list of variables) and save output
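For reference, the Frequencies steps in this section can also be written as command syntax. This is a sketch under assumptions: the path C:\Samples and the output file name are placeholders, and the SET line is only an assumed counterpart of the Output Labels choices in 3.4.3.

```spss
* Rough syntax counterpart of the Output Labels settings in 3.4.3 (mapping assumed).
SET OVARS=BOTH TNUMBERS=BOTH.

GET FILE='C:\Samples\htwt.sav'.

* Frequency table for a discrete variable.
FREQUENCIES VARIABLES=gender
  /ORDER=ANALYSIS.

* Save the Viewer contents in native .spv format (file name is a placeholder).
OUTPUT SAVE OUTFILE='C:\Samples\htwt_frequencies.spv'.
```

A new file name is used for the saved output so that the htwt.spv file supplied in the Samples folder is not overwritten.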
4. Descriptives procedure
4.1. Use Descriptives to describe the distribution of continuous variables (many ordered
categories, interval or ratio level of measurement)
4.2. Open health.sav and click on Variable View to display variable information
4.3. Run Descriptives on variables in dataset
4.4. the Descriptives dialog boxes
4.4.1. variables selection
4.4.2. the Options subdialog box to select statistics
4.5. Optional exercise: run Descriptives on age, educ and rincom06 from minigss8.sav but
check frequencies on rincom06 first!
4.6. Save and close Output window
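A syntax sketch of the Descriptives run on health.sav follows. It assumes the missing-value codes listed in the HEALTH codebook (9999 and 9) have not already been declared in the file, so it declares them first; otherwise 9999 would be averaged in as if it were a real measurement. The path is a placeholder.

```spss
GET FILE='C:\Samples\health.sav'.

* Declare the codebook missing-value codes so they are excluded from the statistics.
MISSING VALUES sbp quet age (9999) / smk (9).

* Mean, standard deviation, minimum and maximum for the continuous variables.
DESCRIPTIVES VARIABLES=sbp quet age
  /STATISTICS=MEAN STDDEV MIN MAX.
```

If the maximum still shows 9999 for any variable, the missing-value declaration did not take effect for that variable.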
5. Crosstabs procedure
5.1. Use Crosstabs to examine associations among categorical variables (variables with a
limited number of possible values, ordered or not)
5.2. Open minigss8.sav (then close previously used datasets if still open) and click on
Variable View to display variable information
5.3. Run Crosstabs to examine the association between happiness and highest degree earned
(happy * degree or happy by degree)
5.4. the Crosstabs dialog box
5.4.1. row/column variable selection procedures
5.4.2. cells subdialog box to specify contents of cells
5.4.2.1. decision regarding direction of percentaging
5.4.2.2. statistics subdialog box
5.5. Run Crosstabs to produce the following tables
happy * agegroup
happy * sex
happy * marital
happy * health
happy * class
5.6. Introducing additional variables into the analysis to explain or specify the bivariate
relationship in a two-way table
5.7. Run Crosstabs to produce the following table:
happy * marital * sex
5.8. Optional exercise: pres04 by degree by sex
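The two-way and three-way tables in this section can be sketched in command syntax as follows. The choice of column percentages and the chi-square statistic are illustrative assumptions (one common convention is to percentage within categories of the column variable); the workshop may choose differently. The path is a placeholder.

```spss
GET FILE='C:\Samples\minigss8.sav'.

* Two-way table: happy in rows, degree in columns, counts plus column percentages.
CROSSTABS
  /TABLES=happy BY degree
  /CELLS=COUNT COLUMN
  /STATISTICS=CHISQ.

* Three-way table: a separate happy by marital table for each category of sex.
CROSSTABS
  /TABLES=happy BY marital BY sex
  /CELLS=COUNT COLUMN.
```

In the second command the last variable named (sex) acts as the layer, or control, variable described in 5.6.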
6. Correlation
6.1. Use Correlate to examine linear association among continuous variables (ordinal with
many categories, interval, ratio level of measurement)
6.2. Correlation may be positive or negative
6.3. Run Correlate to obtain correlation of height and weight in htwt.sav dataset
6.4. Optional exercise: obtain scattergram to visualize the linear relationship
6.5. Optional exercise: generate a correlation matrix from minigss8.sav on the following
variables:
paeduc, maeduc, educ, rincom06
6.6. Partial correlation procedures as analog of a three-way crosstabulation
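As a sketch of this section in command syntax: the first part correlates height and weight and draws the scattergram; the second part produces the correlation matrix from 6.5 and one partial correlation. The variables chosen for the partial correlation (educ and rincom06, controlling for paeduc) are a hypothetical illustration, not taken from the workshop, and the path is a placeholder.

```spss
GET FILE='C:\Samples\htwt.sav'.

* Pearson correlation of height and weight with two-tailed significance.
CORRELATIONS
  /VARIABLES=height weight
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.

* Scattergram to check visually whether the relationship looks linear.
GRAPH
  /SCATTERPLOT(BIVAR)=height WITH weight.

GET FILE='C:\Samples\minigss8.sav'.

* Correlation matrix for the four variables listed in 6.5.
CORRELATIONS
  /VARIABLES=paeduc maeduc educ rincom06.

* Partial correlation of educ and rincom06 holding paeduc constant (illustrative choice).
PARTIAL CORR
  /VARIABLES=educ rincom06 BY paeduc
  /SIGNIFICANCE=TWOTAIL
  /MISSING=LISTWISE.
```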
7. Some other procedures
7.1. Comparison of means analysis with t-tests and Anova
7.2. Linear Regression
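For orientation only, a t-test and a linear regression might look like this in command syntax. The analysis choices are assumptions made for illustration: systolic blood pressure compared across the two SMK groups (coded 0 and 1 in the HEALTH codebook), then regressed on age and Quetelet index. The path is a placeholder.

```spss
GET FILE='C:\Samples\health.sav'.
MISSING VALUES sbp quet age (9999) / smk (9).

* Independent-samples t-test of sbp between nonsmokers (0) and smokers (1).
T-TEST GROUPS=smk(0 1)
  /VARIABLES=sbp.

* Linear regression of sbp on age and quet, both entered together.
REGRESSION
  /DEPENDENT sbp
  /METHOD=ENTER age quet.
```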
8. Managing output
8.1. Working in the SPSS Viewer
8.2. Navigating in the Viewer
8.3. Editing in the Viewer (Save first!)
8.4. Using Save As to save a modified version of output in SPSS format
8.5. Compatibility issues and obtaining the legacy viewer
8.6. Export output into alternative formats for further editing, presentation, distribution,
publication etc.
8.6.1. Acrobat format (.pdf extension)
8.6.2. Web page format (.htm)
8.6.3. Microsoft Word format (.doc, .rtf)
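Saving and exporting can also be done from syntax. The sketch below assumes a release that provides the OUTPUT EXPORT command (the releases that write .spv files should) and uses placeholder file names; it acts on whatever is currently in the Viewer.

```spss
* Save the current Viewer contents in native SPSS format.
OUTPUT SAVE OUTFILE='C:\Samples\myoutput.spv'.

* Export the visible Viewer items to Acrobat format.
OUTPUT EXPORT
  /CONTENTS EXPORT=VISIBLE
  /PDF DOCUMENTFILE='C:\Samples\myoutput.pdf'.

* Export to web page format.
OUTPUT EXPORT
  /CONTENTS EXPORT=VISIBLE
  /HTML DOCUMENTFILE='C:\Samples\myoutput.htm'.

* Export to Microsoft Word format.
OUTPUT EXPORT
  /CONTENTS EXPORT=VISIBLE
  /DOC DOCUMENTFILE='C:\Samples\myoutput.doc'.
```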
9. SPSS Command Syntax
9.1. SPSS Viewer log
9.2. Generating command syntax from dialog boxes using Paste
9.3. The Syntax window
9.4. Using the Syntax window to
9.4.1. use options not available through dialog boxes
9.4.2. save to rerun in current or later session
9.4.3. edit then rerun
9.4.4. document procedures
9.4.5. simplify procedures when dialog boxes are too cumbersome
9.5. Starting in the syntax window
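A short, commented syntax file illustrates several of the uses listed in 9.4: it documents what was done, can be saved and rerun, and replaces five trips through the Crosstabs dialog box (the tables in 5.5) with one command. This is a sketch: the exact text that Paste generates can differ by SPSS version, the path is a placeholder, and all variables are assumed to be present in minigss8.sav under these names.

```spss
* Workshop tables for section 5, kept as syntax so they can be rerun later.
GET FILE='C:\Samples\minigss8.sav'.

* A command of the kind the Paste button produces from the Frequencies dialog box.
FREQUENCIES VARIABLES=happy
  /ORDER=ANALYSIS.

* One CROSSTABS command requests happy by each of the five column variables.
CROSSTABS
  /TABLES=happy BY agegroup sex marital health class
  /CELLS=COUNT COLUMN.
```

Lines that begin with an asterisk and end with a period are comments; SPSS ignores them when the file is run, which makes them a convenient place to document procedures.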
10. Learning more about SPSS
10.1. http://www.spss.com
10.2. Manuals in pdf format provided with license
10.3. Help > Tutorial etc.
10.4. Visit academic web sites, e.g.
http://www.usc.edu/its/stats/spss/index.html
Codebook for HTWT dataset
description              variable name
Identification Number    ID
Gender                   GENDER
Height in inches         HEIGHT
Weight in lbs            Weight
Codebook for HEALTH dataset
description                variable name    codes
Identification number      ID
Systolic blood pressure    SBP              9999 = missing
Quetelet index*            QUET             9999 = missing
Age in years               AGE              98 = 98 or more; 9999 = missing
Smoking History            SMK              0 = nonsmoker; 1 = current or previous smoker; 9 = missing
*Quetelet Index (a measure of size) = 100 * (weight/height**2)
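The Quetelet formula in the footnote can be tried out in command syntax. The sketch below applies it to the height and weight variables in htwt.sav purely as an illustration; the health.sav file already contains QUET, the measurement units used for that variable are not stated here, and the new variable name quet_new and the path are hypothetical.

```spss
GET FILE='C:\Samples\htwt.sav'.

* Quetelet index as defined in the footnote, stored in a new variable.
COMPUTE quet_new = 100 * (weight / height**2).
EXECUTE.

* Summarize the computed index.
DESCRIPTIVES VARIABLES=quet_new.
```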
9
Selected variables from 2008 General Social Survey
Demographic variables
age
agegroup
sex
marital
Economic variables
class
rincom06
union
Happiness
hapmar
happy
Education
educ
degree
Family background
paeduc
padeg
maeduc
madeg
Political variables
vote04
pres04
partyid
polviews
10