SPSS Basics - Part 2
Running Statistical Procedures
Managing and Exporting Output
SPSS Command Syntax

Tuesday, April 22, 2009
Thursday, April 24, 2009
329 Carman Hall

This draft document last updated April 17, 2009
This document and related materials are available at http://www.lehman.edu/faculty/john/spss/

Presenter: John Dono ITR john@lehman.cuny.edu (718) 960-8338
SPSS workshop series and format: see http://www.lehman.edu/docs/workshops/workshops.html

1. Overview of SPSS
    1.1. Comprehensive collection of tools for data analysis, reporting, and data management
    1.2. Available versions and compatibility issues
    1.3. Licensing at Lehman
    1.4. Locations where SPSS is installed

2. Opening an existing SPSS dataset (and review of Part 1)
    2.1. Sample datasets and related files may be found in the Samples folder on the desktop.

        General Social Survey 2008 subset*      minigss8.sav
        General Social Survey 2008 complete     gss2008.sav
        Complete codebook:                      GSSCodeBook.pdf
        Frequencies for subset:                 gssfreqs.pdf gssfreqs.htm gssfreqs.doc gssfreqs.spv
        Variable list:                          see page 10 (end of this document)

        Height-Weight**                         htwt.sav
        Codebook:                               see page 9 (end of this document)
        Frequencies:                            htwt.spv htwt.doc

        Health***                               health.sav
        Codebook:                               see page 9 (end of this document)
        Frequencies:                            health.spv health.doc

        *If you plan to use the GSS for serious work outside of this workshop, please visit the NORC website at http://www.norc.org and refer to the codebook, GSSCodeBook.pdf, for usage guidelines, sampling techniques, question wording, coding schemes, etc.
        **Hypothetical data from Cody, Ronald P. and Smith, Jeffrey K. Applied Statistics and the SAS Programming Language. (p. 15)
        ***Hypothetical data from Kleinbaum, David G. and Kupper, Lawrence L. Applied Regression Analysis and Other Multivariable Methods. (p. 60)

        Other sources of high-quality data in SPSS format include ICPSR at the University of Michigan, of which CUNY is a member. Visit http://www.icpsr.umich.edu or contact William Bosworth (william.bosworth@lehman.cuny.edu, ext. 8465) for further information.
        You can also find sample files, most of which are hypothetical and intended for instructional purposes, in the samples folder in the SPSS installation directory. Descriptions may be obtained by searching for the phrase “sample files” in SPSS Help.
    2.2. Starting SPSS
    2.3. The Data Editor window
    2.4. Opening an existing SPSS-format data file (known as a “system file” in the old days): htwt.sav
        File > Open > Data
    2.5. .sav file extension for SPSS-format data files
    2.6. Structure of an SPSS data file: “spreadsheet-like” rectangular array or matrix with
        Rows as cases (units of analysis or observations, e.g. a respondent to a survey, a company, a participant in an experiment)
        Columns as variables (measurements, responses, treatments on the units)
        Values for particular cases on particular variables in cells at the row-column intersections
    2.7. Compare Data View and Variable View

3. Frequencies procedure
    3.1. Select statistical procedures appropriate to the type of variables you are working with and verify that your data meet the assumptions of the procedures (e.g. normal distributions, equality of variance). Refer to reputable statistical texts and consultants if necessary.
    3.2. Use Frequencies to describe the distribution of discrete variables (limited number of values or categories, nominal or ordinal “level of measurement”)
    3.3. In Part 1 we used Frequencies for data validation purposes: identifying outliers and illegal codes.
    3.4. Setting some global options to make output more informative
        3.4.1. Select Edit > Options
        3.4.2. On the General sheet, select Display Names and Alphabetical under Variable Lists and Open only one data set at a time under Windows. Click on Apply.
        3.4.3. On the Output Labels sheet, select Names and Labels under Variables in item labels and select Values and Labels under Variable values in labels. Click on Apply.
        3.4.4. On the File Locations sheet, change Specified Folders for data and other files to point to the samples folder on your desktop.
            Click OK.
    3.5. Run Frequencies on variables in the height-weight dataset (htwt.sav)
        3.5.1. Select Analyze > Descriptive Statistics > Frequencies
    3.6. The Frequencies dialog box
        3.6.1. Selecting variables
        3.6.2. Selecting appropriate statistics
    3.7. SPSS output window and the SPSS Viewer
    3.8. Reviewing results and navigating the SPSS Viewer
    3.9. Retention of dialog box settings
    3.10. Saving output in native SPSS output format
    3.11. .spv file extension for output
    3.12. Closing the output window
    3.13. Optional exercise: run Frequencies on some suitable variables from minigss8.sav (see page 10, at the end of this document, for a categorized list of variables) and save the output

4. Descriptives procedure
    4.1. Use Descriptives to describe the distribution of continuous variables (many ordered categories, interval or ratio level of measurement)
    4.2. Open health.sav and click on Variable View to display variable information
    4.3. Run Descriptives on variables in the dataset
    4.4. The Descriptives dialog boxes
        4.4.1. Variable selection
        4.4.2. The Options subdialog box to select statistics
    4.5. Optional exercise: run Descriptives on age, educ and rincom06 from minigss8.sav, but check frequencies on rincom06 first!
    4.6. Save and close the Output window

5. Crosstabs procedure
    5.1. Use Crosstabs to examine associations among categorical variables (variables with a limited number of possible values, ordered or not)
    5.2. Open minigss8.sav (then close previously used datasets if still open) and click on Variable View to display variable information
    5.3. Run Crosstabs to examine the association between happiness and highest degree earned (happy * degree or happy by degree)
    5.4. The Crosstabs dialog box
        5.4.1. Row/column variable selection procedures
        5.4.2. Cells subdialog box to specify contents of cells
            5.4.2.1. Decision regarding direction of percentaging
            5.4.2.2. Statistics subdialog box
    5.5. Run Crosstabs to produce the following tables:
        happy * agegroup
        happy * sex
        happy * marital
        happy * health
        happy * class
    5.6. Introducing additional variables into the analysis to explain or specify the bivariate relationship in a two-way table
    5.7. Run Crosstabs to produce the following table:
        happy * marital * sex
    5.8. Optional exercise: pres04 by degree by sex

6. Correlation
    6.1. Use Correlate to examine linear association among continuous variables (ordinal with many categories, interval or ratio level of measurement)
    6.2. Correlation may be positive or negative
    6.3. Run Correlate to obtain the correlation of height and weight in the htwt.sav dataset
    6.4. Optional exercise: obtain a scattergram to visualize the linear relationship
    6.5. Optional exercise: generate a correlation matrix from minigss8.sav on the following variables: paeduc, maeduc, educ, rincom06
    6.6. Partial correlation procedures as an analog of a three-way crosstabulation

7. Some other procedures
    7.1. Comparison of means analysis with t-tests and ANOVA
    7.2. Linear Regression

8. Managing output
    8.1. Working in the SPSS Viewer
    8.2. Navigating in the Viewer
    8.3. Editing in the Viewer (Save first!)
    8.4. Using Save As to save a modified version of output in SPSS format
    8.5. Compatibility issues and obtaining the legacy viewer
    8.6. Export output into alternative formats for further editing, presentation, distribution, publication, etc.
        8.6.1. Acrobat format (.pdf extension)
        8.6.2. Web page format (.htm)
        8.6.3. Microsoft Word format (.doc, .rtf)

9. SPSS Command Syntax
    9.1. SPSS Viewer log
    9.2. Generating command syntax from dialog boxes using Paste
    9.3. The Syntax window
    9.4. Using the syntax window to
        9.4.1. use options not available through dialog boxes
        9.4.2. save to rerun in current or later session
        9.4.3. edit then rerun
        9.4.4. document procedures
        9.4.5. simplify procedures when dialog boxes are too cumbersome
    9.5. Starting in the syntax window

10. Learning more about SPSS
    10.1. http://www.spss.com
    10.2. Manuals in pdf format provided with license
    10.3. Help > Tutorial etc.
    10.4. Visit academic web sites, e.g.
        http://www.usc.edu/its/stats/spss/index.html

Codebook for HTWT dataset

    description                         variable name
    Identification number               ID
    Gender                              GENDER
    Height in inches                    HEIGHT
    Weight in lbs                       WEIGHT

Codebook for HEALTH dataset

    description                         variable name
    Identification number               ID
    Systolic blood pressure             SBP
        9999 = missing
    Quetelet index*                     QUET
        9999 = missing
    Age in years                        AGE
        98 = 98 or more
        9999 = missing
    Smoking history                     SMK
        0 = nonsmoker
        1 = current or previous smoker
        9 = missing

    *Quetelet Index (a measure of size) = 100 * (weight/height**2)

Selected variables from 2008 General Social Survey

    Demographic variables       age agegroup sex marital
    Economic variables          class rincom06 union
    Happiness                   hapmar happy
    Education                   educ degree
    Family background           paeduc padeg maeduc madeg
    Political variables         vote04 pres04 partyid polviews
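
Worked example for the HEALTH codebook. The Quetelet index formula and the 9999 missing-value code in the HEALTH codebook above can be illustrated with a short sketch. It is written in Python, not SPSS command syntax, and is meant only to show the arithmetic. The function names and the sample numbers are hypothetical, and the codebook does not state the units of weight and height, so none are assumed here.

```python
# Illustrative sketch only: the function names and sample values are
# hypothetical and are not part of the workshop files.

MISSING = 9999  # missing-value code for SBP, QUET and AGE in the HEALTH codebook


def quetelet_index(weight, height):
    """Return 100 * (weight / height**2), the formula given in the codebook.

    Returns None when either input is None or the height is not positive.
    """
    if weight is None or height is None or height <= 0:
        return None
    return 100 * (weight / height ** 2)


def recode_missing(value, missing_code=MISSING):
    """Turn the numeric missing code into None so it cannot enter a calculation."""
    return None if value == missing_code else value


def mean_ignoring_missing(values, missing_code=MISSING):
    """Average only the valid values; return None if every value is missing."""
    valid = [v for v in values if v != missing_code]
    return sum(valid) / len(valid) if valid else None


if __name__ == "__main__":
    # Hypothetical case: weight 150, height 70 (units as in the source data).
    print(quetelet_index(150, 70))
    # A 9999 left in the data would badly inflate the mean; here it is dropped.
    print(mean_ignoring_missing([3.0, 9999, 4.0]))
```

The point of the second function pair is the same one made in Part 1 about illegal codes: a value such as 9999 must be declared missing (in SPSS, in the Missing column of Variable View) before running Descriptives, or it will be treated as a real measurement.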