SPSS: Beyond the Basics
A “next steps” class on SPSS 16 for PCs
Consultant: Betty Zou

What we will cover:
PART 1
  Open Excel files in SPSS
  Merge SPSS files
  Multiple Response Questions
PART 2
  The Wonders of Statistics Coach
  Bivariate Correlation
  T-Tests
    One-Sample
    Independent
    Paired

PART 1

Open an Excel file in SPSS
In your Excel document
  Set-up should match the SPSS layout
  Variable names in the first row
  Each row is a case
In SPSS
  File > Open > Data
  Specify “Files of type” (notice all the file types)
  Open: “Beyond_ExampleData_1.xls”
  In the pop-up window, specify the worksheet

Merge SPSS files
Some sources will give you data sets in separate files
  For example: New Immigrant Survey: http://nis.princeton.edu/
  Same respondents, different questions
  Income, Assets, Employment all in separate files
Merge
  Sort the data in every file in ascending order of the key variable
  Keep one file as the working file and merge the others into it
  Data > Merge Files > Add Variables
  Match cases on key variables: “ID”
  Merge: “Sample_Data_2” into “Sample_Data_Main”

Multiple response questions
Respondents might give more than one answer to a question.
What is your favorite beer?
  a. PBR
  b. Corona
  c. Fat Tire
  d. Mannies
  e. Budweiser
  f. Other
  g. Don’t drink beer
What is your favorite genre of music?
  a. Hip-hop
  b. Rock
  c. Punk
  d. Folk
  e. Country
  f. Other (please specify) _________

Multiple Response Questions (cont’d)
2 options when entering data
  1. Multiple category method (Beer 1, Beer 2, Beer 3)
  2. Multiple dichotomy method
     Create a variable for each answer choice (Hip-hop, Rock, Punk…)
     Enter “1” if the answer choice was chosen, “0” if not
Other (fill-in answer)
  Enter data & automatic recode, OR
  Code the data yourself

Define Multiple Response Sets
Which beer is more popular? PBR or Bud?
  Analyze > Multiple Response > Define Variable Sets
  Set Variables: Beer 1 – Beer 3
  Set Variables are coded as “Categories”
  Range: 1 through 7 (the value labels)
  Analyze > Multiple Response > Frequencies / Crosstabs

More on Multiple Response
Are men or women more likely to listen to hip-hop?
  Same process as before
  Set Variables are coded as “Dichotomies”
  Counted Value: 1
  Go to Crosstabs
  It may ask you to define the range for “gender”
    Enter 1 for Min, 2 for Max

PART 2

Statistics Coach
  Help > Statistics Coach
  What do you want to do?
    Ex: Compare groups for significant differences
    Data in categories
    Show Crosstabs
Case Studies

Bivariate Correlation
Is there a correlation between age and income?
  Using dataset: “Sample_Data_Main”
  2 scale variables: “age” and “income”
  Analyze > Correlate > Bivariate
  Check Pearson and Flag significant correlations
Interpret: Pearson (0.665), Sig (0.00; the p-value rounds to zero, it is not exactly zero)
  There is a positive relationship between age and income.
  The higher the age, the higher the wage (on average).
  This correlation is highly significant.
Scatter Plot

T-Test: One Sample
You know that the average number of hours people work per week is 40. You want to know if the average number of hours worked in your sample is different from the known value.
  Analyze > Compare Means > One-Sample T Test
  Enter “WorkHrs”; Test value: 40
Interpret:
  The sample mean is 44.675
  Sig. 0.014: significant at the 5% significance level
  t = 2.564
    t is the test statistic: the difference between the sample mean and 40 (4.675 hours) divided by the standard error of the mean. It is not a margin of error around 44.675.
  These people are generally overworked
  The difference is significant

T-Test: Independent
Is the average annual income of females different from males?
You want to compare the means of two unrelated groups.
  Analyze > Compare Means > Independent-Samples T Test
  Test variable: Income
  Grouping variable: Gender
  Define Groups: 1, 2
Interpret: Sig: 0.301
  The difference is not significant.

T-Test: Paired
A soda with ginkgo biloba in it claims to improve test scores. Your sample was given a general knowledge test before and after drinking the soda.
(Test score out of 100)
You want to test if drinking the soda significantly improves test scores.
Use Paired because you are comparing two measurements for the same individuals.

T-Test: Paired (cont’d)
  Analyze > Compare Means > Paired-Samples T Test
  Variable 1: PreTest
  Variable 2: PostTest
  Notice that you can compare more pairs
Interpret:
  Drinking the soda actually reduces test scores by an average of about 40 points!
  Sig: 0.000 (p < 0.001), so the difference is highly significant
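
Where the t value comes from (a sketch outside SPSS)
The t statistics in the T-Test slides can be reproduced by hand. The Python sketch below is only an illustration, not part of the class materials. The scores are invented and are not the class data set, and the helper names one_sample_t and paired_t are hypothetical. It shows that t is the mean difference divided by its standard error, and that a paired test is a one-sample test on each person's change against 0. It does not compute the Sig. value, which SPSS derives from the t distribution.

```python
import math
import statistics


def one_sample_t(values, test_value):
    """Return (mean, t, df) for a one-sample t test against test_value."""
    n = len(values)
    mean = statistics.mean(values)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    se = statistics.stdev(values) / math.sqrt(n)
    return mean, (mean - test_value) / se, n - 1


def paired_t(before, after):
    """Paired t test: a one-sample t test on per-person differences vs 0."""
    diffs = [a - b for a, b in zip(after, before)]
    return one_sample_t(diffs, 0)


# Hypothetical scores, invented for illustration (NOT the class data set).
pre_test = [80, 75, 90, 85, 70]
post_test = [42, 30, 55, 40, 33]

mean_diff, t, df = paired_t(pre_test, post_test)
print(f"mean change = {mean_diff:.1f}, t = {t:.3f}, df = {df}")
```

A negative mean change with a large negative t matches the pattern on the slide: scores dropped after the soda. The sign depends on the subtraction order (here PostTest minus PreTest), so SPSS may report the same result with the opposite sign.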