SPSS and Data Handling Practices
Adv. Experimental Methods & Statistics
PSYC 4310 / COGS 6310
Michael J. Kalsher
Department of Cognitive Science
© 2013, Michael Kalsher

SPSS: A Refresher
• Entering / Opening Data Files
  – Data Editor
  – Variable View
  – Data Handling
  – Syntax Editor
• Graphing Data
• Assumptions of Parametric Tests
• A useful link: http://www.uwstout.edu/parq/upload/spssinfo.pdf

Getting Started: The SPSS Start-up Window
• For entering a new data set: enter the data directly in SPSS and save as a .SAV file.
• Open an existing SPSS data set.
• Open an Excel file (or other types of data files).

The Data Editor: Where the Action Is

Documenting Your Data: The Importance of Maintaining a Code Book
– Maintain a code book, especially for larger datasets and ones that you may set aside for relatively long periods of time.
– Code book information is stored in SPSS’s Variable View.

SPSS Files: Food for Thought
• Reserve the first column of an SPSS file for the subject number / identifier (conventionally 1 to 8 characters long).
• Coded variables: be sure to create variable descriptors/labels to keep track of the values assigned, e.g.,
  – 1 = male, 2 = female; or
  – 0 = No Drug; 1 = 5mg; 2 = 10mg, etc.
• Missing data (discussed later in this lecture)

The Variable View

Variable Type: String? (Numeric is the default type)

Variable Type: Dates / DOB

Variable Labeling: Coded Variables

Missing Values
• Methods:
  – (1) Discrete values (up to three);
  – (2) A range of values;
  – (3) A range of values plus one discrete value.
• Examples:
  – If the reason isn’t important, use a value unlikely to occur naturally (e.g., “999”).
  – “DNA” (Does Not Apply): the subject was never asked this question because of a branch in the questionnaire, or the measurement was made only for persons over 18 and this subject is under 18.
  – “MISS” (Missed): the subject did not fill in a value.
  – “REF” (Refused): the subject would not cooperate after a prompt.

Sample Setup: Final Variable View

Data File Comments: Will you remember study details 2 years from now?

Sample Setup: Final Data View

Switching Views: Coded Value or Label

Data Handling Rules
– NEVER modify the original entered data.
– Make modifications to a working file and keep a syntax file that preserves all steps:
  • Use the menus and then the PASTE command, or
  • Type commands directly into a syntax file, save it, and then execute the syntax file commands.
– Perform all analyses on working files.
– Preserve all analyses in a syntax file:
  • to document what you’ve done
  • to reproduce output

Modifying Variables
“RECODE”
– Changes the values of variables according to substitution rules.
– Example: Participants’ ages could be recoded into discrete categories, such as “Older” (DOBs before 1980) or “Younger” (DOBs in 1980 or later).
“COMPUTE”
– Creates new variables.
– Example: A researcher notices that individual items on her survey seem to be measuring the same construct (e.g., happiness). She creates a new composite measure by summing participants’ scores on the individual items.
“IF”
– Creates or modifies variables with logical rules.

Outputting Data
• Use the SPSS “Save As” command and select the data file type.

SPSS Syntax
Syntax can be typed in manually, or created by using the menu options and the “PASTE” command at each step.

Syntax Example: Does the number of friends differ between students and lecturers?
[Figure: menu Steps 1 to 3, with the DV and IV selections marked]

Syntax Editor and Analysis Output
Note: the syntax precedes the output tables.
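The data handling rules and the RECODE, COMPUTE, and IF operations above can be mirrored outside SPSS to make their logic concrete. The following is a minimal Python sketch, not SPSS syntax: the variable names (dob_year, happy1 to happy3), the 999 missing-value code, and the cutoff of 12 are hypothetical, and each row is a plain dictionary. It works on a deep copy so the original entered data stay untouched, declares 999 as missing, recodes DOB into “Older”/“Younger”, computes a composite that is missing whenever any item is missing (as the + operator does in an SPSS COMPUTE; the SUM function would instead add the valid items), and sets a flag with a logical rule.

```python
import copy

# Hypothetical raw data: names, values, and the 999 missing-value code are
# illustrative assumptions, not taken from a real study file.
ORIGINAL = [
    {"id": 1, "dob_year": 1975, "happy1": 4, "happy2": 5, "happy3": 3},
    {"id": 2, "dob_year": 1988, "happy1": 2, "happy2": 999, "happy3": 4},
    {"id": 3, "dob_year": 1980, "happy1": 5, "happy2": 4, "happy3": 5},
]

MISSING_CODES = {999}                  # user-defined missing value(s)
ITEMS = ["happy1", "happy2", "happy3"]


def make_working_copy(data):
    """Return a deep copy so the original entered data are never modified."""
    return copy.deepcopy(data)


def declare_missing(rows, variables, codes):
    """Replace user-missing codes with None (analogous to system-missing)."""
    for row in rows:
        for var in variables:
            if row[var] in codes:
                row[var] = None


def recode_age_group(rows):
    """RECODE-style rule: DOB before 1980 -> 'Older', 1980 or later -> 'Younger'."""
    for row in rows:
        row["age_group"] = "Older" if row["dob_year"] < 1980 else "Younger"


def compute_composite(rows, items, name="happy_total"):
    """COMPUTE-style sum that is missing if any item is missing."""
    for row in rows:
        values = [row[i] for i in items]
        row[name] = None if None in values else sum(values)


def flag_high(rows, name="happy_total", cutoff=12):
    """IF-style logical rule; the cutoff of 12 is a hypothetical choice."""
    for row in rows:
        total = row[name]
        row["high_happy"] = None if total is None else int(total >= cutoff)


working = make_working_copy(ORIGINAL)
declare_missing(working, ITEMS, MISSING_CODES)
recode_age_group(working)
compute_composite(working, ITEMS)
flag_high(working)

for row in working:
    print(row["id"], row["age_group"], row["happy_total"], row["high_happy"])
```

In SPSS itself, the same steps would be pasted or typed as syntax and kept in a syntax file, in keeping with the data handling rules.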
Graphing Results: Syntax Approach

Graphing Results: Chart Builder

SPSS Chart Builder
• Double-click on the graph to edit its features.

Exploring Parametric Test Assumptions
• Normally distributed data
• Homogeneity of variance
• Independence
• Score-level (interval or ratio) data

Assessing Normality
• Visual inspection vs. statistical criteria
• Sample data or sampling distribution?
The Central Limit Theorem: As sample size increases, we can be more confident that the sampling distribution of the mean is normally distributed, even when the population is not.

Assessing Normality: Important Concepts

Assessing Normality: Skew and Kurtosis
[Figure: Positive Skew; Negative Skew]

Assessing Normality: The Download Music Festival (DownloadFestival.sav)
A biologist concerned about the potential health effects of music festivals attends the Download Music Festival and measures the hygiene of 810 concert-goers over the 3-day event. She uses a Likert-type scale that ranges from 0 = smells very bad to 4 = smells great. She predicts that hygiene will decrease over time.

Assessing Normality Visually: P-P Plots (DownloadFestival.sav)
A P-P plot plots the cumulative probability of a variable against the cumulative probability of a normal distribution; if the data are normal, the points fall along the diagonal.
The distribution of data on day 1 is symmetrical, but it is much less symmetrical on days 2 and 3.

Assessing Normality Visually and Statistically (DownloadFestival.sav)
[Figure: dialog callouts for the Statistical Approach and the Visual Approach]

Visual Approach: Frequency Distributions
The distribution of data on day 1 is symmetrical; it is much less symmetrical on days 2 and 3.
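A P-P plot pairs each score’s observed cumulative proportion with the cumulative probability that a normal distribution (with the sample’s mean and standard deviation) assigns to that score. A minimal sketch using Python’s standard statistics.NormalDist follows; the scores are hypothetical, not the DownloadFestival.sav data. It assumes Blom’s proportion estimate, (r - 3/8)/(n + 1/4), which we believe is SPSS’s default; other formulas such as (r - 0.5)/n are also in use, and this sketch does not average the ranks of tied scores.

```python
from statistics import NormalDist, mean, stdev


def pp_plot_points(scores):
    """Return (observed, expected) cumulative-probability pairs for a P-P plot.

    observed: Blom proportion (r - 3/8) / (n + 1/4) for the r-th smallest score
              (assumed here to match SPSS's default; other formulas exist).
    expected: cumulative probability of the same score under a normal
              distribution with the sample mean and standard deviation.
    Tied scores simply receive consecutive ranks in this sketch.
    """
    n = len(scores)
    normal = NormalDist(mean(scores), stdev(scores))
    points = []
    for r, x in enumerate(sorted(scores), start=1):
        observed = (r - 3 / 8) / (n + 1 / 4)
        expected = normal.cdf(x)
        points.append((observed, expected))
    return points


def max_deviation(points):
    """Largest distance of any point from the diagonal (observed == expected)."""
    return max(abs(o - e) for o, e in points)


# Hypothetical hygiene scores on the 0-4 scale (not the DownloadFestival data).
day_scores = [0.5, 1.2, 1.8, 2.0, 2.1, 2.3, 2.6, 3.0, 3.4]
pts = pp_plot_points(day_scores)
for observed, expected in pts:
    print(f"{observed:.3f}  {expected:.3f}")
print("max deviation from diagonal:", round(max_deviation(pts), 3))
```

Points close to the diagonal are consistent with normality; the maximum deviation is only a rough summary of the plot, not a formal test.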
Statistical Approach: Measures of Central Tendency, Dispersion, and Shape

Quantifying Normality with Numbers
In a normal distribution, skewness and kurtosis values should be zero; the further these values are from zero, the less likely it is that the data are normally distributed.
• Skew
  - Positive values indicate too many low scores.
  - Negative values indicate too many high scores.
• Kurtosis
  - Positive values indicate a pointy and heavy-tailed distribution.
  - Negative values indicate a flat and light-tailed distribution.
Transforming skewness and kurtosis to z-scores:
$$z_{\text{skewness}} = \frac{\text{Skewness} - 0}{SE_{\text{skewness}}} \qquad z_{\text{kurtosis}} = \frac{\text{Kurtosis} - 0}{SE_{\text{kurtosis}}}$$

Quantifying Normality with Numbers
                    Skewness z    Kurtosis z
Day 1 (n = 810)       .047          -2.38
Day 2 (n = 264)       7.3            2.75
Day 3 (n = 123)       4.7            1.69
Compare the absolute values of these z-scores with the critical values of the standard normal distribution:
• Absolute value > 1.96 is significant at p < .05
• Absolute value > 2.58 is significant at p < .01
• Absolute value > 3.29 is significant at p < .001
Note: Large samples typically have small standard errors (the denominator), so even minor departures from normality can produce large z-scores; interpret these values in light of the study’s sample size.

Split File Function
Remember to turn it off when you’re done!

Statistical Tests of Normality: Kolmogorov-Smirnov and Shapiro-Wilk Tests (SPSSExam.sav)
• Both tests compare the scores in the sample to a normally distributed set of scores with the same mean and standard deviation.
• A non-significant result (p > .05) means the sample distribution does not differ significantly from a normal distribution, i.e., it is probably normal.
• The K-S and Shapiro-Wilk tests are similar, but the Shapiro-Wilk test is more likely to detect departures from normality when they exist.
• Exercise care when applying these tests to large samples: with large samples, even small departures from normality produce significant results.

Testing Normality Statistically: SPSSExam.sav
• The study measured students’ performance on an SPSS proficiency exam.
• Four DVs:
  – exam (first-year SPSS exam scores)
  – computer (computer literacy)
  – lecture (% of lectures attended)
  – numeracy (numerical ability; maximum score = 15)
• One grouping variable / IV (uni):
  – Sussex University
  – Duncetown University

Applied to the Whole Sample

Output: Entire Sample
The percentage on the SPSS exam, D(100) = 0.10, p < .05, and the numeracy scores, D(100) = 0.15, p < .001, were both significantly non-normal.

Applied to Separate Groups
Similar to the Split File function.

Output: Separate Groups - % on SPSS Exam

Output: Separate Groups - Numeracy

Homogeneity of Variance
What is it?
In between-subjects designs, the variance of the outcome variable or variables should be the same in each of the groups. In correlational designs, the variance of one variable should be stable at all levels of the other variable.
Why should I worry about it?
Unequal variances can affect the accuracy of the test statistic.

Homogeneity of Variance: A Visual Depiction
[Figure 5.11: Homogeneous Variance vs. Heterogeneous Variance. Number of hours each person had ringing in their ears after each of several (loud) concerts.]

Homogeneity of Variance: Levene’s Test
- Performs a one-way ANOVA on the deviation scores (the absolute difference between each score and the mean of its group). If p < .05, the variances are significantly different and the assumption is violated.
- Caution: with large sample sizes, small differences in variances can produce a significant Levene’s test.

Performing Levene’s Test Using SPSS
• Splits the DVs by university
• Performs the analysis on the raw scores

Homogeneity of Variance: Hartley’s Fmax or Variance Ratio
- The ratio of the variance of the group with the biggest variance to the variance of the group with the smallest variance.
- Critical values depend on (1) the number of cases per group and (2) the number of variances being compared.
- The variances used to compute the ratio can be found in the SPSS output.

Variable       University   Variance   Variance Ratio   Critical Value
SPSS Exam %    Duncetown    158.477    1.52             1.67
               Sussex       104.142
Numeracy       Duncetown      4.271    2.21             1.67
               Sussex         9.432

SPSS: Procedure for Obtaining Variance Estimates
• Dependent Variables
• Grouping Variable
• Choose “Statistics”

Variance Ratio (SPSS Exam %): 158.477 / 104.142 = 1.52

Variance Ratio (Numeracy): 9.432 / 4.271 = 2.21
Because 1.52 is below the critical value of 1.67, the exam-score variances can be treated as roughly equal; the numeracy ratio of 2.21 exceeds 1.67, so the assumption is violated for numeracy.

Correcting Problems in the Data: Outliers
• Delete the data from the person who contributed the outlier.
• Transform the data.
• Change the score:
  – Next highest score plus one
  – Convert back from a z-score
  – The mean plus two (or three) standard deviations: calculate the mean and standard deviation, then add either 2 or 3 times the s.d. to the mean and replace outliers with that score.

Correcting Problems in the Data: Dealing with Non-normality & Unequal Variances
• Transform the data (via trial and error):
  – Log transformation: corrects positive skew & unequal variances
  – Square root transformation: corrects positive skew & unequal variances
  – Reciprocal transformation: corrects positive skew & unequal variances
  – Reverse score transformation: corrects negative skew (reverse the scores, then apply one of the transformations above)
• Use robust statistics or non-parametric alternatives (see Field textbook, Ch. 5, pp. 153-164).
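The numerical checks covered in this lecture (skewness and kurtosis z-scores, Levene’s test, and Hartley’s variance ratio) can be computed by hand to see where the SPSS numbers come from. This is a minimal, standard-library Python sketch under stated assumptions: skewness and kurtosis use the bias-adjusted formulas (G1, G2) and standard errors that, to our knowledge, SPSS and Excel report; Levene’s test is the mean-based version and returns only F with its degrees of freedom, because a p-value needs the F distribution, which Python’s standard library lacks; the two university groups are hypothetical, not SPSSExam.sav.

```python
import math
from statistics import mean, variance


def skew_kurtosis_z(scores):
    """Return (skewness, kurtosis, z_skewness, z_kurtosis).

    Skewness and (excess) kurtosis use the bias-adjusted sample formulas
    G1 and G2; each z-score is (statistic - 0) / standard error.
    Requires at least 4 scores that are not all identical.
    """
    n = len(scores)
    m = mean(scores)
    m2 = sum((x - m) ** 2 for x in scores) / n
    m3 = sum((x - m) ** 3 for x in scores) / n
    m4 = sum((x - m) ** 4 for x in scores) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n ** 2 - 1) / ((n - 3) * (n + 5)))
    return skew, kurt, (skew - 0) / se_skew, (kurt - 0) / se_kurt


def levene_f(*groups):
    """Mean-based Levene's test: one-way ANOVA on |score - group mean|.

    Returns (F, df_between, df_within); no p-value is computed.
    """
    devs = []
    for g in groups:
        g_mean = mean(g)
        devs.append([abs(x - g_mean) for x in g])
    k = len(devs)
    n_total = sum(len(d) for d in devs)
    grand = sum(sum(d) for d in devs) / n_total
    ss_between = sum(len(d) * (mean(d) - grand) ** 2 for d in devs)
    ss_within = sum(sum((z - mean(d)) ** 2 for z in d) for d in devs)
    f = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    return f, k - 1, n_total - k


def hartley_fmax(*groups):
    """Largest sample variance (n - 1 denominator) divided by the smallest."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)


# Hypothetical exam percentages for two universities.
uni_a = [55, 62, 48, 71, 66, 59, 80, 45, 63, 70]
uni_b = [52, 54, 57, 55, 60, 53, 58, 56, 61, 54]

for label, data in (("A", uni_a), ("B", uni_b)):
    skew, kurt, z_s, z_k = skew_kurtosis_z(data)
    print(f"Uni {label}: skew={skew:.3f} (z={z_s:.2f}), "
          f"kurtosis={kurt:.3f} (z={z_k:.2f})")

f, df1, df2 = levene_f(uni_a, uni_b)
print(f"Levene's F({df1}, {df2}) = {f:.2f}")
print(f"Hartley's Fmax = {hartley_fmax(uni_a, uni_b):.2f}")
```

Each z-score would then be compared with the 1.96 / 2.58 / 3.29 cutoffs, and the variance ratio with a tabled critical value, as described in the slides.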
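The outlier and transformation options for correcting problems in the data are simple arithmetic on each score. A minimal Python sketch follows. The constant of 1 added before the log and reciprocal transformations (so that zero scores stay defined), the symmetric capping of low outliers, and all example scores are assumptions for illustration. It also compares the variance ratio of two hypothetical, positively skewed groups before and after a log transformation, to show how a transformation can reduce unequal variances.

```python
import math
from statistics import mean, stdev, variance


def next_highest_plus_one(scores, outlier):
    """Replace each occurrence of `outlier` with the largest remaining score + 1."""
    replacement = max(x for x in scores if x != outlier) + 1
    return [replacement if x == outlier else x for x in scores]


def cap_at_mean_sd(scores, k=2.0):
    """Replace scores beyond mean +/- k*SD with that boundary value.

    The slides describe the upper tail; capping the lower tail the same
    way is an assumption of this sketch.
    """
    m, s = mean(scores), stdev(scores)
    lower, upper = m - k * s, m + k * s
    return [min(max(x, lower), upper) for x in scores]


def log_transform(scores, constant=1):
    """log10(X + constant); the constant keeps zero scores defined."""
    return [math.log10(x + constant) for x in scores]


def sqrt_transform(scores):
    """Square root of each (non-negative) score."""
    return [math.sqrt(x) for x in scores]


def reciprocal_transform(scores, constant=1):
    """1 / (X + constant); note that this reverses the order of the scores."""
    return [1 / (x + constant) for x in scores]


def reverse_scores(scores):
    """Subtract each score from (highest score + 1).

    Reversing flips negative skew into positive skew; one of the
    transformations above is then applied to the reversed scores.
    """
    top = max(scores) + 1
    return [top - x for x in scores]


def variance_ratio(a, b):
    """Larger sample variance divided by the smaller one."""
    small, large = sorted([variance(a), variance(b)])
    return large / small


# Hypothetical, positively skewed scores for two groups.
group_a = [1, 2, 2, 3, 4, 6, 9, 15]
group_b = [1, 1, 2, 2, 2, 3, 3, 4]

before = variance_ratio(group_a, group_b)
after = variance_ratio(log_transform(group_a), log_transform(group_b))
print(f"variance ratio before: {before:.2f}, after log10(X + 1): {after:.2f}")
print(next_highest_plus_one([1, 2, 3, 50], 50))  # [1, 2, 3, 4]
```

As the slides note, choosing a transformation is trial and error: after transforming, the normality and variance checks should be run again on the transformed scores.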