Data Handling Procedures and SPSS

SPSS and Data Handling Practices
Adv. Experimental Methods & Statistics
PSYC 4310 / COGS 6310
Michael J. Kalsher
Department of Cognitive Science
© 2013, Michael Kalsher
SPSS: A Refresher
• Entering / Opening Data Files
– Data Editor
– Variable View
– Data Handling
– Syntax Editor
• Graphing Data
• Assumptions of Parametric tests
• A useful link: http://www.uwstout.edu/parq/upload/spssinfo.pdf
Getting Started:
The SPSS start-up window
– For entering a new data set: enter directly in SPSS and save as a .SAV file.
– Open an SPSS data set
– Open Excel (or other types of data files)
The Data Editor: Where the action is
Documenting Your Data:
The Importance of Maintaining a Code Book
– Maintain a code book, especially with larger datasets
and ones that you may set aside for relatively long
periods of time.
– Code Book information is in SPSS’s Variable View
– SPSS Files: Food for thought
• Reserve the first column of an SPSS file for the “Subject number / identifier” (conventionally, 1 to 8 characters long)
• Coded variables: be sure to create variable descriptors/labels to keep track of the values assigned (e.g., 1 = male, 2 = female; or 0 = No Drug, 1 = 5mg, 2 = 10mg, etc.)
• Missing data (later in this lecture)
The Variable View
Variable Type: String
Default
Variable Type: Dates / DOB
Variable Labeling: Coded Variables
Missing Values
• Methods:
– (1) Discrete Values;
– (2) Range of values;
– (3) Range of values + one discrete value.
• Examples
– If the reason isn’t important, use a value unlikely to occur naturally (e.g., “999”)
– “DNA” (Does Not Apply)
• Subject was never asked this question because of a branch in the questionnaire, or the measurement was only made for persons over 18 and this subject is under 18.
– “MISS” (Missed; Subject did not fill in value)
– “REF” (Refused; Subject would not cooperate after prompt)
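To see why missing-value codes must be declared, here is a minimal Python sketch (not SPSS syntax). The ages are made up, and treating 999 as the only missing code is an assumption for illustration.

```python
from statistics import mean

# Hypothetical setup: 999 is the declared missing-value code; the ages are made up.
MISSING_CODES = {999}

def valid_values(values, missing_codes=MISSING_CODES):
    """Return only the values that are not declared as missing."""
    return [v for v in values if v not in missing_codes]

ages = [21, 34, 999, 28, 999, 45]
valid = valid_values(ages)

naive_mean = mean(ages)    # wrongly treats 999 as a real age
valid_mean = mean(valid)   # uses only the four valid ages
```

If the code were not declared as missing, the naive mean would be inflated by the 999 entries.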
Sample Setup: Final Variable View
Data File Comments:
Will you remember study details 2 years from now?
Sample Setup: Final Data View
Switching Views: Coded Value or Label
Data Handling Rules
– NEVER modify the original inputted data.
– Make modifications to a working file and
keep a syntax file that preserves all steps.
• Use menus and then the PASTE command, or
• Type commands directly into Syntax file and
save, then execute the Syntax file commands
– Perform all analyses on working files
• Preserve all analyses in a syntax file
– to document what you’ve done
– to reproduce output
Modifying Variables
“RECODE”
– Changes values of variables according to substitution rules
– Example: Participants’ ages could be recoded into discrete
categories, such as “Older” (DOBs before 1980) or “Younger” (DOBs
after 1980).
“COMPUTE”
– Creates new variables
– Example: A researcher notices that individual items on her survey seem to be measuring the same construct (e.g., happiness). She creates a new composite measure by summing participants’ scores on the individual items.
“IF”
– Creates or modifies variables with logical rules
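The three operations above can be sketched in Python (this is not SPSS syntax). The records, variable names, and the cutoff of 10 are hypothetical. Placing people born in 1980 itself in the “Younger” group is also an assumption, since the example leaves that year unspecified.

```python
# Hypothetical participant records; names and values are illustrative only.
participants = [
    {"id": 1, "birth_year": 1975, "happy1": 4, "happy2": 5, "happy3": 3},
    {"id": 2, "birth_year": 1988, "happy1": 2, "happy2": 1, "happy3": 2},
]

for p in participants:
    # RECODE-style step: map birth year onto discrete categories.
    p["age_group"] = "Older" if p["birth_year"] < 1980 else "Younger"
    # COMPUTE-style step: composite happiness score as the sum of the items.
    p["happy_total"] = p["happy1"] + p["happy2"] + p["happy3"]
    # IF-style step: set a flag according to a logical rule.
    p["high_happy"] = 1 if p["happy_total"] >= 10 else 0
```

Each step writes to a new variable, so the original values stay intact.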
Outputting Data
• Use SPSS “Save As” command and select data file type
SPSS Syntax
Can be typed in manually, or created by using the
menu options and the “PASTE” command at each
step:
Syntax Example:
Does number of friends differ between students and lecturers?
[Screenshots: Steps 1 to 3 of the menu procedure, with the DV and IV labeled]
Syntax Editor
Analysis output
Note: the syntax precedes the
output tables.
Graphing Results: Syntax Approach
Graphing Results: Chart Builder
SPSS Chart Builder
Double-click on the graph
to edit its features
Exploring Parametric Test Assumptions
• Normally distributed data
• Homogeneity of variance
• Independence
• Score-level data
Assessing Normality
• Visual inspection vs. statistical criteria
• Sample data or sampling distribution
The Central Limit Theorem: As sample size increases, we can be more confident that the sampling distribution is normally distributed.
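The Central Limit Theorem can be illustrated with a small simulation. This is a sketch under assumptions: the exponential population, the sample sizes of 2 and 50, and the fixed seed are all chosen only for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

def skewness(xs):
    """Moment-based skewness (0 for a perfectly symmetrical distribution)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def sample_means(sample_size, n_samples=2000):
    """Means of repeated samples drawn from a positively skewed population."""
    means = []
    for _ in range(n_samples):
        sample = [random.expovariate(1.0) for _ in range(sample_size)]
        means.append(sum(sample) / sample_size)
    return means

skew_small = skewness(sample_means(2))    # sampling distribution for n = 2
skew_large = skewness(sample_means(50))   # sampling distribution for n = 50
```

With the larger sample size, the skewness of the sample means should sit much closer to zero, even though the population itself is skewed.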
Assessing Normality: Important Concepts
Assessing Normality: Skew and Kurtosis
[Figure panels: Positive Skew; Negative Skew]
Assessing Normality: The Download Music Festival
DownloadFestival.sav
A biologist concerned about the potential health effects of music festivals attends the Download Music Festival and measures the hygiene of 810 concert-goers over the 3-day event.
She uses a Likert-type scale that ranges from 0 = smells very bad to 4 = smells great. She predicts that hygiene will decrease over time.
Assessing Normality Visually: P-P Plots
DownloadFestival.sav
Plots the cumulative probability of a variable against the cumulative probability of a normal distribution.
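The points of a P-P plot can be computed by hand, as in the sketch below. The scores are made up, and the (rank - 0.5) / n rule for the observed cumulative probability is one common convention, assumed here. It is not necessarily the one SPSS applies.

```python
from statistics import NormalDist, mean, stdev

def pp_points(scores):
    """Return (expected, observed) cumulative-probability pairs for a normal P-P plot."""
    n = len(scores)
    fitted = NormalDist(mean(scores), stdev(scores))
    points = []
    for rank, x in enumerate(sorted(scores), start=1):
        observed = (rank - 0.5) / n   # empirical cumulative probability (assumed convention)
        expected = fitted.cdf(x)      # cumulative probability under the fitted normal
        points.append((expected, observed))
    return points

hygiene = [0.5, 1.1, 1.4, 1.7, 1.8, 2.0, 2.3, 2.6, 3.1]  # made-up scores
points = pp_points(hygiene)
```

For normally distributed data the pairs fall close to the diagonal, where expected equals observed.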
Distribution of data on day 1 is symmetrical, but is much less symmetrical on days 2 and 3.
Assessing Normality Visually and Statistically
DownloadFestival.sav
Statistical Approach
Visual Approach
Visual Approach: Frequency Distributions
Distribution of data on day 1 is symmetrical.
Distribution of data is much less symmetrical on days 2 and 3.
Statistical Approach: Measures of Central Tendency, Dispersion, and Shape
Quantifying Normality with Numbers
In a normal distribution, skewness and kurtosis values should be zero. The further these values are from zero, the less likely it is that the data are normally distributed.
• Skew
- Positive values indicate too many low scores.
- Negative values indicate too many high scores.
• Kurtosis
- Positive values indicate a pointy and heavy-tailed distribution.
- Negative values indicate a flat and light-tailed distribution.
Transforming skewness and kurtosis to z-scores:
$$z_{\text{skewness}} = \frac{\text{Skewness} - 0}{SE_{\text{skewness}}} \qquad z_{\text{kurtosis}} = \frac{\text{Kurtosis} - 0}{SE_{\text{kurtosis}}}$$
Quantifying Normality with Numbers
z-scores           Skewness   Kurtosis
Day 1 (n=810)        .047      -2.38
Day 2 (n=264)        7.3        2.75
Day 3 (n=123)        4.7        1.69
Compare these values to known values for the normal distribution:
Absolute value > 1.96 is significant at p<.05
Absolute value > 2.58 is significant at p<.01
Absolute value > 3.29 is significant at p<.001
Note:
Keep in mind that large samples typically have a small standard error
(the denominator), so interpret these values in light of the study’s
sample size.
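The comparison against the cutoffs, and the effect of sample size on the standard error, can be sketched as follows. The standard-error formula is a standard one for sample skewness and is assumed here. The function names are hypothetical, and the z-scores are the six values reported above.

```python
from math import sqrt

def se_skewness(n):
    """Standard error of skewness for a sample of size n (standard formula, assumed here)."""
    return sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def z_score(statistic, standard_error):
    """z = (statistic - 0) / SE."""
    return (statistic - 0) / standard_error

def significance_label(z):
    """Compare |z| with the normal-distribution cutoffs listed above."""
    size = abs(z)
    if size > 3.29:
        return "p < .001"
    if size > 2.58:
        return "p < .01"
    if size > 1.96:
        return "p < .05"
    return "not significant"

labels = {z: significance_label(z) for z in (0.047, -2.38, 7.3, 2.75, 4.7, 1.69)}
se_day1 = se_skewness(810)  # larger sample, smaller standard error
se_day3 = se_skewness(123)  # smaller sample, larger standard error
```

Because the standard error shrinks as n grows, the same amount of skew gives a larger z-score in a bigger sample.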
Split File Function
Remember to turn it off when you’re done!
Statistical Tests of Normality:
Kolmogorov-Smirnov and Shapiro-Wilk tests
(SPSSExam.sav)
• Both tests compare the scores in the sample to a normally distributed set of scores with the same mean and standard deviation.
• A non-significant result means the distribution of the sample data is probably normal.
• K-S and Shapiro-Wilk are similar, but Shapiro-Wilk is more likely to detect differences from normality if they exist.
• Exercise care when applying these tests to large samples, where even small deviations from normality can be significant.
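The comparison behind the K-S test can be sketched by computing its D statistic directly. This sketch only computes D and does not produce a p-value, and the two data sets are made up.

```python
from statistics import NormalDist, mean, stdev

def ks_statistic(scores):
    """Largest gap between the empirical CDF and a normal CDF with the sample's mean and SD."""
    xs = sorted(scores)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides of the jump.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # made-up, roughly symmetrical
skewed = [1, 1, 1, 1, 2, 2, 3, 9, 30]     # made-up, strongly positively skewed
d_symmetric = ks_statistic(symmetric)
d_skewed = ks_statistic(skewed)
```

A larger D means the sample departs further from the matching normal distribution.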
Testing Normality Statistically: SPSSExam.sav
• The study measured students’ performance
on an SPSS proficiency exam.
• Four DVs:
– exam (1st year SPSS exam scores)
– computer (computer literacy)
– lecture (% of lectures attended)
– numeracy (numerical ability; high score = 15)
• One Grouping/IV variable (uni):
– Sussex university
– Duncetown university
Applied to the Whole Sample
Output: Entire Sample
The percentage on the SPSS exam, D(100) = 0.10, p < .05, and the numeracy scores, D(100) = 0.15, p < .001, were both significantly non-normal.
Applied to Separate Groups
Similar to Split File function.
Output: Separate Groups - % on SPSS Exam
Output: Separate Groups - % on Numeracy
Homogeneity of Variance
What is it?
In between-subjects designs, the variance of an outcome
variable or variables should be the same in each of the groups.
In correlational designs, the variance of one variable should be
stable at all levels of the other variable.
Why should I worry about it?
The presence of unequal variances can affect the accuracy of the test statistic.
Homogeneity of Variance: A Visual Depiction
[Figure panels: Homogeneous Variance; Heterogeneous Variance]
Figure 5.11. Number of hours each person had ringing in their ears after each of several (loud) concerts.
Homogeneity of Variance: Levene’s test
- Performs a one-way ANOVA on the deviation scores (the absolute difference between each score and its group mean). If p<.05, then variances are significantly different and the assumption is violated.
- Caution: with large sample sizes, small differences in variances can produce a significant Levene’s test.
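The logic of Levene’s test can be sketched as a one-way ANOVA on absolute deviations. This version uses deviations from each group’s mean and returns only the F-type statistic, with no p-value. The function name and data are hypothetical.

```python
def levene_w(*groups):
    """One-way ANOVA F computed on absolute deviations from each group's mean."""
    deviations = []
    for g in groups:
        m = sum(g) / len(g)
        deviations.append([abs(x - m) for x in g])
    k = len(deviations)
    n_total = sum(len(d) for d in deviations)
    grand_mean = sum(sum(d) for d in deviations) / n_total
    group_means = [sum(d) / len(d) for d in deviations]
    # Between-groups and within-groups sums of squares of the deviation scores.
    ss_between = sum(len(d) * (gm - grand_mean) ** 2
                     for d, gm in zip(deviations, group_means))
    ss_within = sum((z - gm) ** 2
                    for d, gm in zip(deviations, group_means) for z in d)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Made-up groups: same spread in the first call, very different spread in the second.
similar_spread = levene_w([4, 5, 6, 7, 8], [14, 15, 16, 17, 18])
different_spread = levene_w([4, 5, 6, 7, 8], [2, 9, 16, 23, 30])
```

Groups with equal spread give a statistic near zero. A difference in means alone does not raise it.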
Performing Levene’s test using SPSS
[Screenshot callouts: splits the DVs by university; performs analysis on the raw scores]
Homogeneity of Variance:
Hartley’s Fmax or Variance Ratio
- Ratio of the variances between the group with the biggest
variance and the group with the smallest variance.
- Critical values depend on (1) number of cases per group
and (2) number of variances being compared.
- Variance ratios can be found in the SPSS output.
Variable      University   Variance   Ratio   Critical Value
SPSS Exam %   Duncetown    158.477    1.52    1.67
              Sussex       104.142
Numeracy      Duncetown      4.271    2.21    1.67
              Sussex         9.432
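The variance ratio is simple to compute by hand. The sketch below uses the group variances and the critical value of 1.67 from the table above, and the function name is hypothetical.

```python
def variance_ratio(variances):
    """Hartley's F-max: largest group variance divided by the smallest."""
    return max(variances) / min(variances)

# Group variances (Duncetown, Sussex) from the table above.
exam_ratio = variance_ratio([158.477, 104.142])
numeracy_ratio = variance_ratio([4.271, 9.432])

CRITICAL_VALUE = 1.67  # critical value quoted for these data
exam_violates = exam_ratio > CRITICAL_VALUE
numeracy_violates = numeracy_ratio > CRITICAL_VALUE
```

Here the exam ratio stays below the critical value while the numeracy ratio exceeds it.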
SPSS: Procedure for Obtaining Variance Estimates
[Screenshot callouts: Dependent Variables; Grouping Variable; choose “Statistics”]
Variance Ratio (SPSS Exam %): 158.477 / 104.142 = 1.52
Variance Ratio (Numeracy): 9.432 / 4.271 = 2.21
Correcting Problems in the Data:
Outliers
• Delete the data from the person who
contributed the outlier
• Transform the data
• Change the score
– Next highest score plus one
– Convert back from a z-score
• Calculate the mean and standard deviation, then add either 2 or 3
times the s.d. to the mean and replace outliers with that score.
– The mean plus two standard deviations
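The “mean plus two standard deviations” replacement can be sketched as below. The data are made up and the sketch handles only unusually high scores. Computing the mean and standard deviation with the outlier still included is an assumption.

```python
from statistics import mean, stdev

def replace_high_outliers(scores, n_sd=2):
    """Replace scores above mean + n_sd * SD with that cutoff; returns a new list."""
    cutoff = mean(scores) + n_sd * stdev(scores)
    return [min(x, cutoff) for x in scores], cutoff

raw = [3, 4, 4, 5, 5, 5, 6, 6, 7, 40]   # made-up data with one extreme score
cleaned, cutoff = replace_high_outliers(raw)
```

The function returns a new list, so the original scores are left unmodified, in keeping with the data handling rules.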
Correcting Problems in the Data:
Dealing with Non-normality & Unequal Variances
Transform the data (via trial-and-error). Each transformation corrects for:
• Log transformation (positive skew & unequal variances)
• Square root transformation (positive skew & unequal variances)
• Reciprocal transformation (positive skew & unequal variances)
• Reverse score transformation (negative skew)
Use robust statistics or non-parametric alternatives (see Field textbook, Ch. 5, pp. 153-164)
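The four transformations can be sketched in Python as follows. The scores are made up and the function names are hypothetical. Reflecting around the highest score plus one is one common convention for reverse scoring, assumed here.

```python
from math import log10, sqrt

def log_transform(scores):
    """log10(x); assumes every score is greater than zero."""
    return [log10(x) for x in scores]

def sqrt_transform(scores):
    """Square root; assumes no negative scores."""
    return [sqrt(x) for x in scores]

def reciprocal_transform(scores):
    """1/x; assumes no zeros. Note that this reverses the order of the scores."""
    return [1 / x for x in scores]

def reverse_score(scores):
    """Reflect scores (highest + 1 - x) so negative skew becomes positive skew."""
    top = max(scores)
    return [top + 1 - x for x in scores]

scores = [1, 4, 9, 100]                    # made-up, positively skewed
logged = log_transform(scores)
rooted = sqrt_transform(scores)
reflected = reverse_score([1, 8, 9, 10])   # made-up, negatively skewed
```

Each function returns a new list, so a transformed variable can be kept alongside the original.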