Scales of Measurement

1. Observational
2. Physiological and Neuroscientific
3. Self-report (majority of social & behavioral science research)

Self-report measures
• People’s replies to written questionnaires or interviews
• Can measure:
▪ thoughts (cognitive self-reports)
▪ feelings (affective self-reports)
▪ actions (behavioral self-reports)
Self-reported momentary emotions:
Positive and Negative Affect Schedule (PANAS)
(Watson, Clark & Tellegen, 1988)
Indicate the extent you feel this way right now: enthusiastic
Not at all enthusiastic  1  2  3  4  5  Very enthusiastic
Indicate the extent you feel this way right now: upset
Not at all upset  1  2  3  4  5  Very upset
[Figure: the thing being measured, coded on each type of scale]
Nominal: Hot = 1, Cold = 2, Warm = 3
Ordinal: 1st Place, 2nd Place, 3rd Place, 4th Place, 5th Place
Interval
Ratio
Distinction between scales is due to the meaning of numbers
1. Nominal Scale: numbers assigned are only labels.
2. Ordinal Scale: a rank ordering.
3. Interval Scale: each number is equidistant from the next, but there is no true zero point (majority of measures).
4. Ratio Scale: each number is equidistant and there is a true zero point.
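The point that nominal numbers are only labels can be shown with a small sketch. The responses and the two coding schemes below are made up for illustration (only Hot = 1, Cold = 2, Warm = 3 comes from the slide). Relabeling the categories changes the mean, which is why a mean is not meaningful for nominal data, while the most frequent category stays the same.

```python
from collections import Counter
from statistics import mean

# Two equally arbitrary ways to label the same categories.
codes_a = {"hot": 1, "cold": 2, "warm": 3}   # coding shown on the slide
codes_b = {"hot": 3, "cold": 1, "warm": 2}   # hypothetical relabeling

# Hypothetical responses, for illustration only.
responses = ["hot", "hot", "warm", "cold", "hot"]

# The "mean" depends entirely on which labels were chosen.
mean_a = mean([codes_a[r] for r in responses])
mean_b = mean([codes_b[r] for r in responses])

# The most frequent category does not depend on the labels.
most_common = Counter(responses).most_common(1)[0][0]

print(mean_a, mean_b, most_common)
```

Frequency-based summaries (and tests such as chi-square) survive relabeling; arithmetic on the codes does not.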
Type of Scale Determines Statistics and Power

Scale    | Statistics                           | Power
Nominal  | Chi-square                           | Low
Ordinal  | Rank-order tests                     | Moderate
Interval | Parametric tests (F-tests, t-tests)  | High
Ratio    | Parametric tests and math operations | High

Valid: the measure assesses the construct it is
intended to assess and is not influenced by other
factors.

Reliable: the measure is consistent; it provides
the same result repeatedly.
Reliable but not Valid
Dependable measure, but doesn’t measure what
it should
Example: Arm length to measure self-esteem.
Valid but not Reliable
Measures what it should,
but not dependably
Example: Stone as a measure of
weight in Great Britain.
Central dot = construct we are seeking to
measure

Test-Retest Reliability
Measure administered at two points in time to assess
consistency. Works best for things that do not change
over time (e.g., intelligence).
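
Test-retest reliability is usually summarized as the correlation between the two administrations. A minimal sketch, assuming a Pearson correlation and made-up scores for five participants (the function name and the data are illustrative, not from the slides):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores for the same five people at two points in time.
time1 = [10, 12, 15, 18, 20]
time2 = [11, 12, 14, 19, 21]

print(round(pearson_r(time1, time2), 2))
```

A value near 1 means people kept roughly the same relative standing across the two sessions.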

Internal Consistency Reliability
Judgments of consistency of results across items in the
same test administration session.
1. Intercorrelation: Cronbach’s α (> .65 is preferred)
2. Split halves reliability
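
Both indices can be computed by hand from raw item scores. The sketch below assumes the usual variance form of Cronbach's α, and for split halves it assumes an odd/even item split with the Spearman-Brown correction (the correction is not mentioned on the slide). The function names and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (complete cases only)."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def split_half(rows):
    """Correlate odd-item and even-item totals, then apply Spearman-Brown."""
    odd = [sum(row[0::2]) for row in rows]
    even = [sum(row[1::2]) for row in rows]
    m_odd, m_even = mean(odd), mean(even)
    cov = sum((a - m_odd) * (b - m_even) for a, b in zip(odd, even))
    ss_odd = sum((a - m_odd) ** 2 for a in odd)
    ss_even = sum((b - m_even) ** 2 for b in even)
    r = cov / sqrt(ss_odd * ss_even)
    return 2 * r / (1 + r)

# Hypothetical data: 5 respondents answering a 4-item scale.
scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

print(round(cronbach_alpha(scores), 2), round(split_half(scores), 2))
```

Either value can then be compared against the > .65 guideline above.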

Content Validity
Does the measure represent the range of possible items
that it should cover, based on the meaning of the
measure?

Predictive Validity
The measure predicts criterion measures that are assessed
at a later time.
Ex: Does aptitude assessment predict later success?

Construct Validity
Does the measure actually tap into intended
construct?

Guided spontaneous response from individuals
in sample population (thought listings, essay
questions…)

Face valid items: develop items that appear to
measure your construct.

Pilot test a larger set of items and choose those
that are more reliable & valid.

Reverse-coded items indicate whether
participants are paying attention.
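
Reverse-coded items have to be flipped before scoring. A minimal sketch, assuming the standard rule (scale minimum + scale maximum - score); the attention check is only a rough heuristic added here for illustration, and both function names are hypothetical.

```python
def reverse_code(score, scale_min, scale_max):
    """Flip a response so the high and low ends of the scale swap."""
    if not scale_min <= score <= scale_max:
        raise ValueError("score is outside the scale range")
    return scale_min + scale_max - score

def looks_inattentive(raw_answers):
    """Rough heuristic: the same raw answer to every item, including the
    reverse-worded ones, suggests the respondent was not reading closely."""
    return len(set(raw_answers)) == 1

# On a 0 to 9 agreement scale, a raw 7 on a reverse-worded item scores as 2.
print(reverse_code(7, 0, 9))
print(looks_inattentive([9, 9, 9, 9]))
```

The heuristic will also flag someone who honestly answers at the exact midpoint of every item, so it is a prompt for a closer look rather than a rule for exclusion.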

Likert Scale:
To what extent do you agree with the
following statement…
(0 to 9, strongly disagree-strongly agree)

Semantic Differential:
What is your response to (insert person,
object, place, issue)?
(-5 to +5, good-bad, like-dislike, warm-cold)

The measure exists already in the literature

Restriction of range: responses either at high
or low end of scale (skew).

Can you trust responses? Social desirability,
demand characteristics & satisficing.
1. Develop subjective and objective versions of a
new scale
• Example: Contact with Blacks scale:
Objective: % of your neighborhood growing up
Subjective: No Blacks—a lot of Blacks
2. Using 5+ items worded similarly provides greatly
increased reliability and likelihood of success.
3. Human targets are rarely evaluated below the
midpoint of the scale, so use more scale points
(9 instead of 5 points).
**Most Important** If you have a larger study
ready and a great idea for a new scale comes
up, build something and give it a shot!








Response time measures
Physiological measures
Neuroscience: fMRI and other brain imaging
Indirect measures: projective tests, etc.
Facial and other behavior coding schemes
(verbal/nonverbal)
Cognitive measures: (memory, perception…)
Task performance: academic, physical…
Game theory: prisoner’s dilemma…
Cronbach’s α:
Analyze → Scale → Reliability Analysis
Pull over all scale items
Click Statistics, select inter-item correlations
OK
Try the Van Camp, Barden & Sloan (2010) data file, items Centrality1 through Centrality8. Compare to manuscript.
Many other reliability analyses involve correlations (test-retest,
split halves) or probabilities (inter-rater reliability).
Case Processing Summary

Cases       | N   | %
Valid       | 109 | 86.5
Excluded(a) | 17  | 13.5
Total       | 126 | 100.0

a. Listwise deletion based on all variables in the procedure.

Reliability Statistics

Cronbach's Alpha | Cronbach's Alpha Based on Standardized Items | N of Items
.706             | .743                                         | 8
Inter-Item Correlation Matrix

               | centrality1rev | centrality2 | centrality3 | centrality4rev | centrality5 | centrality6 | centrality7 | centrality8rev
centrality1rev | 1.000          | .244        | .069        | .297           | .082        | .170        | .148        | .208
centrality2    | .244           | 1.000       | .298        | .323           | .509        | .411        | .588        | .031
centrality3    | .069           | .298        | 1.000       | .206           | .398        | .337        | .398        | .042
centrality4rev | .297           | .323        | .206        | 1.000          | .213        | .160        | .350        | .284
centrality5    | .082           | .509        | .398        | .213           | 1.000       | .589        | .637        | -.063
centrality6    | .170           | .411        | .337        | .160           | .589        | 1.000       | .475        | .075
centrality7    | .148           | .588        | .398        | .350           | .637        | .475        | 1.000       | -.041
centrality8rev | .208           | .031        | .042        | .284           | -.063       | .075        | -.041       | 1.000
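
The standardized alpha in the Reliability Statistics output can be checked directly from the inter-item correlations above. The sketch assumes the usual formula, α = k·r̄ / (1 + (k − 1)·r̄), where k is the number of items and r̄ is the mean off-diagonal correlation; the function name is illustrative.

```python
# Inter-item correlations copied from the matrix above
# (row and column order: centrality1rev, centrality2, ..., centrality8rev).
R = [
    [1.000, .244, .069, .297, .082, .170, .148, .208],
    [.244, 1.000, .298, .323, .509, .411, .588, .031],
    [.069, .298, 1.000, .206, .398, .337, .398, .042],
    [.297, .323, .206, 1.000, .213, .160, .350, .284],
    [.082, .509, .398, .213, 1.000, .589, .637, -.063],
    [.170, .411, .337, .160, .589, 1.000, .475, .075],
    [.148, .588, .398, .350, .637, .475, 1.000, -.041],
    [.208, .031, .042, .284, -.063, .075, -.041, 1.000],
]

def standardized_alpha(corr):
    """Cronbach's alpha based on standardized items, from a correlation matrix."""
    k = len(corr)
    off_diag = [corr[i][j] for i in range(k) for j in range(k) if i != j]
    mean_r = sum(off_diag) / len(off_diag)
    return k * mean_r / (1 + (k - 1) * mean_r)

alpha_std = standardized_alpha(R)
print(round(alpha_std, 3))  # 0.743, the standardized alpha reported above
```

The low correlations in the centrality1rev and centrality8rev rows pull the mean correlation down, which is worth keeping in mind when comparing to the manuscript.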

Factor Analysis:
determines factor structure of measures
(does your measure assess one construct or
multiple constructs? Is your proposed
construct coherent?)

Multi-trait Multi-method Matrix:
using combination of existing measures and
manipulations to establish convergent/
divergent validity with measure.

Inter-rater Reliability
Independent judges score participant responses and
the % of agreement is assessed to indicate
reliability. Used particularly for measures
requiring coding (video coding, spontaneous
responses…).
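
Percent agreement is simple to compute once both judges have coded the same responses. A minimal sketch with made-up codes (the function name and the data are hypothetical):

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of responses that two judges coded identically."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of responses")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)

# Hypothetical codes from two independent judges for four video clips.
judge_1 = ["smile", "frown", "smile", "neutral"]
judge_2 = ["smile", "frown", "neutral", "neutral"]

print(percent_agreement(judge_1, judge_2))  # 75.0
```

Raw agreement does not adjust for matches that would occur by chance, so it tends to look better when there are only a few coding categories.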