Validity and Reliability in Instrumentation
47.469: Research I: Basics
Dr. Leonard
February 24, 2010
Recap
• Research design can be…
  - experimental or non-experimental (maybe quasi-experimental)
  - basic or applied research
  - laboratory or field setting
  - quantitative or qualitative data collection
• Research must be based in solid theory and testable hypotheses
• Research must include clear conceptual and operational definitions
Quasi-experimental
• Occurs commonly in psychology
• Applies experimental principles, such as cause and effect or group comparison, to field or other less controlled settings
• In this respect, more like correlational research
• Less control over extraneous variables, but can take place outside the lab, which may reduce the artificial feeling of the setting
• Results are not as clean to interpret as in experimental research, but are closer to “real world” application
Scientific method
1. Formulate theories √
2. Develop testable hypotheses (operational definitions) √
3. Conduct research, gather data √
4. Evaluate hypotheses based on data
5. Cautiously draw conclusions
Next steps… gather data
• Once you have explicit, clear conceptual and operational definitions to guide the research, you must develop your measures for collecting data
• The operational definition suggests the type of measure to use
• Instrumentation is the process of selecting or creating measures for a study (the measure is your instrument)
• Two overarching goals for instrumentation:
  - Validity: the extent to which a measure (as operationally defined) taps the concept it is designed to measure and not some other concept
  - Reliability: the consistency or stability of a measure, i.e., the same results are obtained if the measure is used again
Caveats
• We can never be certain of the validity (or reliability) of our instruments, so we estimate the degree of validity
  - We might claim “modest” or “partial” validity
  - It is hard to capture the true essence of a concept/construct, and some concepts/constructs are more elusive than others!
• How we estimate the validity of our measures depends on the purpose of the study
  - Keep focused on the hypotheses and operational definitions!
• We estimate two broad types of validity:
  - Judgmental validity
  - Empirical validity
Types of validity: Judgmental
• Content validity: whether the concept being measured is a real concept AND whether the measure being used is the most appropriate one
  - Is our operationally defined variable (concrete) really capturing the hypothetical concept (abstract) we are interested in studying?
  - Are we capturing its central meaning?
[Figure: Concept (abstract) → Variable/Measure (concrete)]
Types of validity: Judgmental
• No single type of validity, content validity included, is enough on its own to determine whether our measure is valid, so we consider other types…
• Face validity: the measure appears valid because it makes sense; on the surface, it seems to tap the construct of interest
  - Face validity is neither sufficient nor absolutely necessary for overall validity, but it is a helpful clue
  - A measure could have high face validity but low content validity!
Good face validity?
Rosenberg Self-Esteem Scale
1 = Strongly Disagree, 7 = Strongly Agree
_____1. I feel that I am a person of worth, at least on an
equal basis with others.
_____2. I feel that I have a number of good qualities.
_____3. All in all, I am inclined to think that I am a failure.*
_____4. I am able to do things as well as most people.
_____5. I feel that I do not have much to be proud of.*
_____6. I take a positive attitude towards myself.
_____7. On the whole, I am satisfied with myself.
_____8. I wish I could have more respect for myself.*
_____9. I certainly feel useless at times.*
_____10. At times I think I am no good at all.*
*Reverse scored
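The starred items are reverse-scored before the responses are summed. Below is a minimal Python sketch of that step, assuming the 1-7 response format above and a simple sum score; the names `reverse` and `score_rses` are hypothetical helpers, not part of the published scale.

```python
# Items marked with * above are reverse-scored (1-based item numbers).
REVERSED_ITEMS = {3, 5, 8, 9, 10}
SCALE_MIN, SCALE_MAX = 1, 7  # 1 = Strongly Disagree, 7 = Strongly Agree


def reverse(response: int) -> int:
    """Flip a response on the 1-7 scale: 1 <-> 7, 2 <-> 6, 3 <-> 5, 4 stays 4."""
    return SCALE_MIN + SCALE_MAX - response


def score_rses(responses: list[int]) -> int:
    """Hypothetical helper: sum of the 10 items after reverse-scoring the starred ones.

    Higher totals indicate higher self-esteem.
    """
    if len(responses) != 10:
        raise ValueError("expected 10 responses")
    total = 0
    for item, response in enumerate(responses, start=1):
        if not SCALE_MIN <= response <= SCALE_MAX:
            raise ValueError(f"item {item}: response {response} out of range")
        total += reverse(response) if item in REVERSED_ITEMS else response
    return total
```

With this convention totals range from 10 to 70; a respondent who strongly agrees with every positive item and strongly disagrees with every starred item, `score_rses([7, 7, 1, 7, 1, 7, 7, 1, 1, 1])`, gets the maximum of 70.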
Types of validity: Empirical
• Criterion-related validity: the extent to which your measure of a concept relates to a theoretically meaningful criterion for that concept, a “gold standard” for that concept
  - Predictive validity: the measure should be able to predict future behavior that is related to the concept
    e.g., a job skills test and future ratings of performance
  - Concurrent (convergent) validity: the measure should be meaningfully related or correlated to some other measure of the behavior
    e.g., scores on two different job skills tests
• Predictive or concurrent validity coefficient: a number (0-1) based on correlation that quantifies whether the measure is in fact related to other measures it should be related to
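A predictive validity coefficient is the correlation between the measure and the later criterion. The sketch below computes Pearson's r by hand; the test scores and ratings are made-up numbers for illustration only, not data from the lecture, and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


# Hypothetical data: job skills test scores at hiring and
# supervisor performance ratings collected later for the same people.
skills_test = [55, 62, 70, 48, 81, 66]
later_ratings = [3.1, 3.4, 4.0, 2.9, 4.2, 3.3]

validity_coefficient = pearson_r(skills_test, later_ratings)
print(f"predictive validity coefficient r = {validity_coefficient:.2f}")
```

For concurrent validity the same function would be applied to two measures taken at the same time, e.g., scores on job skills tests A and B.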
[Figure: Predictive validity. Job skills test → future performance ratings (qualification for job); correlation coefficient = .60]
[Figure: Concurrent (convergent) validity. Job skills test A ↔ job skills test B (qualification for job)]
Types of validity: Judgmental-Empirical
• Construct validity represents a combined approach for estimating validity, using:
  1) a subjective prediction about which other concepts (indicators) the concept being measured should relate to (the relationship may be positive OR negative), and
  2) an empirical test of whether the concept is in fact related to those other indicators
• E.g., depression should be linked to disengagement from schoolwork among college students, so test the relationship between depression scores and GPA in a sample of students
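The two steps of construct validity can be sketched in Python: the judgmental step is a predicted direction, and the empirical step checks whether the observed correlation has that sign. The depression and GPA values are hypothetical, and reducing the decision to a sign check is a simplification for illustration.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Step 1 (judgmental): predict the direction of the relationship.
PREDICTED_SIGN = -1  # higher depression scores -> lower GPA

# Step 2 (empirical): hypothetical sample of college students.
depression = [5, 12, 20, 8, 25, 15, 30, 10]
gpa = [3.8, 3.4, 2.9, 3.6, 2.5, 3.1, 2.2, 3.5]

r = pearson_r(depression, gpa)
# The prediction is supported when r has the predicted sign.
supports_construct_validity = r * PREDICTED_SIGN > 0
print(f"r = {r:.2f}; direction matches prediction: {supports_construct_validity}")
```

In a real study one would also weigh the size of r and the sample size, not just its direction.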
Construct Validity exercise
• Take your heart rate for 30 seconds and multiply by 2; record it on a separate sheet of paper
• Repeat
• Average the two heart rate measurements
• Turn in the paper
• Complete the Manifest Anxiety Scale (MAS)
• Score it
• Turn in the sheet
• Why is this a test of construct validity?
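The heart rate arithmetic from the exercise, as a tiny sketch; the function names and pulse counts are hypothetical.

```python
def beats_per_minute(beats_in_30_seconds: int) -> int:
    """Convert a 30-second pulse count to beats per minute (multiply by 2)."""
    return beats_in_30_seconds * 2


def average_heart_rate(count_1: int, count_2: int) -> float:
    """Average two 30-second pulse counts after converting each to bpm."""
    return (beats_per_minute(count_1) + beats_per_minute(count_2)) / 2


# Hypothetical counts: 36 beats, then 38 beats, each in 30 seconds.
print(average_heart_rate(36, 38))  # 74.0
```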
Reliability
• The consistency or stability of a measure; easier to establish when the measure is unidimensional
• Related to validity? Yes!
  - Generally, more valid measures tend to be more reliable, BUT you could have a highly reliable measure that is low in validity
  - Think of a gun shooting at a target: shots can cluster tightly together (reliable) but still miss the bull’s-eye (not valid)
• Like validity, reliability can be estimated by a correlation coefficient (0-1)
• Generally, to be respectable in the scientific community, reliability should be .80 (80%) or higher
Relationship between reliability and validity
• Is our measure RELIABLE? Does it have consistency and stability in measurement?
• Is our measure VALID? Does it measure what it’s supposed to measure?
Validity is more important to a research study; reliability can’t tell us if we are measuring the correct concept, only if we are measuring something consistently.
Classical Test Theory
An observed measurement (or score, X) is composed of a true score (T, the score that would be obtained if there were no measurement error) and some random measurement error (E).
X = T + E
X is the observed score
T is the true score
E is the measurement error
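A small simulation can make X = T + E concrete. It assumes, as is standard in classical test theory, that E is random, has a mean of zero, and is unrelated to T. Under those assumptions the correlation between two independent administrations approximates the share of observed-score variance that comes from true scores, Var(T) / Var(X); that ratio is a standard CTT result rather than something stated on this slide, and the means and standard deviations below are arbitrary choices.

```python
import random
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def simulate_parallel_forms(n=10_000, true_sd=10.0, error_sd=5.0, seed=1):
    """Generate X = T + E twice for the same people and correlate the two Xs."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n)]
    # Each administration adds fresh, independent random error E.
    x1 = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    x2 = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return pearson_r(x1, x2)


true_var, error_var = 10.0 ** 2, 5.0 ** 2
expected = true_var / (true_var + error_var)  # 100 / 125 = 0.80
observed = simulate_parallel_forms()
print(f"expected {expected:.2f}, simulated {observed:.3f}")
```

Increasing `error_sd` shrinks the share of variance due to T and pulls the simulated reliability down, which is the CTT way of saying that more random error means a less reliable measure.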
Types of Reliability
• Test-retest reliability: consistent results when the same measure is given twice under the same conditions
  - Across time
• Inter-rater reliability: consistent results when the same measure is given by different test givers, or when two independent observers code the same behavior
  - Across raters or observers
• Alpha reliability: the individual items/questions on a scale that measure the same concept are correlated with one another
  - Across items
• Split-half reliability: items from one part of a scale are correlated with items from another part, and both parts measure the same concept
  - Across items
Test-retest reliability (across time)
ID   Time 1 (X)   Time 2 (X)
1    18           19
2    12           13
3    29           28
4    25           25
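Test-retest reliability for the table above is the correlation between the Time 1 and Time 2 columns. A minimal sketch using the four scores shown, with a hand-written `pearson_r` helper:

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Scores for IDs 1-4 from the test-retest table.
time_1 = [18, 12, 29, 25]
time_2 = [19, 13, 28, 25]

test_retest_r = pearson_r(time_1, time_2)
print(f"test-retest r = {test_retest_r:.3f}")
```

The scores barely change between administrations, so r comes out very close to 1 (above .99), well over the .80 guideline.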
Inter-rater reliability (across raters)
ID   Rater 1 (X)   Rater 2 (X)   Rater 3 (X)
1    18            19            20
2    12            13            14
3    29            28            27
4    25            25            24
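With more than two raters, one simple summary is to correlate every pair of raters and average the coefficients. The sketch below does that for the table above; averaging pairwise Pearson correlations is only one of several possible inter-rater indices and is chosen here for simplicity.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Scores for IDs 1-4 from the inter-rater table.
ratings = {
    "Rater 1": [18, 12, 29, 25],
    "Rater 2": [19, 13, 28, 25],
    "Rater 3": [20, 14, 27, 24],
}

# Correlate each pair of raters: (1, 2), (1, 3), (2, 3).
pairwise = {
    (a, b): pearson_r(ratings[a], ratings[b])
    for a, b in combinations(ratings, 2)
}
mean_r = sum(pairwise.values()) / len(pairwise)

for (a, b), r in pairwise.items():
    print(f"{a} vs {b}: r = {r:.3f}")
print(f"mean inter-rater r = {mean_r:.3f}")
```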
Alpha reliability (across items); sometimes called internal consistency
ID   Item 1 (X1)   Item 2 (X2)   Item 3 (X3)   Item 4 (X4)   Item 5 (X5)   Item 6 (X6)
1    4             4             5             4             4             5
2    3             2             4             3             3             4
3    2             1             3             1             2             4
4    1             1             2             1             1             2
Three of the six items are reverse-scored.
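Alpha reliability is usually computed as Cronbach's alpha, which compares the sum of the item variances with the variance of the total score: alpha = k/(k-1) × (1 - sum of item variances / total-score variance). The sketch assumes the table values are already scored in the same direction (reverse-scored items already flipped) and that the columns are Items 1-6 in order. It uses population variances, which give the same alpha as sample variances because the n vs. n-1 factor cancels in the ratio.

```python
from statistics import pvariance

# Rows = respondents (IDs 1-4), columns = items X1..X6 from the table.
scores = [
    [4, 4, 5, 4, 4, 5],
    [3, 2, 4, 3, 3, 4],
    [2, 1, 3, 1, 2, 4],
    [1, 1, 2, 1, 1, 2],
]


def cronbach_alpha(rows: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = len(rows[0])  # number of items
    item_variances = [pvariance([row[j] for row in rows]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


alpha = cronbach_alpha(scores)
print(f"Cronbach's alpha = {alpha:.3f}")
```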
Split-half reliability (across items); sometimes called internal consistency
ID   First 1/2 items   Second 1/2 items
1    1                 1
2    2                 2
3    3                 2
4    3                 3
5    2                 1
6    3                 2
7    1                 1
8    2                 2
9    2                 2
10   1                 2
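Split-half reliability correlates the two half-scale scores. Because each half is only half as long as the full scale, that correlation is commonly stepped up with the Spearman-Brown formula, 2r / (1 + r); the correction is a standard addition, not something shown on the slide.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Half-scale scores for IDs 1-10 from the split-half table.
first_half = [1, 2, 3, 3, 2, 3, 1, 2, 2, 1]
second_half = [1, 2, 2, 3, 1, 2, 1, 2, 2, 2]

r_halves = pearson_r(first_half, second_half)
# Spearman-Brown: estimated reliability of the full-length scale.
r_full = 2 * r_halves / (1 + r_halves)
print(f"half-scale r = {r_halves:.3f}, Spearman-Brown estimate = {r_full:.3f}")
```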
The more, the better
• As with validity, it is always better if you can estimate or test for multiple forms of reliability!
• When a measure is available in more than one version, both versions can be given and the results compared; this is sometimes called parallel-forms reliability
Valid? Reliable?
• Concept: Parental engagement in child’s academic development
• How often do you help your child with his/her homework (please check one)?
_Never
_Rarely
_Sometimes
_Often
_Every day
Concept? Valid? Reliable?
Is there a chance that you could get HIV/AIDS?
1 (Not at all) --------------- 2 (Small chance) --------------- 3 (Yes, definitely)
Do you worry about getting HIV/AIDS? (circle one number)
1 (Never) ------- 2 (Almost never) ------- 3 (Sometimes) ------- 4 (Often) ------- 5 (Very often)
Concept? Valid? Reliable?
How important is financial success to you?
_Very important
_Somewhat important
_Not at all important
How important is it for you to have nice things?
_Very important
_Somewhat important
_Not at all important
Correlation between Heart Rate and MAS Score
• Our total MAS scores and average heart rates were correlated at only -.04; correlations have a magnitude from 0 to 1 but also a direction (+ or -)
[Figure: Scatterplot of Heart_rate (y-axis, 30 to 45) against MAS_total (x-axis, 10 to 30) with a fitted linear regression line: Heart_rate = 40.04 - 0.02 * MAS_total; R-Square = 0.00]
• Good construct validity? Why?
• Good predictive validity? Why?
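For a single predictor, the R-Square in the plot equals the square of the correlation, and the fitted slope and intercept come from least squares. The hypothetical helper `fit_line` below returns all four quantities; with r = -.04, R-Square is .0016, which is why the plot reports 0.00.

```python
from math import sqrt


def fit_line(x: list[float], y: list[float]) -> tuple[float, float, float, float]:
    """Least-squares line y = intercept + slope * x.

    Returns (intercept, slope, r, r_squared).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r, r * r


# With r = -.04, R-squared = (-.04) ** 2 = .0016, which rounds to 0.00.
print(round((-0.04) ** 2, 4))  # 0.0016
```

Applied to the class's MAS totals and average heart rates, `fit_line(mas_total, heart_rate)` would produce the kind of line and R-Square shown in the plot; the variable names here are placeholders.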
Beginning APA style for your proposal
Author last name, author first and middle initials. (Year published). Title of article. Title of Journal, Volume number(Issue number), first page-last page.
Morelli, G. A., Rogoff, B., Oppenheim, D., & Goldsmith, D. (1992). Cultural variation in infants’ sleeping arrangements: Questions of independence. Developmental Psychology, 28(4), 604-613.
Put the following three articles into an APA-style reference list
Three APA references
Samuolis, J., Layburn, K., & Schiaffino, K. M. (2001). Identity development and attachment to parents in college students. Journal of Youth and Adolescence, 30(3), 373-383.
Tripodi, S. J., Bender, K., Litschge, C., & Vaughn, M. G. (2010). Interventions for reducing adolescent alcohol abuse: A meta-analytic review. Archives of Pediatrics & Adolescent Medicine, 164(1), 85-91.