Jamie DeLeeuw, Ph.D. 5/7/13

Reliability
Consistency of measurement. The measure itself is
dependable.
***A measure must be reliable to be valid!***
 High reliability = greater consistency = lower randomness (error)
 Weight scale
 Low reliability = less consistency = more error
 Error: Can come from the observer, the way an item is phrased, time
of day, etc.
 0-1.0 scale
 Solution? Measure the construct multiple ways (helps cancel out
error).
Types of Reliability
1.) Internal reliability/consistency: Consistency within a set of
items intended to measure the same construct.
 Multiple questions to assess one construct
 Highly reliable scale = people’s responses to the items are highly
intercorrelated/consistent.
 Cronbach’s alpha/KR-20, Split-half reliability
 A coefficient of .7 is generally considered acceptable for research; a more
lenient cutoff may be reasonable for grading purposes (construct of “knowledge”)
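To make the internal-consistency idea concrete, here is a minimal sketch of Cronbach's alpha using the standard formula alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The function name `cronbach_alpha` and the response data are hypothetical and only for illustration; they are not from the slides.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])  # number of items on the scale
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Made-up responses: 5 people answering 3 items on a 1-5 scale.
responses = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
    [3, 3, 4],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

When people who score high on one item also tend to score high on the others, the total-score variance is large relative to the summed item variances, and alpha approaches 1.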
Animal Attitudes Scale (partial)
 Wild animals, such as mink and raccoon, should not be trapped and their skins made
into fur coats.
 There is nothing morally wrong with hunting wild animals for food.
 I think people who object to raising animals for meat are too sentimental.
 Much of the scientific research done with animals is unnecessary and cruel.
 Basically, humans have the right to use animals as we see fit.
 Continued research with animals will be necessary if we are to ever conquer diseases such
as cancer, heart disease, and AIDS.
 It is unethical to breed purebred dogs for pets when millions of dogs are killed in animal
shelters each year.
 The production of inexpensive meat, eggs, and dairy products justifies maintaining
animals under crowded conditions.
Types of Reliability
 2.) Inter-rater reliability: Consistency in judgments across
multiple raters.
 Olympics
 % agreement
 Fleiss' kappa corrects for chance agreement; values > .6 are
considered “good”
 Rubrics are a step in the right direction
Writing vs. content
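The difference between raw agreement and chance-corrected agreement can be sketched as follows. This is a minimal illustration of the standard Fleiss' kappa formula; the function name `fleiss_kappa` and the ratings are hypothetical, and it assumes every subject is rated by the same number of raters.

```python
def fleiss_kappa(counts):
    """Return (observed agreement, Fleiss' kappa).

    counts[i][j] = number of raters who placed subject i in category j.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    # Share of rater pairs that agree, computed per subject and then averaged.
    pairs = n_raters * (n_raters - 1)
    per_subject = [(sum(c * c for c in row) - n_raters) / pairs for row in counts]
    observed = sum(per_subject) / n_subjects

    # Agreement expected by chance, from how often each category is used overall.
    total = n_subjects * n_raters
    shares = [sum(column) / total for column in zip(*counts)]
    expected = sum(p * p for p in shares)
    if expected == 1:
        raise ValueError("kappa is undefined when only one category is ever used")
    return observed, (observed - expected) / (1 - expected)


# Made-up example: 3 raters score 4 essays as pass (column 0) or fail (column 1).
ratings = [[3, 0], [2, 1], [0, 3], [1, 2]]
observed, kappa = fleiss_kappa(ratings)
print(round(observed, 3), round(kappa, 3))
```

In this toy data the raters agree on about two thirds of rating pairs, but because half of that would be expected by chance, kappa comes out much lower than the raw agreement.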
 3.) Test-retest reliability: Consistency or stability of the test
across time (multiple administrations).
CJ performance
Magazine quizzes = low reliability
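Test-retest reliability is commonly summarized as the correlation between scores from two administrations of the same test. The sketch below assumes a Pearson correlation is used for that purpose; the helper `pearson_r` and the scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up scores for the same five people, tested twice a few weeks apart.
time1 = [12, 15, 9, 20, 17]
time2 = [13, 14, 10, 19, 18]
retest_r = pearson_r(time1, time2)
print(round(retest_r, 2))
```

A value near 1 means people keep roughly the same rank order across administrations; a quiz whose results change every time it is taken would produce a value near 0.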
Which type of reliability seems
easiest to establish?
Types of Validity
Does it measure what it’s supposed to measure? Accuracy of the
inferences, interpretations, or actions made on the basis of test scores
(Messick, 1989).
Construct: The accuracy with which a measure reflects the
underlying construct.

*Content: Whether items/questions represent the construct.
Face: Does the scale look like it measures what it’s supposed to?

Criterion: Examines how well a measure correlates with a standard
of comparison (criterion) or predicted behavior.



Predictive: The extent to which a measure correlates with an
individual’s future behavior.
Concurrent: ……….. current behavior.
Discriminant: The degree to which a scale does NOT measure
unintended qualities.
Construct Validity
 The accuracy with which a measure reflects the underlying
construct (e.g., personality, love, need for cognition)
 Indicates a match between conceptual and operational
definitions
 Researchers try to figure out critical components of the conceptual
definition and include them in the measure.
 Many potential operational definitions per concept.
 Ex.: Empathy, poverty, aggression
 Most important type of validity for hypothesis testing
 Other types of validity help establish construct validity.
Criterion Validity
 Examines how well a measure correlates with a standard of
comparison (criterion) or predicted behavior.
 Concurrent and predictive
 Ex: Does a measure of math ability predict how well a
person will do in an engineering-based profession
(predictive)?
 Ex: Does a depression scale correlate with behavioral
observations of depressed individuals (concurrent)?
 Ex: Does the self-esteem scale predict who will volunteer
answers in class (concurrent)?
Issue: Need to make sure the criterion is a good reflection
of the construct!
Discriminant Validity
 Indicates that a scale does NOT correlate with other
assessment devices presumed to measure conceptually
dissimilar constructs.
 Self-esteem vs. narcissism (r = .26)
 Also helps alleviate the 3rd variable issue
 Kids with bigger feet (shoe size) have stronger reading
skills.
 If age isn’t correlated with either…
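One way to probe the third-variable issue in the shoe-size example is a partial correlation, which asks how strongly two measures relate once a candidate third variable is held constant. Partial correlation is not named in the slides; it is used here as an illustrative technique. The function names and the data (built so that age drives both measures) are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation between x and y after controlling for z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Made-up data for eight children: age drives both shoe size and reading score.
age = [6, 6, 6, 6, 10, 10, 10, 10]
shoe_size = [7, 5, 7, 5, 11, 9, 11, 9]
reading = [7, 7, 5, 5, 11, 11, 9, 9]

raw_r = pearson_r(shoe_size, reading)
controlled_r = partial_r(shoe_size, reading, age)
print(round(raw_r, 2), round(controlled_r, 2))
```

In this constructed data set, shoe size and reading skill correlate strongly on their own, but the relationship vanishes once age is held constant.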
Content Validity
 Judgment by experts of the degree to which items,
tasks, or questions on a test adequately represent the
construct.
 Ex: Grief
 Matches study guide? Course objectives/outcomes?
 Includes “face validity”


Construct     Item
Depression    Do you often feel sad or blue?
Optimism      Do you generally expect good things to happen?
Classic Representation of Reliability and Validity
[Figure: three panels labeled “Not Reliable, Not Valid”, “Reliable, Not Valid”, and “Reliable, Valid”]
Must be reliable to be valid!
Culture and Validity
 Important questions:
 Does the construct exist in all cultures?
 Are items interpreted the same in each culture?


Language, translation
Essay vs. multiple choice (MC)
‘I’d rather vacation at a popular beach than an isolated cabin in the woods’ -- SES
Challenges to Validity (in Research)
 Response sets
 Acquiescence: tendency to say ‘yes’


Dealt with by using positively and negatively worded items, e.g.,
“I tend to be alert”, “I usually don’t feel very energetic”
 Social desirability: tendency to portray self positively

Dealt with by
 Making social desirability less salient (phrasing, experiment)
 Measuring and correcting for social desirability
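Mixing positively and negatively worded items only helps with acquiescence if the negatively worded items are reverse-scored before totaling. A minimal sketch, assuming a 1-5 agreement scale; the function names and the two-item example are hypothetical.

```python
def reverse_score(score, scale_min=1, scale_max=5):
    """Flip a response on a bounded scale (on 1-5: 5 becomes 1, 4 becomes 2, ...)."""
    if not scale_min <= score <= scale_max:
        raise ValueError("score is outside the scale range")
    return scale_min + scale_max - score


def total_score(responses, reverse_keyed, scale_min=1, scale_max=5):
    """Sum item responses, flipping the items whose index is in reverse_keyed."""
    return sum(
        reverse_score(r, scale_min, scale_max) if i in reverse_keyed else r
        for i, r in enumerate(responses)
    )


# Item 0: "I tend to be alert" (positively worded)
# Item 1: "I usually don't feel very energetic" (negatively worded, reverse-keyed)
yea_sayer = [5, 5]        # agrees strongly with everything
energetic_person = [5, 1]  # agrees with item 0, disagrees with item 1

print(total_score(yea_sayer, reverse_keyed={1}))
print(total_score(energetic_person, reverse_keyed={1}))
```

Someone who agrees with every statement lands at the scale midpoint rather than at the top, so blanket agreement no longer inflates the score.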