Research Methods Workshop: Reliability & Validity

Psy 1191 Intro to Research Methods
Dr. Launius
When you do research, you want to make sure that you choose reliable and valid measures. When you evaluate
published studies, a central component of your evaluation is whether the investigators used measures with good
reliability and validity. Your knowledge of these two concepts is one of the most important tools you’ll have as a
scientist.
What does this have to do with research?
You will face many important decisions that will affect people’s lives if you become a professional psychologist. For
example:
 Is this person clinically depressed?
 How does the organizational structure of this company influence worker morale?
 What factors are most predictive of juvenile delinquency?
To answer each of those questions, you need reliable and valid measures. Reliability and validity are two
very different ideas, so make sure you can tell the difference between them. First, understand this: psychologists
measure things they believe exist in people's heads. Pretty crazy. Think about it:
 Have you ever seen a person's self-esteem?
 Can you get out a ruler and measure someone's intelligence?
 Are you extroverted? Then where exactly is your "extroversion" located? About three inches in from your
right eye and then one inch over to the left?
Moral: Psychologists measure things that we cannot see but believe exist inside people's heads. We had
better be able to support that belief. To do that, we rely on reliability and validity. Let's take a look.
Reliability: What would you do with a bathroom scale that gave you a different weight every time you stood on it?
You'd throw it out. In the same way, if you are going to have any faith in a psychological measure (like intelligence
or extroversion), then you at least have to get the same score (or something close) each time you give the
test to someone. That's reliability. You gotta have it or no one will believe that you know what you are talking about.
What about all the items on a standardized test that are supposed to measure some construct like mathematical
ability? You would think that all the items would tend to agree with each other. One subset of items shouldn't say
you stink at math while another subset says that you are great. This is another form of reliability. Ever take a survey
over the phone where they seem to ask you the same question twice? They are checking to see whether your answers are consistent.
Based on these examples, how would you define reliability?
There are different kinds of reliability. Here’s a graphic to help you organize and remember these important ideas:
Statistical Issues: The actual statistics used to test reliability can be quite complex. However, the ideas are
simple and are just forms of correlation and regression. Say you have a test that measures a personality trait. You
would like all the items to give you consistent information about the trait. How could you do that?
Here's a clever idea: take half the items and compute a score, then take the other half of the items and
compute a separate score. If you found a high Pearson correlation coefficient between these split
halves, then it would look like the two parts of the test agree with each other. The whole test would seem to have
good internal consistency, or reliability.
The Spearman-Brown Split-Half Coefficient (r_sb):
Take your scale or test and divide it in some random manner into two halves. If the sum scale were reliable, you
would expect the two halves to correlate highly. Because each half is only half as long as the full test, the raw
split-half correlation r is stepped up with the Spearman-Brown formula, r_sb = 2r / (1 + r), to estimate the
reliability of the whole test.
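To make the split-half idea concrete, here is a minimal sketch in Python (pure standard library; the function names and the toy data layout are my own, not part of the handout): randomly split the items into two halves, score each half for every person, correlate the two sets of half scores, and apply the Spearman-Brown step-up.

```python
import random

def pearson(x, y):
    """Plain Pearson correlation between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def split_half_reliability(item_scores, seed=0):
    """item_scores: one row per person, one column per item.
    Randomly split the items into two halves, sum each half per person,
    correlate the half scores, then apply the Spearman-Brown correction
    r_sb = 2r / (1 + r) to estimate full-test reliability."""
    k = len(item_scores[0])
    cols = list(range(k))
    random.Random(seed).shuffle(cols)          # random but repeatable split
    half_a, half_b = cols[: k // 2], cols[k // 2 :]
    a = [sum(row[i] for i in half_a) for row in item_scores]
    b = [sum(row[i] for i in half_b) for row in item_scores]
    r = pearson(a, b)
    return 2 * r / (1 + r)
```

With consistent items (every item tracking the same underlying trait), the corrected coefficient comes out near 1.0; with items that disagree, it drops toward 0.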
Cronbach's Alpha (a)
You might see a problem with the Spearman-Brown (r_sb) split-half correlation: you picked the two halves at
random. Why not take into account all possible split halves? Wouldn't that give you a better estimate?
In fact, that is what Cronbach's Alpha does, which is why a is preferred to r_sb:
a = (k / (k − 1)) × (1 − Σ s²_i / s²_sum)
Here s²_i indicates the variances of the k individual items; s²_sum indicates the variance of the sum of all items.
Bottom line: If a is close to 1.0, your test items are reliable. Statistical programs can calculate this for you.
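A short sketch of that calculation (the function name and toy data layout are mine, not the handout's): alpha compares the sum of the individual item variances to the variance of each person's total score.

```python
def cronbach_alpha(item_scores):
    """Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances /
    variance of the summed scores).
    item_scores: one row per person, one column per item."""
    k = len(item_scores[0])

    def var(values):
        # sample variance (n - 1 denominator)
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / (len(values) - 1)

    item_vars = [var([row[j] for row in item_scores]) for j in range(k)]
    total_var = var([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

When the items all move together, most of the total-score variance comes from shared (between-person) differences, the ratio Σ s²_i / s²_sum is small, and alpha approaches 1.0.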
Test-retest and inter-rater reliabilities are typically calculated with a simple Pearson correlation coefficient.
Validity: Having subjects respond reliably on a measure is a great start, but there is another concept you need to
get down really well. That’s validity. There are many kinds of validity, but they all refer to whether or not what you
are manipulating, or what you are measuring, truly reflects the concept you think it does.
Here’s a crazy (but true) example:
Many years ago, people used to believe that if you had a large brain then you were intelligent. Suppose you went
around and measured the circumference of your friends' heads because you also believed this theory (they'd know
for sure that you're a psychology major now). Is the size of a person's head a reliable measure? (Think first!) The
answer is YES. If I measured the size of your head today and then next week, I would get the same number.
Therefore, it is reliable. However, the whole idea is wrong! Because we now know that larger headed people are
not necessarily smarter than smaller headed ones, we know that the theory behind the measure is invalid.
How do we establish validity?
To make statements that you can have confidence in means establishing validity. There are several kinds. Let's do
a quick review of three common types of validity:
 Internal
 External
 Construct
1. Internal Validity: When you think about internal validity, think Inside the experiment. Is your experiment so well
designed that when the results are in, you feel confident that you can make truthful and definite statements about
what happened in your study? If your study is relatively free of confounds, you will have high confidence in its
results. That's internal validity.
2. External Validity: When you think about external validity, think Outside the experiment. Can your results be
generalized to people outside of your study? Whether external validity is high or low depends on what you are
studying and what your subjects are like. Just consider it carefully: will people not selected for your study react the
same way as those in your study?
3. Construct Validity: When you think about construct validity, think Concept. You are manipulating and
measuring many concepts in your study – are you really tapping into these concepts?
Independent Variable:
Suppose you wanted to study the effects of violent TV on aggression in children. Your first step is to decide which
TV shows contain "violence". Whatever show you pick, you should first make sure that children perceive what they
see in the show as violence. Is there a difference between cartoon violence and violence that involves real people?
You have to think these issues through to make sure that you are truly (validly) manipulating the concept (or
"construct" - really the same term) of a violent TV show.
Dependent Variable:
Suppose you decide (as early researchers did) to measure the "number of times a child hits a bobo doll" as your
dependent measure of aggression (a bobo doll is that inflatable doll that bounces back up when you hit it). Now ask
yourself: when a child hits a bobo doll, is this because of aggressive tendencies? If the answer is no (because
hitting a bobo doll is simply a way of playing with it), then the number of hits on the doll would not accurately
measure the construct of "aggression". You want to measure something that truly reflects the construct (concept)
that you think is an expression of what is inside people's heads. Here’s a graphic to help you organize and
remember these important ideas:
Reliability and validity are easy concepts, but they strike at the heart of Psychology as a useful
discipline. If our measurements differ from time 1 to time 2 to time 3, and if we measure things that are not
useful, then who cares?