Reliability and validity worksheet

Reliability and Validity
How do we use the words reliability and validity in everyday life? What do these words mean? Is there a
difference between them or do they mean the same thing?
RELIABILITY
Reliability refers to the consistency of a measure. A measure is considered reliable if we get the same
result repeatedly. A research method is considered reliable if we can repeat it and get the same results.
Coolican (1994) pointed out:
“Any measure we use in life should be reliable, otherwise it’s useless. You wouldn’t
want your car speedometer or a thermometer to give you different readings for the
same values on different occasions. This applies to psychological measures as much as
any other.”
When assessing the reliability of a study, we generally need to ask two questions:
1) Can the study be replicated?
2) If so, will the results be consistent?
A ruler, for example, would be reliable, as the results could be replicated time after time
and the same results would be gained (consistency). If you measure the length of a
book on Monday, and your ruler tells you it’s 25 cm long, it will still tell you it’s
25 cm long on Friday.
An IQ test however may be unreliable, if a person sits the test on Monday and scores 140, and
then sits the same test on Friday and scores 90. Even though it can be replicated, it shows low
consistency and therefore is an unreliable test.
Some research methods (such as laboratory studies) have high reliability as they can be replicated and
the results checked for consistency. Other research methods however (such as case studies and
interviews) have lower reliability as they are difficult or impossible to replicate. As they cannot be
replicated, we cannot check how consistent the results are.
How can we measure reliability?
There are several different ways to estimate or improve reliability depending on the research method
used. Match the method of estimating reliability to the description (pg165)
Methods:
- Test-retest reliability
- Split-half reliability
- Inter-rater reliability

Descriptions:
- If the measure depends upon interpretation of behaviour, we can compare the results from two or more raters.
- Splitting a test into two halves, and comparing the scores in both halves.
- The measure is administered to the same group of people twice.

How reliability is judged:
- If the results in the two halves are similar, we can assume the test is reliable.
- If the results on the two tests are similar, we can assume the test is reliable.
- If there is high agreement between the raters, the measure is reliable.
We will look in more detail at the specific reliability of various research methods throughout the course.
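The three ways of estimating reliability can also be sketched numerically. The sketch below is not part of the worksheet and all the scores in it are invented; it computes a test-retest correlation, a split-half coefficient (with the standard Spearman-Brown correction for the halved test length), and a simple percentage agreement between two raters:

```python
# Rough numerical sketch of the three reliability estimates (invented data).

def pearson(xs, ys):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Test-retest: the same people sit the same test twice.
monday = [140, 120, 100, 95, 130]
friday = [138, 118, 104, 93, 131]
test_retest = pearson(monday, friday)  # close to 1 -> consistent, so reliable

# Split-half: compare scores on the two halves of a single test.
odd_items  = [20, 15, 18, 10, 22]
even_items = [19, 14, 17, 11, 21]
r_half = pearson(odd_items, even_items)
# Spearman-Brown correction adjusts for the fact that each half is shorter
# than the full test.
split_half = (2 * r_half) / (1 + r_half)

# Inter-rater: proportion of observations on which two raters agree.
rater_a = ["aggressive", "passive", "aggressive", "passive"]
rater_b = ["aggressive", "passive", "aggressive", "aggressive"]
inter_rater = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

print(round(test_retest, 2), round(split_half, 2), inter_rater)
```

A coefficient near 1 indicates high consistency; the unreliable IQ test above (140 on Monday, 90 on Friday) would produce a much lower test-retest correlation.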
VALIDITY
A study may be high in reliability, but the results may still be meaningless if we don’t
have validity. Validity is the extent to which a test measures what it claims to
measure.
There are three main aspects of validity that we investigate in psychological research:
Control, Realism and Generalisability (p138).
Control
This refers to how well the experimenter has controlled the experimental situation. Control
is important as without it, researchers cannot establish cause and effect relationships. In
other words, without control, we cannot state that it was the independent variable (IV)
which caused the change in the dependent variable (DV). The result could have been
caused by another variable, called an extraneous variable (EV). These are variables which
have not been controlled by the experimenter, and which may affect the DV (see below).
Realism
The whole point of psychological research is to provide information about how people
behave in real life. If an experiment is too controlled, or the situation too artificial,
participants may act differently than they would in real life. Therefore, the results may lack
validity.
The term mundane realism is used to refer to how well an experiment reflects real life. If
an experimental situation has high mundane realism (in other words, it reflects real life) it would be high
in _______________________ validity
Can you see a potential conflict between control and realism?
Generalisability
The aim of psychological research is to produce results which can then be generalised beyond the
setting of the experiment. If an experiment is lacking in realism we will be unable to generalise.
However, even if an experiment is high in realism, we still may not be able to generalise.
For example, the participants may be all from a small group of similar people, meaning low population
validity. Many experiments use white, middle class American college students as participants. What
issues with generalisability can you think of?
TYPES OF VALIDITY
Experimental Validity: is the study really measuring what it intends?
INTERNAL VALIDITY refers to things that happen “inside” the study. Internal validity is concerned with
whether we can be certain that it was the IV which caused the change in the DV. If aspects of the
experimental situation lack validity, the results of the study are meaningless and we can make no
meaningful conclusions from them.
- Internal validity can be affected by a lack of mundane realism. This could lead the participants to act in a way which is unnatural, thus making the results less valid.
- Internal validity can also be affected by extraneous variables (see below).
EXTRANEOUS VARIABLES: HOW THEY AFFECT VALIDITY, AND HOW THEY CAN BE OVERCOME

1) Situational variables (anything to do with the environment of the experiment): time of day, temperature, noise levels etc.
- How it affects validity: something about the situation of the experiment could act as an EV if it has an effect on the DV. For example, poor lighting could affect participants’ performance on a memory test.
- How it can be overcome: situational variables can be overcome by the use of standardised procedures, which ensure that all participants are tested under the same conditions.

2) Participant variables (anything to do with differences in the participants): age, gender, intelligence, skill, past experience, motivation, education etc.
- How it affects validity: it may be that the differences between the participants cause the change in the DV. For example, one group may perform better on a memory test than another because they are younger, or more motivated.
- How it can be overcome: participant variables can be completely removed by using a repeated measures design (the same participants are used in each condition). Matched pairs (participants in each group are matched) could also be used.

3) Investigator effects (also known as experimenter bias): this refers to how the behaviour and language of the experimenter may influence the behaviour of the participants. The way in which an experimenter asks a question might act as a cue for the participant.
- How it affects validity: leading questions from the experimenter may consciously or unconsciously alter how the participant responds. For example, the experimenter may provide verbal or non-verbal encouragement when the participant behaves in a way which supports the hypothesis.
- How it can be overcome: investigator effects can be overcome by using a double blind technique, where the person who carries out the research does not know the aims of the study and so cannot cue the participants (often, the person who runs the sessions is not the person who designed the study).

4) Demand characteristics: participants are often searching for cues as to how to behave in an experiment. There could be something about the experimental situation or the behaviour of the experimenter (see investigator effects) which communicates to the participant what is “demanded” of them.
- How it affects validity: the structure of the experiment could lead the participant to guess the aim of the study. For example, participants may perform a memory test, be made to exercise, and then be given another memory test. This may lead the participants to guess that the study is about the effect of exercise on memory, which may cause them to change their behaviour.
- How it can be overcome: when designing a study, it is important to try to create a situation where the participants will not be able to guess what the aim of the study is.

5) Participant effects: participants are aware that they are in an experiment, and so may behave unnaturally.
- How it affects validity: they may be overly helpful and want to please the experimenter, which leads to artificial behaviour. Alternatively, they may decide to go against the experimenter’s aims and deliberately act in a way which spoils the experiment. This is the “screw you” effect.
- How it can be overcome: again, by designing the study so that the participants cannot guess the aims, participant effects can be reduced.
TASKS
A. A researcher wants to test whether people’s memories are better in the evening or in
the morning. He gives a group of participants a memory test at 9am, and another test
at 9pm. The researcher discovers that they scored higher in the morning. He
concludes therefore that people’s memories are better in the morning.
Name the IV:_________________________ Name the DV:____________________________
Name any extraneous variables that could have altered the DV?
How could these EVs have been controlled?
B. A psychologist is interested in the effect of age on how well people cope under
stressful conditions. Two groups of participants are used, one group are under 25,
and another group are over 50. Both groups are asked to sit a difficult exam under
timed conditions. After the exam, all of the participants are given a questionnaire to
assess how much stress they felt. The older people reported more stress.
Name the IV:__________________________ Name the DV:___________________________
Name any extraneous variables that could have altered the DV?
How could these EVs have been controlled?
EXTERNAL VALIDITY
Read pg 165-166 and fill in the gaps
Assuming that our experiment has high ____________________ validity (that we can be sure that the
DV was changed by the _____ and not an _____), we need to assess how well our results can be
_________________________ beyond the experimental setting. Two issues here are how much
ecological validity the study has, and whether it has population validity.
Ecological validity refers to how well the experimental situation reflects _________
__________, and therefore how well the results can be __________________________
to other places and settings. Ecological validity can be assessed by looking at the
________________ of the experiment. For example, a field experiment takes place in the
participant’s own environment, which would lead to ____________ ecological validity, as
it is more naturalistic than a _____________________ experiment. _____________
_______________ on the other hand looks at the tasks that the participants have to do
and how realistic these are. If the things that the participants are asked to do in the
experiment are artificial and contrived, the study would be said to have ______
_______________ ________________ and therefore _______ ecological validity.
Population validity refers to how well the ____________________ used in the
experiment represent the general population. Many psychological studies use white,
middle class male American students. Can we legitimately take the results from these
participants and apply them to other nationalities, _______________, _______, or
even different historical periods?
Validity of psychological measures: how valid is the tool we use to measure?
When designing an experiment in psychology, we will need to decide upon a way to
measure our variables. If what we are measuring is height, weight, or time, for
example, we could use a tape measure, scales or stopwatch respectively. However,
what if we want to measure something like self-esteem, intelligence,
conformity or linguistic ability? These psychological concepts need to be turned into
numbers that can be measured and compared. The term for this is operationalisation.
To create a measure, we first must define what it is we are measuring. For example, with intelligence,
we need to decide what we mean by intelligence and what sort of things we wish to measure. We then
decide upon a way to measure this (operationalising).
Examples of the types of measures used in psychology are:
- A test which is given to the participants which produces a score
- A questionnaire or interview
- A checklist where participants’ behaviour can be recorded
- A biological response (e.g. body temperature, hormone levels)
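As a minimal illustration of operationalisation (the items and scores below are invented, not from the worksheet), a concept such as self-esteem might be operationalised as the total of a participant’s ratings on a fixed set of questionnaire items:

```python
# Illustrative sketch: "operationalising" self-esteem as the total score on a
# ten-item questionnaire. The items and the 1-5 rating scale are invented.

responses = [4, 3, 5, 2, 4, 3, 4, 5, 3, 4]  # one rating per item, 1 (low) to 5 (high)

# Summing the ratings turns the abstract concept into a single number that
# can be measured and compared between participants.
self_esteem_score = sum(responses)
print(self_esteem_score)  # 37
```

The risk noted below is exactly this step: once the concept is reduced to a number, we must check that the number still measures what we intended.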
A possible issue with this is that by breaking down a concept into a numerical form, we lose validity and
we end up not measuring what we intended. However, there are a number of ways we can assess the
validity of a measure.
Content validity
Does the method used actually seem to measure what you intended? For example, does an IQ test actually measure levels of intelligence, or is it measuring ability to solve puzzles?
- To ensure content validity, a panel of experts (on IQ, for example) may be asked to assess the measure for validity.

Concurrent validity
How well does the measure agree with existing measures? For example, does our IQ test agree with established tests of IQ?
- We can assess concurrent validity by testing participants with both the new test and the established test. If our test has concurrent validity, there should be high agreement between the scores on both measures.

Construct validity
Is the method actually measuring all parts of what we are aiming to test? For example, if we use a maths test to test intelligence, we are missing out on other factors involved such as linguistic ability or spatial awareness.
- To maintain construct validity, we need to define what it is we are aiming to measure, and ensure that all parts of that definition are being measured.

Predictive validity
Is our measure associated with future behaviour? For example, if someone scores high on our IQ test, we would expect them to perform well in GCSE exams, or do well in their career. This is similar to concurrent validity.
- We can investigate predictive validity by following up our participants to see if future performance is similar to performance on our measure.
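Concurrent (and predictive) validity checks of the kind described above come down to correlating two sets of scores from the same participants. A rough sketch, using invented scores and assuming a new IQ test compared against an established one:

```python
# Hypothetical sketch: assessing concurrent validity by correlating a new
# IQ test with an established test, taken by the same six participants.
# All scores are invented for illustration.

def pearson(xs, ys):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

new_test    = [102, 130, 95, 118, 88, 125]
established = [100, 128, 99, 115, 90, 122]

r = pearson(new_test, established)
# High agreement between the two measures suggests concurrent validity.
print("concurrent validity coefficient:", round(r, 2))
```

The same calculation serves predictive validity if the second list is replaced by a later outcome measure (e.g. exam results) for the same participants.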
TASKS
C. A researcher is looking into the effect of alcohol consumption on self-esteem. He develops a
questionnaire to assess people’s attitudes towards themselves. How could you see if this
questionnaire had content validity?
D. An experimenter creates a questionnaire that measures homophobic attitudes. How would you
see if this test had construct validity?
E. A researcher wants to see if people who live healthy lifestyles have better romantic
relationships. He develops a checklist of what constitutes healthy behaviour. How do we know
if this checklist has concurrent validity?
Types of Validity (summary)

Experimental validity
- Internal validity: threatened by extraneous variables (situational variables, participant variables, investigator effects, demand characteristics, participant effects) and by a lack of mundane realism
- External validity: ecological validity, population validity

Validity of psychological measures
- Content validity
- Concurrent validity
- Construct validity
- Predictive validity