Reliability and Validity

How do we use the words reliability and validity in everyday life? What do these words mean? Is there a difference between them, or do they mean the same thing?

RELIABILITY

Reliability refers to the consistency of a measure. A measure is considered reliable if we get the same result repeatedly; a research method is considered reliable if we can repeat it and get the same results. Coolican (1994) pointed out: "Any measure we use in life should be reliable, otherwise it's useless. You wouldn't want your car speedometer or a thermometer to give you different readings for the same values on different occasions. This applies to psychological measures as much as any other."

When assessing the reliability of a study, we generally need to ask two questions:
1) Can the study be replicated?
2) If so, will the results be consistent?

A ruler, for example, would be reliable, as the measurement could be repeated time after time and the same results would be gained (consistency). If you measure the length of a book on Monday and your ruler tells you it's 25 cm long, it will still tell you it's 25 cm long on Friday. An IQ test, however, may be unreliable: if a person sits the test on Monday and scores 140, then sits the same test on Friday and scores 90, the test can be replicated but shows low consistency, and is therefore unreliable.

Some research methods (such as laboratory studies) have high reliability, as they can be replicated and the results checked for consistency. Other research methods (such as case studies and interviews) have lower reliability, as they are difficult or impossible to replicate. As they cannot be replicated, we cannot check how consistent the results are.

How can we measure reliability? There are several different ways to estimate or improve reliability, depending on the research method used.
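The idea of consistency can be made concrete with a small calculation. If the same people sit the same test twice (as in the IQ example above), we can correlate the two sets of scores: a Pearson correlation close to 1 indicates high reliability, while a low correlation indicates an unreliable measure. A minimal sketch in Python, using invented scores purely for illustration:

```python
from math import sqrt

def pearson_r(xs, ys):
    # Pearson correlation: covariance divided by the product of
    # the standard deviations of the two score lists.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented test scores for five people, tested on Monday and again on Friday
monday = [140, 102, 98, 121, 87]
friday = [138, 105, 95, 124, 90]

# A value close to 1 suggests high test-retest reliability
print(round(pearson_r(monday, friday), 2))
```

The same correlation idea underlies several of the reliability checks discussed below: what changes is only which two lists of scores are compared.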
Match the method of estimating reliability to the description (pg 165):

Methods:
- Test-retest reliability
- Split-half reliability
- Inter-rater reliability

Descriptions:
- If the measure depends upon interpretation of behaviour, we can compare the results from two or more raters.
- Splitting a test into two halves, and comparing the scores in both halves.
- The measure is administered to the same group of people twice.
- If the results in the two halves are similar, we can assume the test is reliable.
- If the results on the two tests are similar, we can assume the test is reliable.
- If there is high agreement between the raters, the measure is reliable.

We will look at the reliability of specific research methods in more detail throughout the course.

VALIDITY

A study may be high in reliability, but the results may still be meaningless if we don't have validity. Validity is the extent to which a test measures what it claims to measure. There are three main aspects of validity that we investigate in psychological research: control, realism and generalisability (p138).

Control
This refers to how well the experimenter has controlled the experimental situation. Control is important because without it, researchers cannot establish cause-and-effect relationships. In other words, without control, we cannot state that it was the independent variable (IV) which caused the change in the dependent variable (DV). The result could have been caused by another variable, called an extraneous variable (EV). These are variables which have not been controlled by the experimenter, and which may affect the DV (see below).

Realism
The whole point of psychological research is to provide information about how people behave in real life. If an experiment is too controlled, or the situation too artificial, participants may act differently than they would in real life, and the results may therefore lack validity. The term mundane realism is used to refer to how well an experiment reflects real life.
If an experimental situation has high mundane realism (in other words, it reflects real life) it would be high in _______________________ validity.

Can you see a potential conflict between control and realism?

Generalisability
The aim of psychological research is to produce results which can then be generalised beyond the setting of the experiment. If an experiment is lacking in realism, we will be unable to generalise. However, even if an experiment is high in realism, we still may not be able to generalise. For example, the participants may all be from a small group of similar people, meaning low population validity. Many experiments use white, middle-class American college students as participants. What issues with generalisability can you think of?

TYPES OF VALIDITY

Experimental validity: is the study really measuring what it intends?

INTERNAL VALIDITY refers to things that happen "inside" the study. Internal validity is concerned with whether we can be certain that it was the IV which caused the change in the DV. If aspects of the experimental situation lack validity, the results of the study are meaningless and we can draw no meaningful conclusions from them. Internal validity can be affected by a lack of mundane realism, which could lead the participants to act in a way which is unnatural, making the results less valid. Internal validity can also be affected by extraneous variables, set out below (extraneous variable, how it affects validity, how it can be overcome).

Situational variables (anything to do with the environment of the experiment): time of day, temperature, noise levels etc.
How does it affect validity? Something about the situation of the experiment could act as an EV if it has an effect on the DV. For example, poor lighting could affect participants' performance on a memory test.
How can it be overcome? By the use of standardised procedures, which ensure that all participants are tested under the same conditions.

Participant variables (anything to do with differences in the participants): age, gender, intelligence, skill, past experience, motivation, education etc.
How does it affect validity? It may be that the differences between the participants cause the change in the DV. For example, one group may perform better on a memory test than another because they are younger, or more motivated.
How can it be overcome? Participant variables can be completely removed by using a repeated measures design (the same participants are used in each condition). Matched pairs (participants in each group are matched) could also be used.

Investigator effects: how the behaviour and language of the experimenter may influence the behaviour of the participants. Also known as experimenter bias.
How does it affect validity? Leading questions from the experimenter may consciously or unconsciously alter how the participant responds; the way in which an experimenter asks a question might act as a cue for the participant. For example, the experimenter may provide verbal or non-verbal encouragement when the participant behaves in a way which supports the hypothesis.
How can it be overcome? By using a double-blind technique: the person who carries out the research is not the person who designed it, so the researcher running the sessions does not know what results are expected.

Demand characteristics: participants are often searching for cues as to how to behave in an experiment. There could be something about the experimental situation or the behaviour of the experimenter (see investigator effects) which communicates to the participant what is "demanded" of them.
How does it affect validity? The structure of the experiment could lead the participant to guess the aim of the study. For example, participants may perform a memory test, be made to exercise, and then be given another memory test. This may lead the participants to guess that the study is about the effect of exercise on memory, which may cause them to change their behaviour.
How can it be overcome? When designing a study, it is important to try to create a situation where the participants will not be able to guess what the aim of the study is.

Participant effects: participants are aware that they are in an experiment, and so may behave unnaturally.
How does it affect validity? They may be overly helpful and want to please the experimenter, which leads to artificial behaviour. Alternatively, they may decide to go against the experimenter's aims and deliberately act in a way which spoils the experiment. This is the "screw you" effect.
How can it be overcome? Again, by designing the study so that the participants cannot guess the aims, participant effects can be reduced.

TASKS

A. A researcher wants to test whether people's memories are better in the evening or in the morning. He gives a group of participants a memory test at 9am, and another test at 9pm. The researcher discovers that they scored higher in the morning. He concludes therefore that people's memories are better in the morning.
Name the IV: _________________________
Name the DV: ____________________________
Name any extraneous variables that could have altered the DV. How could these EVs have been controlled?

B. A psychologist is interested in the effect of age on how well people cope under stressful conditions. Two groups of participants are used: one group are under 25, and another group are over 50. Both groups are asked to sit a difficult exam under timed conditions. After the exam, all of the participants are given a questionnaire to assess how much stress they felt. The older people reported more stress.
Name the IV: __________________________
Name the DV: ___________________________
Name any extraneous variables that could have altered the DV. How could these EVs have been controlled?
EXTERNAL VALIDITY

Read pg 165-166 and fill in the gaps.

Assuming that our experiment has high ____________________ validity (that we can be sure that the DV was changed by the _____ and not an _____), we need to assess how well our results can be _________________________ beyond the experimental setting. Two issues here are how much ecological validity the study has, and whether it has population validity.

Ecological validity refers to how well the experimental situation reflects _________ __________, and therefore how well the results can be __________________________ to other places and settings. Ecological validity can be assessed by looking at the ________________ of the experiment. For example, a field experiment takes place in the participant's own environment, which would lead to ____________ ecological validity, as it is more naturalistic than a _____________________ experiment. _____________ _______________, on the other hand, looks at the tasks that the participants have to do and how realistic these are. If the things that the participants are asked to do in the experiment are artificial and contrived, the study would be said to have ______ _______________ ________________ and therefore _______ ecological validity.

Population validity refers to how well the ____________________ used in the experiment represent the general population. Many psychological studies use white, middle-class male American students. Can we legitimately take the results from these participants and apply them to other nationalities, _______________, _______, or even different historical periods?

Validity of psychological measures: how valid is the tool we use to measure?

When designing an experiment in psychology, we will need to decide upon a way to measure our variables. If what we are measuring is height, weight or time, for example, we could use a tape measure, scales or stopwatch respectively.
However, what if we want to measure something like self-esteem, intelligence, conformity or linguistic ability? These psychological concepts need to be turned into numbers that can be measured and compared. The term for this is operationalisation. To create a measure, we first must define what it is we are measuring. For example, with intelligence, we need to decide what we mean by intelligence and what sort of things we wish to measure. We then decide upon a way to measure this (operationalising).

Examples of the types of measures used in psychology are:
- A test which is given to the participants and produces a score
- A questionnaire or interview
- A checklist where participants' behaviour can be recorded
- A biological response (e.g. body temperature, hormone levels)

A possible issue with this is that by breaking a concept down into a numerical form, we lose validity and end up not measuring what we intended. However, there are a number of ways we can assess the validity of a measure.

Content validity
Does the method used actually seem to measure what you intended? For example, does an IQ test actually measure levels of intelligence, or is it measuring ability to solve puzzles? To ensure content validity, a panel of experts (on IQ, for example) may be asked to assess the measure for validity.

Concurrent validity
How well does the measure agree with existing measures? For example, does our IQ test agree with established tests of IQ? We can ensure concurrent validity by testing participants with both the new test and the established test. If our test has concurrent validity, there should be high agreement between the scores on both measures.

Construct validity
Is the method actually measuring all parts of what we are aiming to test? For example, if we use a maths test to test intelligence, we are missing out on other factors involved, such as linguistic ability or spatial awareness.
To maintain construct validity, we need to define what it is we are aiming to measure, and ensure that all parts of that definition are being measured.

Predictive validity
Is our measure associated with future behaviour? For example, if someone scores high on our IQ test, we would expect them to perform well in GCSE exams, or do well in their career. This is similar to concurrent validity. We can investigate predictive validity by following up our participants to see if future performance is similar to performance on our measure.

TASKS

C. A researcher is looking into the effect of alcohol consumption on self-esteem. He develops a questionnaire to assess people's attitudes towards themselves. How could you see if this questionnaire had content validity?

D. An experimenter creates a questionnaire that measures homophobic attitudes. How would you see if this test had construct validity?

E. A researcher wants to see if people who live healthy lifestyles have better romantic relationships. He develops a checklist of what constitutes healthy behaviour. How do we know if this checklist has concurrent validity?

Types of Validity (summary)
- Experimental validity
  - Internal validity: extraneous variables (situational variables, participant variables, investigator effects, demand characteristics, participant effects)
  - External validity: mundane realism, ecological validity, population validity
- Validity of psychological measures: content validity, concurrent validity, construct validity, predictive validity
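A closing note on how these checks are actually calculated: test-retest, split-half, concurrent and predictive validity all reduce to correlating two lists of scores, while inter-rater reliability for categorical observations is often summarised with Cohen's kappa, which corrects raw agreement for the agreement two raters would reach by chance. A minimal Python sketch, with invented behaviour codings purely for illustration:

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    # kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    categories = set(rater1) | set(rater2)
    # Chance agreement: for each category, the probability both raters
    # pick it independently, summed over categories.
    chance = sum((c1[c] / n) * (c2[c] / n) for c in categories)
    return (observed - chance) / (1 - chance)

# Invented codings of eight behaviour samples by two observers
# ("A" = aggressive, "P" = passive)
rater1 = ["A", "A", "P", "P", "A", "P", "A", "P"]
rater2 = ["A", "A", "P", "A", "A", "P", "A", "P"]

print(round(cohens_kappa(rater1, rater2), 2))  # -> 0.75
```

Here the raters agree on 7 of 8 samples (87.5%), but because half that agreement would be expected by chance, kappa reports the more conservative 0.75.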