Psy 1191 Intro to Research Methods Dr. Launius

Research Methods Workshop: Reliability & Validity

When you do research, you want to make sure that you choose reliable and valid measures. When you evaluate published studies, a central component of your evaluation is whether the investigators used measures with good reliability and validity. Your knowledge of these two concepts is one of the most important tools you'll have as a scientist.

What does this have to do with research? You will face many important decisions that will affect people's lives if you become a professional psychologist. For example: Is this person clinically depressed? How does the organizational structure of this company influence worker morale? What factors are most predictive of juvenile delinquency? To answer each of those questions, you need reliable and valid measures.

Reliability and validity are two very different ideas, so make sure you can tell the difference between them. First, understand this: psychologists measure things they believe exist in people's heads. Pretty crazy. Think about it: Have you ever seen a person's self-esteem? Can you get out a ruler and measure someone's intelligence? Are you extroverted? Then where exactly is your "extroversion" located? About three inches in from your right eye and then one inch over to the left?

Moral: Psychologists measure things that we cannot see, but we believe them to exist inside people's heads. We had better be able to support that belief. To do that, we rely on reliability and validity. Let's take a look.

Reliability: What would you do with a bathroom scale that gave you a different weight every time you stood on it? You'd throw it out. In the same way, if you are going to have any faith in a psychological measure (like intelligence or extroversion), then you at least have to get the same score (or at least something close) each time you give the test to someone. That's reliability.
You gotta have it, or no one will believe that you know what you are talking about. What about all the items on a standardized test that are supposed to measure some construct like mathematical ability? You would think that all the items would tend to agree with each other. One subset of items shouldn't say you stink at math while another subset says that you are great. This is another form of reliability. Ever take a survey over the phone where they seem to ask you the same question twice? They are checking to see whether you are reliable. Based on these examples, how would you define reliability?

There are different kinds of reliability. Here's a graphic to help you organize and remember these important ideas:

Statistical Issues: The actual statistics used to test reliability can be quite complex. However, the ideas are simple and are just forms of correlation and regression. Say you have a test that measures a personality trait. You would like all the items to give you consistent information about the trait. How could you check that? Here's a clever idea: take half the items and compute a score, then take the other half of the items and compute a separate score. If you found a high Pearson correlation coefficient between these split halves, then it would look like the two parts of the test agree with each other. The whole test would seem to have good internal consistency, or reliability.

The Spearman-Brown Split-Half Coefficient (r_sb): Take your scale or test and divide it in some random manner into two halves. If the full scale were reliable, you would expect the two halves to have an r close to 1.0.

Cronbach's Alpha (α): You might see a problem with the Spearman-Brown (r_sb) split-half correlation in that you picked two halves at random. Why not take into account all possible split halves? Wouldn't that give you a better estimate?
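Before moving on, here is a minimal sketch of the split-half idea in Python, using only the standard library. The response data and the six-item test are made up purely for illustration, and pearson_r is a hand-rolled Pearson correlation:

```python
# Split-half reliability sketch (illustrative made-up data, standard library only).
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical test: each row is one person's answers to 6 items (1-5 scale).
responses = [
    [4, 5, 4, 5, 4, 5],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 4],
    [5, 4, 5, 5, 5, 4],
    [1, 2, 1, 1, 2, 1],
]

# Split the items into two halves (odd-numbered vs. even-numbered items)
# and compute a separate score for each half.
half1 = [sum(row[0::2]) for row in responses]
half2 = [sum(row[1::2]) for row in responses]

r_half = pearson_r(half1, half2)          # ~ .93 for this toy data

# Spearman-Brown correction: estimates the reliability of the full-length
# test from the correlation between its two halves.
r_sb = (2 * r_half) / (1 + r_half)        # ~ .96 for this toy data
print(round(r_half, 3), round(r_sb, 3))
```

The correction matters because each half is only half as long as the real test, and shorter tests are less reliable; r_sb scales the half-test correlation back up to full length.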
In fact, that is exactly what Cronbach's Alpha does. Cronbach's Alpha (α) is preferred to r_sb:

α = (k / (k − 1)) × (1 − Σ s²_i / s²_sum)

Here s²_i indicates the variance of each of the k individual items, and s²_sum indicates the variance of the sum of all items. Bottom line: if α is close to 1.0, your test items are reliable. Programs can calculate this for you. Test-retest and inter-rater reliabilities are typically calculated with a simple Pearson correlation coefficient.

Validity: Having subjects respond reliably on a measure is a great start, but there is another concept you need to get down really well. That's validity. There are many kinds of validity, but they all refer to whether or not what you are manipulating, or what you are measuring, truly reflects the concept you think it does.

Here's a crazy (but true) example: Many years ago, people used to believe that if you had a large brain then you were intelligent. Suppose you went around and measured the circumference of your friends' heads because you also believed this theory (they'd know for sure that you're a psychology major now). Is the size of a person's head a reliable measure (think first!)? The answer is YES. If I measured the size of your head today and then next week, I would get the same number. Therefore, it is reliable. However, the whole idea is wrong! Because we now know that larger-headed people are not necessarily smarter than smaller-headed ones, we know that the theory behind the measure is invalid.

How do we establish validity? To make statements that you can have confidence in, you must establish validity. There are several kinds. Let's do a quick review of three common types of validity: internal, external, and construct.

1. Internal Validity: When you think about internal validity, think Inside the experiment. Is your experiment so well designed that when the results are in, you feel confident that you can make truthful and definite statements about what happened in your study?
If your study is relatively free of confounds, you will have high confidence in its results. That's internal validity.

2. External Validity: When you think about external validity, think Outside the experiment. Can your results be generalized to people outside of your study? Whether external validity is high or low depends on what you are studying and what your subjects are like. Just consider it carefully: will people not selected for your study react the same way as those in your study?

3. Construct Validity: When you think about construct validity, think Concept. You are manipulating and measuring many concepts in your study; are you really tapping into these concepts?

Independent Variable: Suppose you wanted to study the effects of violent TV on aggression in children. Your first step is to decide which TV shows contain "violence". Whatever show you pick, you should first make sure that children perceive what they see in the show as violence. Is there a difference between cartoon violence and violence that involves real people? You have to think these issues through to make sure that you are truly (validly) manipulating the concept (or "construct", really the same term) of a violent TV show.

Dependent Variable: Suppose you decide (as early researchers did) to measure the "number of times a child hits a bobo doll" as your dependent measure of aggression (a bobo doll is that inflatable doll that bounces back up when you hit it). Now ask yourself: when a child hits a bobo doll, is this because of aggressive tendencies? If the answer is no (because hitting a bobo doll is simply a way of playing with it), then the number of hits on the doll would not accurately measure the construct of "aggression". You want to measure something that truly reflects the construct (concept) you believe exists inside people's heads.
Here's a graphic to help you organize and remember these important ideas:

Reliability and validity are conceptually simple, but they strike at the heart of psychology as a useful discipline. If our measurements differ from time 1 to time 2 to time 3, and if we measure things that do not reflect real concepts, then who cares?
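As a closing sketch, the Cronbach's alpha formula from the reliability section can be computed directly from item variances. The four-item test and the responses below are made-up illustration data:

```python
# Cronbach's alpha sketch: alpha = (k / (k - 1)) * (1 - sum of item variances
# divided by the variance of the total score). Illustrative made-up data.
from statistics import pvariance

# Hypothetical test: each row is one person's answers to k = 4 items.
responses = [
    [4, 4, 5, 4],
    [2, 1, 2, 2],
    [3, 3, 3, 4],
    [5, 5, 4, 5],
    [1, 2, 1, 1],
]

k = len(responses[0])

# s2_i: variance of each individual item (each column of the data).
item_vars = [pvariance(col) for col in zip(*responses)]

# s2_sum: variance of each person's total score across all items.
sum_var = pvariance([sum(row) for row in responses])

alpha = (k / (k - 1)) * (1 - sum(item_vars) / sum_var)
print(round(alpha, 3))   # close to 1.0, so these toy items look reliable
```

In practice you would let a statistics package compute this, as the text notes; the sketch is only meant to show that the formula itself is nothing more than a ratio of variances.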