Chapter 5: Measurement and Sampling

Psychological Concepts
• Measuring complex concepts
– Psychological concepts are often abstract
– We create operational definitions to measure these complex, abstract concepts
– Our operational definitions have to make sense for the research questions we want to answer
• Operational definitions
– Our measurements represent the concepts that we cannot observe directly

Defining and Measuring Variables
• Operational definition—A working definition of a concept that is based on how we measure it
• Variable—An element that, when measured, can take on different values (e.g., intelligence test scores)
• Hypothetical construct—A concept that helps us understand behavior but that is not directly observable

Defining and Measuring Variables
• Example: One measure of stress?
• Score on the Social Readjustment Rating Scale
– 43 items on the scale
– Different numbers of stress units for different life events; greater numbers of points reflect greater stress (and likelihood of illness)
• Death of a spouse: 119 points
• Jail term: 79 points
• Change in schools: 35 points
• Christmas: 30 points
• Add the points across the 43 items to get the stress level
• The score on the scale represents the underlying hypothetical construct of stress

Defining and Measuring Variables
• The importance of culture and context in defining variables
– The Social Readjustment Rating Scale may not be good for some populations, like young college students
– A different scale includes 51 items, including roommate problems, maintaining a steady dating relationship, and attending a football game
• It is critical to understand the people you are measuring

Multiple Possible Operational Definitions
• You can measure constructs in a variety of ways, depending on your research.
• Considering stress:
– Physiological measurements (cortisol level in the bloodstream)
– Questionnaire scores
• You choose your operational definition depending on the nature of your research question and on practical issues

Probability Sampling
• Probability sampling—Set of sampling methods in which every person in the population has a specified probability of being selected
• Generalization—Applying results of research to an entire population
– Probability sampling permits researchers to generalize from their sample to the population because the means of selection does not bias toward including or excluding people, so the sample is likely to represent the entire population

Probability Sampling
Types of Probability Samples
Simple Random Sampling—Each person in the population has the same chance of being included in the research sample
Systematic Sampling—Selection process involving an unbiased approach that is not truly random (e.g., selecting every 10th person on a list)
Stratified Random Sampling—Selecting samples in which groups of interest (e.g., males and females) are identified and selected so the groups are represented in a desired proportion in the sample
Cluster Sampling—Sampling in which a number of groups (i.e., clusters) are identified, and a certain number of clusters are randomly selected for participation in the research
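The sketch below (Python) illustrates these four selection strategies on a hypothetical roster of 100 people; the roster, the group labels, and the sample sizes are invented purely for illustration and are not taken from the chapter.

    import random

    # Hypothetical roster: 100 people, alternately labeled "F" and "M",
    # used only to illustrate the sampling methods defined above.
    population = [{"id": i, "group": "F" if i % 2 == 0 else "M"} for i in range(100)]

    # Simple random sampling: every person has the same chance of selection.
    simple_random = random.sample(population, k=10)

    # Systematic sampling: start at a random offset, then take every 10th person.
    start = random.randrange(10)
    systematic = population[start::10]

    # Stratified random sampling: sample each group separately so the groups
    # appear in the sample in the desired proportion (here, 5 and 5).
    females = [p for p in population if p["group"] == "F"]
    males = [p for p in population if p["group"] == "M"]
    stratified = random.sample(females, k=5) + random.sample(males, k=5)

    # Cluster sampling: randomly choose whole clusters (e.g., classrooms of 10)
    # and include everyone in the chosen clusters.
    clusters = [population[i:i + 10] for i in range(0, 100, 10)]
    chosen = random.sample(clusters, k=2)
    cluster_sample = [p for cluster in chosen for p in cluster]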
Nonprobability Sampling
• Nonprobability sampling—Sampling that relies on groups of people who are convenient or available to participate
• Nonsampling error—Problem with nonprobability sampling in which some members of the population are systematically excluded from participation
• Problem with nonprobability sampling—It is not clear to whom the results will generalize because the sample is idiosyncratic

Nonprobability Sampling
Types of Nonprobability Sampling
Convenience Sampling—Nonrandom sampling involving whoever happens to be available to participate in the research (also called haphazard or accidental sampling)
– Most psychological research involves convenience sampling
– This type of sampling is easy and practical
– For some types of research, convenience samples are likely to mirror the population, but the researcher has to use judgment in deciding this
Purposive (Judgmental) Sampling—A sampling method in which participants are chosen because they possess some desirable trait (e.g., high creativity)
Chain-referral Sampling—A sampling method in which the researcher identifies a potential participant who, in turn, identifies another participant, who then also identifies somebody else, and so on

Making Useful Measurements
• Reliability—A characteristic of data related to the consistency of the measurement
• Validity—A characteristic of data related to how useful a measurement is for the intended purpose
• Measurement error—An error in measurement due to poor measuring instruments or human error, which can lead to poor conclusions

Making Useful Measurements
• The relation between reliability and validity
– Reliability simply means consistency but does not indicate how useful the measurements are
– The validity of measurements is limited by how reliable they are (e.g., low reliability guarantees low validity)
– If measurements are reliable, they might be valid
– If measurements are not reliable, they cannot be valid
– If measurements are valid, they must be reliable

Making Useful Measurements
Types of Reliability
Test-retest reliability—Consistency of scores on measurements taken at two different times
Split-half reliability—Consistency of scores on a measurement device (e.g., a test) when the scores are put in subgroups and the subgroups are compared
Interrater reliability—Consistency of measurements by different observers (also called interobserver reliability)
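Test-retest and split-half reliability both reduce to correlations. The minimal sketch below uses invented item responses for six people (all numbers are hypothetical) and the standard-library statistics.correlation function (Python 3.10+) to compute each coefficient.

    from statistics import correlation  # requires Python 3.10+

    # Hypothetical item responses (6 people x 10 items) at time 1; the values
    # are invented solely to illustrate the computations.
    time1 = [
        [3, 4, 2, 5, 3, 4, 2, 5, 3, 4],
        [1, 2, 1, 2, 2, 1, 2, 2, 1, 2],
        [5, 5, 4, 5, 5, 4, 5, 5, 4, 5],
        [2, 3, 2, 3, 2, 3, 2, 3, 2, 3],
        [4, 4, 5, 4, 4, 5, 4, 4, 5, 4],
        [3, 2, 3, 3, 2, 3, 3, 2, 3, 3],
    ]
    # Hypothetical total scores for the same six people retested later.
    totals_time2 = [34, 17, 46, 26, 42, 28]

    # Split-half reliability: correlate each person's total on the odd-numbered
    # items with their total on the even-numbered items (one testing occasion).
    odd_totals = [sum(person[0::2]) for person in time1]
    even_totals = [sum(person[1::2]) for person in time1]
    split_half = correlation(odd_totals, even_totals)

    # Test-retest reliability: correlate total scores from the two occasions.
    totals_time1 = [sum(person) for person in time1]
    test_retest = correlation(totals_time1, totals_time2)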
Considering Validity in Research
Types of Validity
Construct validity—The degree to which a measurement gives a good indication of the construct a researcher is trying to measure
Convergent validity—The extent to which two measurements that are supposed to measure the same construct are correlated
Divergent validity—The extent to which two measurements that are supposed to be unrelated are actually uncorrelated
Internal validity—The degree to which a research design leads to conclusions in which a researcher has confidence (associated with random assignment of participants in an experiment)
External validity—The degree to which a researcher can generalize the results of a study to a larger population (associated with random selection of participants from a population)
Statistical conclusion validity—The degree to which statistical analyses lead to good conclusions

Construct Validity
• Construct validity: Is our measurement appropriate for what we are trying to measure?
• The Beck Depression Inventory has acceptable construct validity for people from many cultures
– Mexican
– Portuguese
– Arabic
– American

Construct Validity
• The Beck Depression Inventory does not have good construct validity for
– Alzheimer’s patients
– Seriously depressed patients
– Some people with chronic disease

Construct Validity
• The Beck Depression Inventory (like all measurements) may have good construct validity in some situations but not in all.

Convergent Validity
• Measurements that correlate when they should correlate
– Sometimes they should correlate positively
– Sometimes they should correlate negatively

Divergent Validity
• Measurements do not correlate (either positively or negatively) when there is no reason to expect that they should.

Internal Validity
• Internal validity: Can you identify the most likely cause and rule out alternative explanations in understanding behavior?
– Random assignment of participants to experimental groups increases the likelihood that internal validity will be high
– Random assignment is useful in permitting researchers to draw cause-and-effect conclusions

How to Randomly Assign
91477 09496 03549 19981 51444 66281 08461 36070 28751 64061
29697 48263 90503 55031 89292 05254 61412 12377 01486 22061
90242 22662 41995 34220 10273 35219 53378 52392 54443 10746
59885 34601 06394 48623 90035 96901 13522 67053 10873 84070
07389 56490 61978 53407 04758 38055 80778 49965 02586 71531
1. Go through a random number table (such as the one above) and write down the numbers from 1 to N (your sample size) in the order in which they occur in the table.
2. Pair each person with the random numbers as they occur.
3. Put each person paired with an odd number into Group 1 and each person paired with an even number into Group 2.
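A minimal Python sketch of the three-step procedure above, assuming a hypothetical list of eight participants; random.randint simply stands in for reading successive entries from a printed random number table.

    import random

    # Hypothetical participants; the IDs are invented for illustration.
    participants = ["P01", "P02", "P03", "P04", "P05", "P06", "P07", "P08"]
    n = len(participants)

    # Step 1: go through the "table" and record the numbers 1..N in the order
    # they first occur, skipping repeats and out-of-range values.
    order = []
    while len(order) < n:
        value = random.randint(1, 99)  # next entry drawn from the table
        if 1 <= value <= n and value not in order:
            order.append(value)

    # Step 2: pair each person with a random number as it occurred.
    pairs = list(zip(participants, order))

    # Step 3: odd numbers go to Group 1, even numbers to Group 2.
    group1 = [person for person, number in pairs if number % 2 == 1]
    group2 = [person for person, number in pairs if number % 2 == 0]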
External Validity
• Are your measurements relevant for
– Other settings?
– Other people?
– Other times?

Statistical Conclusion Validity
• Are your statistics appropriate to answer your research question?
• Scales of measurement
– Some researchers place importance on scales of measurement in determining statistical tests
– Some researchers think that scales of measurement are generally unimportant
• There are controversies regarding the value of null hypothesis statistical testing

The SAT: Questions of Reliability and Validity
• The SAT is fairly reliable, but is it valid?
• That is, people taking the SAT on different occasions generally score about the same each time
• Does your SAT score predict your grades in college?
• In general, there is a reasonably high (but not perfect) correlation between SAT scores and college grades (r > .50), suggesting that the test shows a degree of construct validity

Controversy: The Head Start Program
• How effective is the Head Start Program?
• How do you operationally define “effective”?
• Gains in IQ scores by children in Head Start are not long lasting, so by this measure, Head Start is not effective
• Head Start children show higher grades and graduation rates than children who did not participate in Head Start, so by this measure, Head Start is effective

Controversy: The Head Start Program
• Most, but not all, studies show long-term gains due to Head Start
• Complex questions like this are very difficult to answer
• To reach sound conclusions, we have to evaluate how adequate (i.e., valid) the measurements are, weigh all the evidence, and think critically about the issue

Scales of Measurement
Nominal scales—Measurements that involve putting observations into categories, without numerical values (e.g., left-handed, right-handed, and ambidextrous)
Ordinal scales—Measurements that differentiate only by identifying whether a measurement is larger or smaller than another, resulting in ranked data
Interval scales—Measurements in which equal numerical differences between scores reflect equal differences in the underlying quantity, but there is no true zero point
Ratio scales—Measurements with equal intervals and a true zero point, so proportional statements are meaningful (e.g., A is twice as big as B)

Scales of Measurement
• Some researchers believe that most statistical approaches in psychology require interval or ratio data
• There is sometimes controversy about what scale applies to a given measurement.

Scales of Measurement
• Is IQ score nominal, ordinal, interval, or ratio?
– IQ is not simply a set of categories, so it is not nominal
– IQ scores let you say, “Person A has a higher score than Person B,” so IQ scores are at least ordinal.
– The difference between scores of 80 and 90 and the difference between 130 and 140 represent equal differences in numerical values, so an IQ score appears to be at least interval. But psychologically, is the difference in functioning between people with scores of 80 and 90 equal to the difference between people with scores of 130 and 140? If not, is the scale really interval with respect to psychological differences?
– If somebody has a score 10% higher than another person, it does not mean that the person is 10% smarter, so the scale may not be ratio.

Is IQ score nominal, ordinal, interval, or ratio?
• Some people have argued that IQ scores are really ordinal.
• Others argue that IQ scores are interval.
• In spite of the theoretical controversy, researchers treat measures like IQ scores as interval for purposes of data analysis.
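The scale you assume constrains which summary statistics are meaningful. The small sketch below (Python, with invented scores) illustrates the point: nominal data support counts and modes, ordinal data support medians, and means presuppose at least an interval scale, which is exactly the assumption researchers make when they analyze IQ scores.

    from statistics import mean, median, mode

    # Hypothetical scores and categories, invented only to illustrate which
    # statistics each scale of measurement supports.
    handedness = ["left", "right", "right", "ambidextrous", "right"]  # nominal
    iq_scores = [80, 90, 100, 105, 110, 130, 140]

    most_common_hand = mode(handedness)  # nominal: counting categories is fine

    median_iq = median(iq_scores)        # defensible even if IQ is only ordinal

    mean_iq = mean(iq_scores)            # assumes equal intervals between scores

    # Ratio-style claims ("10% smarter") would require a true zero point,
    # which IQ lacks, so a ratio such as 110 / 100 is not meaningful here.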