The Basics of Experimentation

Lecture Outline
I. Independent variables
   A. Definition
   B. Manipulated vs. subject
II. Dependent variables
III. Operational definitions
   A. Experimental operational definitions
   B. Measured operational definitions
   C. Scales of measurement
   D. Evaluation of operational definitions
      1. Reliability
      2. Validity
         i. Face
         ii. Content
         iii. Predictive
         iv. Construct
IV. Evaluation of the experiment
   A. Internal validity
      1. Extraneous variables
      2. Confounding variables
      3. Classic threats to internal validity: history, maturation, testing, instrumentation, statistical regression, selection, subject mortality, selection interactions
   B. External validity

---------------------- IN-CLASS CONFOUNDS PROJECT ----------------------

• What is an experiment?
• Smart-Pill Study
   – Independent variable?
   – Dependent variable?
• Operational definition

Independent Variables vs. Subject Variables
• Subject variables – characteristics of our subjects that we might be interested in but cannot randomly assign
   • Age
   • Height
   • Weight
   • Gender
• These are not true independent variables
   – Not manipulated
   – AKA predictor variables

Operational Definitions
• Both independent variables and dependent variables must be carefully defined – operational definitions
   – Experimental operational definitions – operational definition of the IV
   – Measured operational definitions – operational definition of the DV
• How we operationally define our constructs can affect the outcome of a study.
• Schachter (1959) – the effect of anxiety on the need to affiliate
   – Manipulated anxiety by telling subjects different stories about how much the shocks would hurt
      » Group 1: "Dr. Gregor Zilstein" – "It is necessary for the shocks to be intense"
      » Group 2: "The shock will be more like a tickle"
   – Measured whether the subjects waited with others or alone
• Sarnoff and Zimbardo (1961) – does anxiety cause a need to affiliate?
   – Male subjects were told "we are studying the physiology of the mouth"
   – High anxiety: suck on a pacifier, a baby bottle, a breast shield
   – Low anxiety: whistles, kazoos
• We must also operationally define nonconstruct variables
   – Drug exposure – not an abstract concept like anxiety
      • Amount
      • Route of administration – IV, IP, SC, oral
      • Injection volume

Scales of Measurement
• When we operationally define our measurement (the DV), we also consider what scale of measurement we are using
• Scales of measurement: nominal, ordinal, interval, ratio
• Nominal
   – The simplest level of measurement
   – Classification by name only
   – Classifies items into two or more distinct categories
      • You are either one or the other – you can be Democrat or Republican, but we don't know how Democratic or Republican you are
   – Examples: gender, ethnicity, marital status
   – Jenni (1976) – book-carrying techniques
      • Type 1: wrap one or both arms around the books, with the short edges of the books resting on the hip or in front of the body
      • Type 2: support the books with one arm and hand at the side of the body, with the long edges of the books approximately horizontal to the ground
      • Observed 2,626 individuals at six colleges in the US, one in Canada, and one in Central America
         – 82% of females used Type 1
         – 96% of males used Type 2
• Ordinal
   – A higher scale value has more of that attribute
   – The intervals between adjacent scale values are indeterminate
   – Scale assignment is by the property of "greater than," "equal to," or "less than"
   – Adds the arithmetic relations of greater than and less than
   – Examples
      • Movie ratings (0, 1, or 2 thumbs up)
      • U.S.D.A. quality of beef ratings (good, choice, prime)
      • Gold, silver, bronze – the Olympics
      • The rank order of anything
• Interval
   – Intervals between adjacent scale values are equal
      • The difference between 8 and 9 is the same as the difference between 76 and 77
   – Standard operations of addition, subtraction, multiplication, and division can be performed
   – Examples: Fahrenheit and Centigrade temperature
• Ratio
   – There is a rational zero point for the scale
   – Ratios are equivalent – e.g., the ratio of 2 to 1 is the same as the ratio of 8 to 4
   – Allows multiplication and division of scale values
   – Examples
      • Degrees Kelvin
      • Annual income in dollars
      • Length or distance in centimeters, inches, miles, etc.
• Ratio and interval scales tend to be preferred because the statistics that can be used on them tend to be more powerful
   – t-tests and ANOVA vs. chi-square

Evaluating Operational Definitions
• Reliability vs. validity
   – Precision vs. accuracy
   – Throwing darts – or sighting in a gun
• Forms of reliability
   – A reliable definition of your IV
      • Let's say you hypothesize that hungry rats will learn a maze more quickly (when food is the reward) than rats that aren't hungry
      • What is the IV? What is the DV?
      • Hunger is a hypothetical construct – how can we make a reliable operational definition of this construct?
   – Reliable operational definitions of your DV
      • How can we be certain that our measurements always come out the same?
      • This can be examined in multiple ways
• 1) Inter-rater reliability
   – Do different observers agree on the measurement?
   – Have two people make the observations and see if their observations are correlated
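The inter-rater check just described can be sketched in code: two raters score the same subjects, and we correlate their scores. This is a minimal illustration with made-up numbers, not data from any actual study.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical aggression counts from two observers watching the same 8 children
rater_1 = [3, 7, 2, 9, 4, 6, 1, 8]
rater_2 = [4, 6, 2, 9, 5, 7, 1, 7]

r = pearson_r(rater_1, rater_2)
print(f"inter-rater correlation: r = {r:.2f}")  # a value near +1 indicates good agreement
```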
   – There should be a high positive correlation
• 2) Test-retest reliability
   – If the same individual is measured twice with the same instrument, you should get similar results
      • GRE
      • IQ
• 3) Inter-item reliability
   – Individual items on a questionnaire designed to measure the same thing should be highly correlated
      • Cronbach's alpha – which you will calculate for the survey project

Validity
• Are we manipulating the variable we intend to manipulate?
   – The anxiety example: Schachter vs. Sarnoff and Zimbardo
• Are we measuring the variable that we intend to measure?
   – An IQ test – does it really measure intelligence?

Some Types of Validity
• Face validity
   – The degree to which a manipulation or measurement is self-evident
   – Does it look like it should measure what it intends to? Does it just make sense?
      • Heart rate and anxiety?
   – This is a weak form of validity
      • Low face validity can still be a problem, though – subjects might not take the measure seriously
• Content validity
   – The degree to which the content of a measure reflects the content of what is measured
   – Students often question the content validity of exams
      • "You only asked me about the things I didn't know."
      • "You didn't ask me about the stuff I did know."
• Predictive validity
   – The degree to which a measure, definition, or experiment yields information allowing prediction of actual behavior or performance
   – If people do well on an IQ test and it is really getting at intelligence, we would expect them to do well at other tasks that require intelligence
      • School
      • Other tests
      • Difficult jobs
• Construct validity
   – Is the entire measure valid?
   – The degree to which an operational definition accurately represents the construct it is intended to manipulate or measure AND ONLY that construct
   – Does the test discriminate between constructs?
      • Let's say I designed a new test to predict how well-adjusted people are – the Kaiser attitude adjustment survey – but it also measures verbal aptitude
      • In other words, people's verbal aptitude scores are highly correlated with their Kaiser attitude adjustment survey scores
      • Problem?

Evaluating the Validity of the Experiment Itself
• The forms of validity we have discussed so far concern defining our variables – operational definitions
• We also need to determine whether the experiment itself is valid
   – Internal validity: the degree to which a researcher is able to show a causal relationship between antecedent conditions and subsequent behavior
   – External validity: how well the findings of the experiment generalize, or apply, to real-life situations
• Experiments tend to be high in internal validity and low in external validity
   – They are contrived (controlled) situations
      • Rats in cages exposed to cocaine are far from real-life human cocaine use
• Keep in mind, though, that an experiment that is low in internal validity (confounded) also has low external validity
   – A confounded experiment can tell us nothing

Confounding
• Experiments are most concerned with achieving high internal validity
   – External validity is less of a concern
• Internal validity
   – Holds when we are relatively certain that the changes in the DV were due to the levels of the IV
   – If there could be other reasons for the changes, then our experiment is not internally valid
   – These other reasons (if they exist) are known as confounds

Extraneous Variables versus Confounds
• The better control we have over an experiment, the more likely our internal validity will be good
   – But there are often other things changing during an experiment
• If these other things (variables) are outside the experimenter's control, we call them extraneous variables
   – Could be anything
      • Changes in noise
      • Changes in temperature
      • Equipment failure
      • Differences in subject preparation (some more alert than others)
• Extraneous variables should always be avoided if possible
   – But they are not a threat to the internal validity of the experiment unless they vary systematically with the independent variable
   – When this occurs, it is known as a confound
• Thus, a confound is a type of extraneous variable
   – Extraneous variables are not necessarily confounds
   – Confounds are the worst form of extraneous variable
• If an extraneous variable varies systematically with the independent variable, we cannot know whether the changes we see in the dependent variable are due to the IV or to the confounding variable
   – We cannot infer causality
   – Thus, low internal validity
• Extraneous variables alone are not necessarily confounds and won't necessarily threaten the internal validity of an experiment
   – Alzheimer's rat experiment
      • Morris water maze
         – Old rats vs. young rats
         – Drug vs. no drug
      • What is the IV, the DV, the subject variable?
   – Jackhammering going on in an adjacent building
      • All rats experienced the jackhammering
      • Extraneous variable or confound?
• Keep in mind that extraneous variables are not good even if they are not confounds
   – They make it more likely that you will make a Type II error
• Type II error
   – Not finding an effect that is actually there
   – Failing to reject the null hypothesis when it is actually false
   – Because extraneous variables increase the variability of responding (behavior), they make it more difficult to find significant effects
      • Thus, they lower our power
• Extraneous variables alone, however, do not cause Type I errors
   – Finding an effect that is not really there
   – Rejecting the null hypothesis when it is true
• Confounds can lead to Type I errors
• It would have been a confound if I had run only the control rats during the jackhammering and the drug-exposed rats when it was quiet
• I didn't do that, but the jackhammering could have increased the variability in responding, making it more difficult to find a difference
   – If I find a difference despite this extraneous variable, I would not be concerned about the internal validity of the study
   – I found an effect under poor conditions

8 Classic Threats to Internal Validity
• Donald Campbell identified 8 kinds of extraneous variables that can threaten the internal validity of experiments
   – Each of these is a potential confound
• 1) History (external events)
   – Refers to the history of the experiment: changes in the subjects' environment that might affect the results
   – What if, during testing of a new anxiety treatment, the 9/11 attacks take place?
   – What if, while testing treatments for asthma, summer smog occurs in the city?
   – In the 1970s, when Head Start began, Sesame Street also began
   – Proper controls can eliminate this issue
• 2) Maturation
   – Refers to any internal (physical or psychological) changes in subjects that could affect the dependent measure
   – Performance could get worse: boredom, fatigue
   – Performance could get better
   – Within-subject designs always have to worry about this
      • Performance at the end of a session could be better or worse for the reasons above
      • Make sure your IV conditions are evenly distributed across time
   – A classic issue in medical research is spontaneous recovery
      • People sometimes just get better over time
• 3) Testing
   – Refers to effects on the DV produced by previous administration of the same test
   – Single-group pretest/posttest design
      • Subjects might think about the initial test and figure out things they wish they had done differently
   – GRE / Kaplan course / GRE
      • The increase in performance could be the result of having taken the test previously – a practice effect
• 4) Instrumentation
   – When some feature of the measuring instrument itself changes over the course of an experiment
   – Obviously, if your equipment breaks, that is a problem
   – What about human assessment?
      • You train someone to observe a behavior – aggression
      • They could get better at observing aggression through practice
      • If one group is assessed early in the semester and one late in the semester, that could be a problem (an instrumentation confound)
• 5) Statistical regression
   – Can occur whenever people are assigned to conditions based on extreme scores on a test
   – Regression toward the mean
      • Draw the normal curve
      • IQ – take the highest scorers, make some manipulation, then test them again; they are likely to score closer to average
      • The scores will tend to decrease, or regress toward the mean, regardless of the manipulation
   – Tony Delk's nine 3-pointers
   – An early study of Sesame Street indicated that it was especially helpful to the most disadvantaged kids
      • The kids with the lowest scores improved the most
• 6) Subject mortality
   – A problem if more subjects drop out of one condition than another
   – Can be a clear problem in animal studies
      • A high dose of cocaine – 50% mortality – leaves you dealing with a "super" subgroup of rats
   – In human studies
      • Subjects drop out of the experimental condition because it is aversive or boring
      • Those who stay are a special subgroup of people – not the same as the control group
   – Children in Head Start who are having trouble might drop out
      • Leaving only the better students – thus, higher scores
• 7) Selection
   – Occurs whenever the researcher does not randomly assign subjects to the different conditions of the experiment
   – When we randomly assign subjects to groups, we spread the individual differences among our subjects across the conditions
   – Without random assignment, individual differences could affect the results of our study
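Random assignment can be sketched in code: shuffle the subject pool, then deal subjects into conditions so chance alone determines group membership. The subject IDs and group names below are made up for illustration.

```python
import random

def randomly_assign(subjects, conditions, seed=None):
    """Shuffle the subject list, then deal subjects into conditions round-robin,
    so individual differences are spread across groups by chance alone."""
    rng = random.Random(seed)
    pool = list(subjects)
    rng.shuffle(pool)
    groups = {c: [] for c in conditions}
    for i, subject in enumerate(pool):
        groups[conditions[i % len(conditions)]].append(subject)
    return groups

# Hypothetical: 12 children assigned to a preschool program vs. control by
# chance, rather than by parental choice (which would be self-selection).
kids = [f"child_{n}" for n in range(1, 13)]
groups = randomly_assign(kids, ["program", "control"], seed=42)
for name, members in groups.items():
    print(name, members)
```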
   – Self-selection bias
      • When participants choose which group they will be in
      • Perhaps parents choose to place their kids in Head Start because they are more motivated for their kids to learn
• 8) Selection interactions
   – A selection threat can combine with another threat to form a selection interaction
   – Selection & history
      • Head Start kids are more motivated (their parents chose the program) and more likely to watch Sesame Street when it came on
   – Selection & maturation
      • Head Start kids are more motivated and thus more likely to develop skills faster than the control condition
   – Selection & testing
      • Head Start kids are more motivated, so they figure out trick questions better, leading to improved performance on the second test
   – Selection & instrumentation
      • Perhaps observers in the control group get bored and are less likely to pick up on the kids' improvement
   – Selection & mortality
      • Perhaps you lose the more motivated (good) students from the control group to other preschool programs
   – Selection & regression
      • If the kids who are furthest behind are assigned more often to the treatment group, they would show greater improvement

• Work on CONFOUND WORKSHEETS
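The regression-toward-the-mean threat (threat 5 above) can be demonstrated by simulation: select the top scorers on a noisy test, "retest" them with no manipulation at all, and their average drops back toward the population mean. All numbers below are made up for illustration.

```python
import random

rng = random.Random(0)

# Each person's observed IQ = stable true score + test-day noise.
true_scores = [rng.gauss(100, 15) for _ in range(10_000)]
test_1 = [t + rng.gauss(0, 10) for t in true_scores]

# Select the "smartest" people: the top 5% on the first test.
cutoff = sorted(test_1)[int(0.95 * len(test_1))]
selected = [i for i, score in enumerate(test_1) if score >= cutoff]

# Retest the selected group -- same true scores, fresh noise, NO manipulation.
test_2 = {i: true_scores[i] + rng.gauss(0, 10) for i in selected}

mean_1 = sum(test_1[i] for i in selected) / len(selected)
mean_2 = sum(test_2.values()) / len(selected)
print(f"selected group, test 1: {mean_1:.1f}")
print(f"selected group, test 2: {mean_2:.1f}")  # lower: regression toward the mean
```

Because the group was selected partly for lucky noise on test 1, that luck does not repeat on test 2, so the retest mean falls even though nothing was done to the subjects.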