Methodological Control

Extraneous Variables
• Extraneous variables (EVs) ~ unintended factors that could cause variations in the DV
• Confound ~ an uncontrolled extraneous variable that covaries with the IV and thus could provide an alternative explanation for the effect on the DV
[Figure: diagram relating extraneous variables and confounds. Labels: variables that cause / do not cause variations in the DV; variables that covary / do not covary with the IV; confounds]

Confounds
• The usual way of manipulating an IV is to vary it between two or more levels; these levels are called experimental conditions.
[Figure: Experimental Condition 1 (do something one way) versus Experimental Condition 2 (do something another way)]
• The two experimental conditions should be identical apart from the difference in the IV. If they are not, you have confound(s).

Control Condition
• Conceptually, the control condition means “do nothing”, in contrast to the other experimental condition(s), which mean “do something”.
• This distinction is only meaningful when the manipulation is something (e.g., a treatment) that can be present or absent.
[Figure: Control Condition (do nothing) versus Experimental Condition (do something). Labels: Drug, Treatment, Manipulation]
[Figure: an experiment involves manipulating independent variables, measuring the dependent variable, and controlling extraneous variables]

Controlling Extraneous Variables
• In well-designed experiments, a significant effect on the DV can be unambiguously attributed to the manipulation of the IV.
• In poorly designed experiments, the source of the effect on the DV is ambiguous because there are EVs involved. In other words, there could be uncontrolled EVs (i.e., confounds).
• The usual challenge is that there are many EVs in addition to the IV.
• We start by looking at how the (variations of) IVs and EVs contribute to the (variations of) DVs.
• Suppose we want to study the impact of cellphone use on driving. In addition to the IV (cellphone vs. no cellphone), there are EVs that will affect the DV (driving performance): skills, traffic, car, fatigue.
• The intended comparison is made by examining the difference between the 2 distributions.
[Figure: two score distributions (x-axis: Score, y-axis: Count) with the intended comparison marked]
• The different levels of each EV contribute to both distributions and add irrelevant variation to them:
• Skills: both good and bad drivers could use a cellphone (or not).
• Traffic: people could use a cellphone (or not) in both heavy and light traffic.
• Cars: people could use a cellphone (or not) in both fancy and old cars.
• Fatigue: both energetic and tired drivers could use a cellphone (or not).
[Figure: four histograms (Count vs. Score), one each for Skills, Traffic, Cars, Fatigue]
[Figure: the IV (Level 1, Level 2), EV1 (Level 1, Level 2) and EV2 (Level 1, Level 2) all feed into DV (IV = Level 1) and DV (IV = Level 2)]
• EVs add irrelevant variation to the DV and make the data noisier (i.e., statistically more demanding) even if they do not covary with the IV.
• Of course, if EVs covary with the IV (e.g., those who drive fancy cars are more likely to use cellphones), they become confounds and pose a serious threat to the internal validity of the experiment.
• For both reasons, we want to reduce the influence of the EVs.

Control Variable
• A first way to reduce the influence of an EV is to fix it throughout the experiment and make it a control variable.
• If a variable cannot vary, it cannot add variability to the DV and will cause no trouble.
• In the above example, if we worry about the type of car, we can fix it as a control variable (i.e., everyone drives the same type of car). If we worry about skills, we can use subjects of the same level (e.g., all beginners).
• Indeed, it is common practice to standardize an experiment as a computer program ~ as a result, many factors (e.g., instructions, interpretation, feedback, etc.) can be kept precisely constant, and the random variation caused by the experimenter can be minimized.
• Some variables cannot be easily controlled (e.g., fluctuations in the mood and alertness of subjects).
• There are also some variables we do not want to fix, often to maintain the generality of the study (e.g., both males and females should usually be included).
• Example: to study reading letters in different colors, the left panel controls the distance to the center to reduce noise; the right panel does not, which keeps the experiment more general.
[Figure: two panels of colored letters, the left with letters at a controlled distance from the center, the right without]
• If, for whatever reason, we cannot (or do not want to) fix an EV, we should judge whether this EV could covary with the IV (i.e., be a confound).
• If the EV is not a confound, we can collect more data to increase statistical power and so handle the extra noise caused by the EV.
• There is no fixed set of rules that makes this judgment completely precise. Even the most experienced scientists are sometimes unsure.
But the following points are important:
• Be familiar with the common potential confounds.
• Make good intuitive judgments.
• The experts in the relevant field are usually the best guide.
• When you are unsure, it is better to be conservative than liberal.

Matching the EVs
• If an EV could be a confound, we need to avoid the confound by matching the EV.
• e.g., in visual perception, we often adjust the sizes of items according to their distances from the center to match their perceptibility.
• The general rule of matching is to make the EVs as equal as possible across conditions so that they do not covary with the IV.
• Again, there is no fixed set of rules for achieving perfect matching. Knowing the optimal way to match requires expertise in the specific field.
• In the above example, matching “perceptibility” is accompanied by mismatching “sizes”, which we know does not matter much. But this would not be obvious to a layperson.

[Flowchart]
Should we (and can we) keep an EV constant (i.e., control variable)?
• Yes: keep it constant (i.e., control variable).
• No: could it be a confound?
    • No: collect more data to handle the extra noise.
    • Yes: avoid the covariance by matching, or in another way.

Class Exercise 1
For each of the following findings (or claims), list the important EV(s) that need to be controlled, with reasons:
1. Females rely more on peers’ opinions when choosing partners
2. Habitual video-game players can focus their attention more effectively
3. Old people are happier

Between-Subjects Design
• Each participant receives only one level of each IV.
• Levels are compared by comparing participants.
• Here, the different conditions are tried on different groups of subjects, so the experimental condition and control condition become the experimental group and control group, respectively.
• Example: What is the effect of some pills on depression?
[Figure: a depressed sample is divided into an experimental group (pills) and a control group (placebo)]

Within-Subjects Design
• Each participant goes through all levels of each IV.
• Levels are compared within each participant.
• The within-subjects design and between-subjects design correspond, respectively, to the paired t-test (or repeated-measures ANOVA) and the two-sample t-test in statistics, and should be tested by the corresponding method.
• Example: the performance of searching for a dog in 2 conditions.
[Figure: a single sample goes through both conditions]

Between-Subjects vs. Within-Subjects
Within-subjects design
• Subjects are matched, so there is no need to worry about subject variables as confounds.
• Variation from subjects can be removed, so it has greater statistical power.
Between-subjects design
• Sometimes it is the only option (i.e., when the IV is a subject variable).
• There is no need to worry about sequence effects.

Between-Subjects Design (example data)
    Group 1     Group 2
    850 ms      1700 ms
    600 ms      1000 ms
    1450 ms     800 ms
    ……          ……
• There is a lot of variability caused by subject differences.
• There is nothing we can do about this.
• Is there a significant difference? Probably not.

Within-Subjects Design (example data)
    Condition 1   Condition 2   Difference
    850 ms        1000 ms       150 ms
    600 ms        800 ms        200 ms
    1450 ms       1700 ms       250 ms
    ……            ……
• There is a lot of variability caused by subject differences.
• This problem can be fixed by calculating the difference for each subject.
[Figure: the per-subject differences plotted against 0]
• Is this difference greater than 0? Certainly yes!
• This is obviously a very consistent effect in a within-subjects design.
• But if the pairing relations are removed (i.e., a between-subjects design), the data appear to be 2 overlapping groups with little systematic difference.
[Figure: overlapping distributions for the control condition and the experimental condition]

Sequence Effects
• Comparing the processes involved in solving 2 types of problems: Chinese chess (1 hour) followed by chess (1 hour). Can we really draw a conclusion?
[Figure: thought bubbles “??????” and “Trick1, Trick2…”]
• Comparing the psychological benefits of swimming and biking: swimming for 1 hour followed by biking for 1 hour. Can we really draw a conclusion?
• Comparing the fear induced by a spider picture and a spider crawling upon you: see the picture (10 min) followed by crawling (10 min). Can we really draw a conclusion?
• Types of sequence effects: learning (practice) effects, fatigue, adaptation.
• To handle sequence effects, we need to balance the order of the conditions across different groups of subjects.

Counterbalancing

Complete Counterbalancing
2 conditions: AB BA
3 conditions: ABC BCA CAB ACB BAC CBA
4 conditions: ABCD ABDC ACBD ACDB ADBC ADCB BACD BADC BCAD BCDA BDCA BDAC CABD CADB CBAD CBDA CDBA CDAB DABC DACB DBCA DBAC DCAB DCBA

    # of conditions   # of sequences
    2                 2
    3                 6
    4                 24
    5                 120
    6                 720
    7                 5040
Number of sequences for 8 conditions, 9 conditions, 10 conditions? Formula?
e.g., 3! = 3 x 2 x 1 = 6
      4! = 4 x 3 x 2 x 1 = 24

• Dividing the subjects into groups does not mean this is a between-subjects design,
• but you should still try to make the groups approximately equivalent.
[Figure: subjects are randomly assigned to the sequences ABC, ACB, BCA, BAC, CAB, CBA]

Partial Counterbalancing
• When there are six or more conditions, it is difficult to find enough subjects to cover all the sequences.
• One alternative strategy is partial counterbalancing: a random sample of all possible sequences.
• e.g., if there are 140 subjects for a 7-condition experiment, just take a random sample of 140 sequences out of all 5040 possible sequences.

Latin Square
Another alternative strategy is the Latin square, a matrix designed so that:
• every condition of the study occurs once in every sequential position;
• every condition immediately precedes and follows every other condition exactly once.
Each row below is one sequence:
    A B F C E D
    B C A D F E
    C D B E A F
    D E C F B A
    E F D A C B
    F A E B D C

Long Sequence
• So far, the discussion of sequence effects applies to the situation in which each condition appears only once. What if each condition appears more than once?
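For the case in which each condition appears only once, the three counterbalancing schemes described above can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the slides do not say how the Latin square was built, so the sketch assumes a common construction for an even number of conditions (the first row follows the index pattern 0, 1, n-1, 2, n-2, ..., and each later row shifts every entry by one condition).

```python
import random
from itertools import permutations
from math import factorial


def complete_counterbalancing(conditions):
    """Return every possible order of the conditions (n! sequences)."""
    return ["".join(order) for order in permutations(conditions)]


def partial_counterbalancing(conditions, n_sequences, rng):
    """Return a random sample (without repeats) of all possible sequences."""
    return rng.sample(complete_counterbalancing(conditions), n_sequences)


def balanced_latin_square(conditions):
    """Build a balanced Latin square (assumed construction, even n only)."""
    n = len(conditions)
    if n % 2 != 0:
        raise ValueError("this construction needs an even number of conditions")
    # First row follows the index pattern 0, 1, n-1, 2, n-2, ...
    first_row = [0]
    low, high = 1, n - 1
    for step in range(1, n):
        if step % 2 == 1:
            first_row.append(low)
            low += 1
        else:
            first_row.append(high)
            high -= 1
    # Every later row shifts each entry of the first row by one condition.
    return [
        "".join(conditions[(index + shift) % n] for index in first_row)
        for shift in range(n)
    ]


if __name__ == "__main__":
    # Number of sequences under complete counterbalancing equals n!.
    for n in range(2, 8):
        letters = "ABCDEFG"[:n]
        print(n, len(complete_counterbalancing(letters)), factorial(n))
    # Partial counterbalancing: 140 of the 5040 sequences for 7 conditions.
    sample = partial_counterbalancing("ABCDEFG", 140, random.Random(0))
    print(len(sample), "sampled sequences, e.g.", sample[:3])
    # Balanced Latin square for 6 conditions, one sequence per row.
    for row in balanced_latin_square("ABCDEF"):
        print(" ".join(row))
```

With six conditions the square needs only 6 sequences instead of 720, which is why it is attractive when subjects are scarce. For an odd number of conditions this particular construction does not apply, and the sketch raises an error rather than guessing.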
• Indeed, in some areas of psychology, a subject may be tested for thousands of trials, and each condition appears many times.
• Here, the critical issue is a bit different.
• Sometimes we group all the trials of the same condition into one block and arrange the blocks according to the rules discussed above:
    AAAAAA……AAAAABBBBBB……BBBBB
    BBBBBB……BBBBBAAAAAA……AAAAA
• However, even if the sequence effect is balanced across subjects, it is usually better to intermix the conditions more thoroughly so that the sequence effect is controlled more completely.

Randomization
• In such a long sequence, a more typical strategy is to intermix the trials.
• What matters in a long sequence is to approximately balance the positions of the different conditions. We can be less strict about the detailed arrangement of a few trials.
• The most convenient strategy is randomization ~ trials are randomly added to the sequence.
• Below, you can verify that the 4 conditions are approximately evenly distributed because of randomization:
    ACBAABAADDCAACBABDBBBCCBAADADAAABCBDC CCDDCCCDDCCDADACCADADDBBCA
• Randomization works well in a very long sequence (e.g., more than 500 trials), but it could be risky in a shorter sequence (e.g., fewer than 100 trials).
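The dependence on sequence length can be checked with a small simulation. The sketch below is an illustration only: the function names, the imbalance measure (largest count minus smallest count, divided by the expected count per condition), the sequence lengths 60 and 1000, and the number of simulated runs are all assumptions chosen for the example, not part of the original material.

```python
import random
from collections import Counter


def randomized_sequence(conditions, n_trials, rng):
    """Pure randomization: each trial's condition is drawn independently."""
    return [rng.choice(conditions) for _ in range(n_trials)]


def relative_imbalance(sequence, conditions):
    """(largest count - smallest count) / expected count per condition."""
    counts = Counter(sequence)
    per_condition = [counts.get(c, 0) for c in conditions]
    expected = len(sequence) / len(conditions)
    return (max(per_condition) - min(per_condition)) / expected


def mean_imbalance(conditions, n_trials, n_runs=200, seed=0):
    """Average the relative imbalance over many simulated sequences."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        sequence = randomized_sequence(conditions, n_trials, rng)
        total += relative_imbalance(sequence, conditions)
    return total / n_runs


if __name__ == "__main__":
    # Hypothetical lengths: one short sequence and one very long sequence.
    for n_trials in (60, 1000):
        print(n_trials, "trials:", round(mean_imbalance("ABCD", n_trials), 3))
```

Averaged over many simulated sequences, the relative imbalance between conditions comes out larger for the short sequence than for the long one, which is the risk the slide warns about.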
There are alternative strategies:
• Reverse counterbalancing ~ all conditions appear in a fixed order, and then the order is reversed
    ABCDE EDCBA ABCDE EDCBA ABCDE EDCBA
• Block randomization ~ each condition appears once in each block, and the order of conditions is randomized within each block
    CBEDA EADCB BEACD AEDCB EBACD

Summary of strategies
    Short sequence, a few conditions     Complete counterbalancing
    Short sequence, many conditions      Partial counterbalancing or Latin square
    Long sequence                        Reverse counterbalancing or block randomization
    Very long sequence                   Randomization

• Sometimes the sequence effect is too large and therefore impossible to balance, e.g.,
    Two ways of teaching children a foreign language
    Two treatments of a disease
• After learning the language (or recovering from the disease), it makes no sense to start over with another learning session (or another treatment).
• In this situation, we should use a between-subjects design.

Creating Equivalent Groups
• For various reasons (e.g., studying a subject variable), we need to use a between-subjects design.
• An important concern here is whether the difference between the groups could be a confound.
In other words, we need to make our best effort to create equivalent groups.
• We often have problems with the equality of the groups if we use naturally formed groups rather than assigned groups.
• e.g., a study found that a group of winter swimmers who have regularly swum in winter for the last 15 years are in better health than normal people; therefore, winter swimming is good for your health.
• This study has an obvious potential confound called “subject-selection effect / bias” ~ people in better health are perhaps more likely to develop and maintain an interest in winter swimming for 15 years.
• The way to avoid this problem is to assign people to groups rather than compare naturally formed groups.
• e.g., if you recruit 100 subjects and randomly select 50 of them to perform winter swimming, the confound can be reasonably avoided.

Random Assignment
• Random assignment is a convenient strategy that usually works fairly well.
• The subjects are randomly assigned to different groups (i.e., conditions).
Even if there are significant differences between individual subjects, the averages of the groups are expected to be equal.
[Figure: subjects are randomly assigned to an experimental group and a control group]
• However, random assignment can sometimes have problems.
• First, when there are only a few subjects, random assignment could introduce systematic differences (e.g., gender, as in the figure below).
[Figure: an experimental group and a control group that differ in gender composition]
• Second, when the subjects are from a very heterogeneous group (e.g., ages ranging from 5 to 70) rather than a homogeneous group (e.g., college students), even a small imbalance could be a problem.
• Third, when the IV is a subject variable, there are often some EVs that are inherently related to it.
    e.g., a behavioral difference between normal subjects and neglect patients may just be a difference caused by general cognitive capability
    e.g., men and women statistically differ in height

Subject Matching
• In these situations, we often use a deliberate subject-matching strategy.
• e.g., if we have only a few subjects in each condition and believe gender may be an important factor for the research question, we match the gender of subjects between groups.
[Figure: experimental group and control group matched on gender]
• In the case of a subject variable, subject matching sometimes requires the use of an appropriate control group.
• e.g., we want to study the cognitive effect of basketball playing and have chosen a group of basketball players.
To make sure the results are not just due to general physical condition, the basketball players should be matched with other athletes.
[Figure: experimental group and control group]
• Similarly, when studying a subject variable, matching an EV may require using the same “range” from both groups so that EVs that are naturally unequal can be equated.
• e.g., if you want to exclude the influence of height when studying gender differences, you need to match the heights of the subjects.

Creating Equivalent Groups
• Even in the above-mentioned situations (i.e., small sample size, heterogeneous group, subject variable), it is generally both impractical and unnecessary to control for all possible EVs.
• We only control for an EV when there are reasons to believe it will significantly affect the DV.
• There is no fixed set of principles that makes you 100% sure of this judgment, but good intuition and expertise are useful. We can always be caught by surprise.

Class Exercise 2
Which EV matters for which DV? Why?
• Memory capacity
• Mental calculation
• Religion
• Attitude toward divorce

Subject Matching
• Sometimes it is a useful approach to match between couples or siblings (or twins), because they allow a close match on many factors. Use of this strategy depends on the specific research question.
[Figure: experimental group and control group]

Random Assignment vs. Subject Matching
Random assignment
• convenient
• usually good enough for homogeneous groups (e.g., university undergraduates)
Subject matching
• studying subject variables (e.g., patients vs. normal subjects, race, gender, etc.)
• studying a very heterogeneous group
• studying a very small sample

Experimenter Bias
• One type of experimenter bias is this: unintentionally or even unconsciously, the experimenter may behave differently when seeing desired and undesired responses (e.g., smile vs. frown).
[Figure: speech bubbles. Subject: “A is better than B.” Experimenter: “Oh, no. That is the opposite of my prediction…” Subject: “Wait, actually I think B is better than A.” Experimenter: “Here you go! That’s right!”]
• Another type of experimenter bias arises when experimenters have to rate the responses of the subjects (e.g., infants) and know the “answers”: their ratings will be biased.
[Figure: an experimenter thinking “The stimulus is on this side, so the baby must be looking there?”]
• The way to avoid this bias is to make the experimenter blind to the conditions. For example, here she rates the baby’s response without seeing the stimuli.
• To control for experimenter bias:
    ~ reduce the direct involvement of experimenters
    ~ standardize the process
• Standard written instructions
• Automated experiments
• Experimenters have to rate without knowing the “answers” (i.e., rate blindly from video tape)

Subject Bias
• One type of subject bias is that subjects may try to guess the hypothesis and predictions of the experimenter and, intentionally or unintentionally, try to be a good subject and conform to the predicted results.
• e.g., if the subjects assume you are studying the cognitive effect of drinking coffee, they may behave according to their interpretation: “I have had one cup of coffee, so now I should perform this task well.”
• Another type of subject bias is evaluation apprehension: subjects want to be evaluated positively, so they may behave as they think the ideal person should behave: “I guess the answer is ‘yes’. But if I choose ‘yes’ I will look selfish, so I have to say ‘no’.”
• To control for subject bias, it is important to make sure the subjects cannot easily figure out the real purpose of the experiment:
• ask the question indirectly and use implicit measurements
• use subjects who are naive to the purpose of the experiment
• make the different conditions similar (e.g., using a placebo treatment on the control group)

Placebo
• An important issue concerning the equality of different groups is that subjects should be unaware of which group they are in.
• Otherwise, the groups would differ in that they know they are in different conditions, so they may have different expectations and behave differently.
• This issue is very important when studying the effectiveness of a treatment / drug.
• Subjects may get better simply because they know they are treated. To control for this, we usually give the control group a “placebo” so that they also think they are treated.

Placebo Effect
1. Regular coffee
2. Coffee that tastes (and looks) like water
3. Water that tastes (and looks) like coffee
4. Regular water
Caffeine effect = difference between conditions 1 and 3, or between conditions 2 and 4
Placebo effect = difference between conditions 1 and 2, or between conditions 3 and 4
• If only conditions 1 & 4 are included, the placebo effect is a potential confound.
• A complete understanding of the actual effect (e.g., caffeine effect) and the placebo effect needs all of conditions 1~4.
• Practically, if one only wants to control the placebo effect and does not want to study the placebo effect itself, having conditions 1 & 3 (or conditions 2 & 4) is sufficient.
• The placebo effect can be very strong.
Sometimes, 70% of patients are significantly improved after placebo treatment.

Single- and Double-blind Experiment
• An experiment that has carefully controlled subject bias is called a “single-blind experiment” because the subjects are not aware of the real purpose. It is usually sufficient for automated experiments.
• If we keep both subjects and experimenters unaware of the experimental conditions, we can control for experimenter bias and subject bias simultaneously. Such an experiment is called a “double-blind experiment”. This is required if there needs to be a lot of direct interaction between subjects and experimenters.

Pre-test / Post-test Design
• Many important types of studies involve multiple sessions.
• e.g., to study the effectiveness of a training / treatment / drug, it is typical to measure the DV before and after the treatment (i.e., pre-test and post-test, respectively); the difference between the 2 tests reflects the effectiveness of the treatment.
[Figure: Pre-test, then Treatment, then Post-test; the pre/post difference is taken as the effectiveness of the treatment]

Maturation
• The pre-test and post-test sessions could differ in many aspects. First, there could be some general maturation of the subjects.
• For example, there is a 1-semester program designed to help college freshmen get used to campus life. Suppose it is found that the students are more used to campus life after the program. This does not necessarily mean the program is effective; it may simply mean that the students are generally more experienced after 1 semester.

History Effect
• Second, there could be some general change during the period (i.e., history effect).
• For example, a treatment program for depression takes about 3 months (Jan 15 to Apr 15), and it is found that the patients are relieved after it. This does not necessarily mean the treatment is effective; it may simply be that these patients get better in the spring and worse in the winter.

Regression to the Mean
• Third, the pre-test vs. post-test method also tends to have a subject-selection problem called regression to the mean: for those who got extreme pre-test scores, their post-test scores tend to move toward the mean and cause a difference between pre-test and post-test scores.
• Imagine there is a karaoke competition in this class. A student who ranks 2nd out of 90 will tend to ______ in a second competition:
    A. Still rank 2nd
    B. Rank higher than 2nd
    C. Rank lower than 2nd
• The reason for regression to the mean is that extreme scores in one test are partly caused by random noise in that direction. Therefore, another test with independent random noise will push these scores “back to the mean”.
• Naturally, the tendency of regression to the mean depends on the magnitude of the random noise in the specific test.
• Imagine the karaoke is now replaced by 1 of the following 2 types of competitions:
    Throwing coins 100 times
    Competition on body heights
• For throwing coins, a new score will return almost completely to the mean because these scores are 100% random noise.
[Figure: coin-throwing scores and ranks in two rounds. First round, score (rank): 35 (8), 42 (7), 45 (6), 48 (5), 53 (4), 57 (3), 69 (2), 73 (1). Second round, score (rank): 66 (2), 59 (3), 44 (7), 47 (5), 40 (8), 69 (1), 55 (4), 46 (6)]
• For body heights, a new score will hardly return to the mean because these scores have almost 0% noise.
[Figure: body-height ranks 9 to 1 are identical in both rounds]
• This “regression to the mean” problem usually cannot be easily solved by assigning subjects. After all, the pre-test / post-test design is often specifically used for subjects who had extreme scores on a previous test (e.g., patients who have been diagnosed as positive for some disease).

Pre-test Effect
• Fourth, sometimes the mere fact of taking a pre-test has an effect on the results of the post-test (i.e., pre-test effect).
• One may think these effects (history, maturation, pre-test effect, regression to the mean) can be controlled by counterbalancing.
• Unfortunately, these effects usually cannot be controlled by counterbalancing because, by definition, we cannot switch the order of the pre-test and post-test.

Waiting List Control Group
• To control for these confounds on the “pre-test / post-test difference” (history, maturation, pre-test effect, regression to the mean), we add a control group that also participates in the pre-test and post-test but receives no treatment (or receives a placebo treatment).
• It is important to remember that the control group should be drawn from the same pool of subjects (i.e., a waiting list control group) to make sure the experimental and control groups are equivalent.
[Figure: Experimental group: Pre-test, Treatment, Post-test, Difference. Control group: Pre-test, Placebo, Post-test, Difference. The difference between the two differences is the effectiveness of the treatment]

Cohort Effects
• A study in 2011 found that 25-year-olds are much better than 75-year-olds at computer typing, but not better at handwriting. Therefore, computer typing, but not handwriting, degenerates a lot with aging.
• This conclusion is flawed because it fails to consider cohort effects: people of different cohorts (i.e., generations) could be systematically different.
• People born in 1936 had no exposure to computers until very late in their lives, whereas people born in 1986 grew up with computers.
[Figure: timeline with the years 1936, 1961, 1986, 2011, 2036, 2061 and a question mark]
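The logic of the cohort confound can be made concrete with a toy calculation. Everything numeric below is hypothetical and chosen only for illustration: the typing scores assigned to each cohort are made up, and the model deliberately assumes that age has no effect at all on typing. Only the birth years and test years come from the example above.

```python
# Hypothetical toy model: typing skill depends on the cohort a person
# belongs to and, by construction, not at all on age.
AGING_EFFECT_PER_YEAR = 0.0                    # assumed: no decline with age
COHORT_BASELINE = {1936: 40.0, 1986: 80.0}     # made-up typing scores


def age_in(birth_year, test_year):
    """Age of a person born in birth_year when tested in test_year."""
    return test_year - birth_year


def typing_score(birth_year, test_year):
    """Score under the toy model: cohort baseline minus an aging term."""
    age = age_in(birth_year, test_year)
    return COHORT_BASELINE[birth_year] - AGING_EFFECT_PER_YEAR * age


# Comparing two age groups in the same year (2011) mixes age with cohort.
young = typing_score(1986, 2011)   # 25-year-olds
old = typing_score(1936, 2011)     # 75-year-olds
between_cohort_gap = young - old

# Following the 1986 cohort until it reaches 75 holds the cohort constant.
same_cohort_gap = typing_score(1986, 2011) - typing_score(1986, 2061)

if __name__ == "__main__":
    print("25 vs. 75 in 2011 (different cohorts):", between_cohort_gap)
    print("25 vs. 75 within the 1986 cohort:", same_cohort_gap)
```

In this toy model the comparison between age groups in 2011 shows a large gap even though aging contributes nothing, while the comparison within one cohort shows none. Real data could of course contain both an aging effect and a cohort effect; the point is only that a single-year comparison cannot separate them.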
Cross-sectional and Longitudinal Studies
• Research that compares different age groups is called a cross-sectional study, and one should always carefully consider cohort effects.
• An alternative approach is a longitudinal study, e.g., measure the typing and handwriting skills of the 1986-born group again after 50 years (year 2061).
• Longitudinal studies can rule out cohort effects, but they have their own problems.

Attrition Problem
• Longitudinal studies are time-consuming and logistically challenging.
• More importantly, a large portion of subjects will drop out of such a study, so the group that eventually completes the study could be systematically different from the group the study started with (i.e., attrition).
• To check for the attrition problem, we can see whether those who stay in the study differed, at the beginning of the study, from those who drop out. If not, attrition is less likely to be a problem.

Cohort Sequential Design
• Cohort sequential design is a strategy that combines the two approaches above.
• In such a study, a group of subjects is selected and retested every several years, and additional cohorts are added every several years.
• Attrition is less of a problem here than in longitudinal studies, and the design still has good control of cohort effects.

Age of each cohort at each year of study:
    Cohort / Birth    2005   2010   2015   2020   2025
    1 / 1990          15     20     25
    2 / 1995                 15     20     25
    3 / 2000                        15     20     25

Group Discussion 1
In the United States today, many healers believe the conventional wisdom that a distillation of fluids extracted from the urine of horses, if dried to a powder and fed to aging women, could preserve youth and heal a variety of diseases.
This method has been very popular and is still believed by many today.
The main evidence for its effectiveness is this: the health of regular users was measured and found to be better than that of a group matched on age, gender, income, etc.
However, a recent study by scientists found the opposite result: this therapy has no obvious benefits and has caused significantly more frequent breast cancers and strokes.
So what was wrong with the original evidence?