Experiments and Observational Studies
Chapter 13

Objectives:
• Observational study
• Retrospective study
• Prospective study
• Experiment
• Experimental units
• Treatment
• Response
• Factor
• Level
• Principles of experimental design
• Statistically significant
• Control group
• Blinding
• Placebo
• Blocking
• Matching
• Confounding

Observational Study
• Observes individuals and records variables of interest but does not attempt to influence the response (does not impose a treatment).
  – Allows the researcher to directly observe the behavior of interest rather than rely on the subject’s self-descriptions (survey).
  – Allows the researcher to study the subject in its natural environment, thus removing the potentially biased effect of the unnatural laboratory setting on the subject’s performance (animal behavior).

Observational Study
• Two Types
  1. Field Observation – Observations are made in a particular natural setting over an extended period of time.
  2. Systematic Observation – Observations of one or more particular behaviors in a specific setting.
• Since observational studies do not impose a treatment, it is not possible to prove a cause-and-effect relationship with an observational study.

Observational Study
• Example:
  – Researchers compared the scholastic performance of music students with that of non-music students. The music students had a much higher overall grade point average than the non-music students, 3.59 to 2.91. Also, 16% of the music students had all A’s compared with only 5% of the non-music students.

Observational Study
• In an observational study, researchers don’t assign choices; they simply observe them.
  – The example looked at the relationship between music education and grades.
  – Since the researchers did not assign students to get music education and simply observed students “in the wild,” it was an observational study.
  – Because researchers in the example first identified subjects who studied music and then collected data on their past grades, this was a retrospective study.

Retrospective Study
• Observational studies that try to discover variables related to rare outcomes, such as specific diseases, are often retrospective. They first identify people with the disease and then look into their history and heritage in search of things that may be related to their condition.
• Retrospective studies have a restricted view of the world because they are usually restricted to a small part of the entire population.
• Because retrospective studies are based on historical data, they can have errors.
  – Do you recall exactly what you ate yesterday? How about last Monday?

Prospective Study
• A somewhat better approach to an observational study than using historical data, as in a retrospective study, is to identify subjects in advance and collect data as events unfold. This is called a prospective study.
• In our example studying the relationship between music education and grades, had the researchers identified subjects in advance and collected data over an entire school year or years, the study would have been a prospective study.

Observational Study
• Observational studies are valuable for discovering trends and possible relationships.
• However, it is not possible for observational studies, whether prospective or retrospective, to demonstrate a cause-and-effect relationship. There are too many lurking variables that may affect the relationship.

Experiment
• Definition: Experiment – deliberately imposes some treatment on individuals in order to observe their responses.
• Basic Experimental Design
  – Subject → Treatment → Observation
• Because the purpose of an experiment is to reveal the response of one variable to changes in other variables, the distinction between explanatory and response variables is essential.

Experiment
• An experiment is a study design that allows us to prove a cause-and-effect relationship.
• In an experiment, the experimenter must identify at least one explanatory variable, called a factor, to manipulate and at least one response variable to measure.
• An experiment:
  – Manipulates factor levels to create treatments.
  – Randomly assigns subjects to these treatment levels.
  – Compares the responses of the subject groups across treatment levels.

Experiment
• In an experiment, the experimenter actively and deliberately manipulates the factors to control the details of the possible treatments, and assigns the subjects to those treatments at random.
• The experimenter then observes the response variable and compares responses for different groups of subjects who have been treated differently.

Experiment
• In general, the individuals on whom or which we experiment are called experimental units.
  – When humans are involved, they are commonly called subjects or participants.
• The specific values that the experimenter chooses for a factor are called the levels of the factor.
• A treatment is a combination of specific levels from all the factors that an experimental unit receives.

Review - Experimental Terminology
• Experimental Units
  – The individuals or items on which the experiment is performed.
  – When the experimental units are human beings, the term subject is often used in place of experimental unit.
• Response variable
  – The characteristic of the experimental outcome that is being measured or observed.

Review - Experimental Terminology
• Factor
  – The explanatory variables in an experiment.
  – A variable whose effect on the response variable is of interest in the experiment.
• Levels
  – The different possible values of a factor.

Review - Experimental Terminology
• Treatment
  – A specific experimental condition applied to the units of an experiment.
  – For one-factor experiments, the treatments are the levels of the single factor.
  – For multifactor experiments, each treatment is a combination of the levels of the factors.

Example:
• Researchers studying the absorption of a drug into the bloodstream inject the drug into 25 people. 30 minutes after the injection they measure the concentration of the drug in each person’s blood.
• Identify the:
  a) Experimental units.
  b) Response variable.
  c) Factors.
  d) Levels of each factor.
  e) Treatments.

Answer:
  a) Experimental units – Subjects, the 25 people injected.
  b) Response variable – Concentration of the drug in the blood.
  c) Factors – Single factor: the drug.
  d) Levels – One level: the dose.
  e) Treatment – Injecting the drug.

Your Turn:
• Weight gain of Golden Torch Cacti. Researchers examined the effects of a hydrophilic polymer and irrigation regime on weight gain. For this study the researchers chose the hydrophilic polymer P4. P4 was either used or not used, and five irrigation regimes were employed: none, light, medium, heavy, and very heavy.
• Identify the:
  a) Experimental units.
  b) Response variable.
  c) Factors.
  d) Levels of each factor.
  e) Treatments.

Answer:
  a) Experimental units – The cacti used in the study.
  b) Response variable – The weight gain of the cacti.
  c) Factors – Two factors: the hydrophilic polymer P4 and the irrigation regime.
  d) Levels – P4 has two levels: with and without. Irrigation regime has five levels: none, light, medium, heavy, and very heavy.
  e) Treatments – There are 10 different treatments, each a combination of a level of P4 and a level of irrigation regime. See next slide for treatments.

Schematic for the 10 Treatments in the Cactus Study
[Diagram: factors, levels, and the resulting treatments]

Randomized, Comparative Experiment
1. Manipulates the factor levels to create treatments.
2. Randomly assigns subjects to these treatments.
3. Compares the responses of the subject groups across treatment levels.

The Four Principles of Experimental Design
1. Control
2. Randomize
3. Replicate
4. Block

The Four Principles of Experimental Design
1. Control:
  – Good experimental design reduces variability by controlling the sources of variation.
  – We control sources of variation other than the factors we are testing by making conditions as similar as possible for all treatment groups.
  – Comparison is an important form of control. Every experiment must have at least two groups so the effect of a treatment can be compared with either the effect of a traditional treatment or the effect of no treatment at all.

The Four Principles of Experimental Design
2. Randomize:
  – Subjects should be randomly divided into groups to avoid unintentional selection bias in constituting the groups, that is, to make the groups as similar as possible.
  – Randomization allows us to equalize the effects of unknown or uncontrollable sources of variation.
    • It does not eliminate the effects of these sources, but it spreads them out across the treatment levels so that we can see past them.
  – Without randomization, you do not have a valid experiment and will not be able to use the powerful methods of Statistics to draw conclusions from your study.

The Four Principles of Experimental Design
2. Randomize:
  – One source of variation is confounding variables (discussed later): variables that we did not think to measure but which can affect the response variable.
  – Randomization to treatment groups reduces bias by equalizing the effects of confounding variables.
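
Sketch: Randomizing the Cactus Study
• The randomization step can be illustrated in a few lines of Python. This is only a sketch under stated assumptions: the pool of 30 cacti, the unit labels, the seed, and the helper name randomize are all hypothetical, since the slides do not say how many cacti were used or how the random numbers were obtained. The factor levels are the ones given in the cactus example.

```python
import itertools
import random

# Factor levels taken from the cactus example.
polymer_levels = ["with P4", "without P4"]
irrigation_levels = ["none", "light", "medium", "heavy", "very heavy"]

# A treatment is one combination of a level from each factor: 2 x 5 = 10.
treatments = list(itertools.product(polymer_levels, irrigation_levels))


def randomize(units, treatments, seed=None):
    """Randomly assign units to treatments in equal-sized groups.

    Hypothetical helper for illustration; not part of the original study.
    """
    if len(units) % len(treatments) != 0:
        raise ValueError("units must divide evenly among treatments")
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)
    per_group = len(shuffled) // len(treatments)
    # Slice the shuffled list into consecutive, equal-sized groups.
    return {
        treatment: shuffled[i * per_group:(i + 1) * per_group]
        for i, treatment in enumerate(treatments)
    }


# Assumed pool of 30 cacti, so each treatment gets 3 replicates.
cacti = [f"cactus-{i:02d}" for i in range(1, 31)]
assignment = randomize(cacti, treatments, seed=13)

for treatment, group in assignment.items():
    print(treatment, group)
```

• Shuffling the whole pool and then slicing it gives every cactus the same chance of landing in any treatment group, and placing several cacti in each group also provides the replication discussed on the following slides.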

The Four Principles of Experimental Design
3. Replicate:
  – Repeat the experiment, applying the treatments to a number of subjects.
    • One or two subjects do not constitute an experiment.
    • The outcome of an experiment on a single subject is an anecdote, not data.
    • A sufficient number of subjects should be used to ensure that randomization creates groups that resemble each other closely and to increase the chances of detecting differences among the treatments when such differences actually exist.

Example: Replication
The outcome of an experiment on a single subject is an anecdote, not data.

The Four Principles of Experimental Design
3. Replicate:
  – When the experimental group is not a representative sample of the population of interest, we might want to replicate an entire experiment for different groups, in different situations, etc.
    • Replication of an entire experiment with the controlled sources of variation at different levels is an essential step in science.
  – The experiment should be designed in such a way that other researchers can replicate the results.

The Four Principles of Experimental Design
4. Block:
  – Sometimes, attributes of the experimental units that we are not studying and that we can’t control may nevertheless affect the outcomes of an experiment.
  – If we group similar individuals together and then randomize within each of these blocks, we can remove much of the variability due to the difference among the blocks.
  – Note: Blocking is an important compromise between randomization and control, but, unlike the first three principles, is not required in an experimental design.

Diagrams of Experiments
• It’s often helpful to diagram the procedure of an experiment.
• The following diagram emphasizes the random allocation of subjects to treatment groups, the separate treatments applied to these groups, and the ultimate comparison of results:
[Flow chart]

Logic of Experimental Design
1.
Randomization produces groups of experimental units that should be similar in all respects before the treatments are applied.
2. Comparative design ensures that influences other than the experimental treatments operate equally on all groups.
3. Therefore, differences in the response variable must be due to the effects of the treatments. That is, the treatments not only are associated with the observed differences in the response but must actually cause them (cause and effect).

Does the Difference Make a Difference?
• How large do the differences need to be to say that there is a difference in the treatments?
• Differences that are larger than we’d get just from the randomization alone are called statistically significant.
• We’ll talk more about statistical significance later on. For now, the important point is that a difference is statistically significant if we don’t believe that it’s likely to have occurred only by chance.

Experiments and Samples
• Both experiments and sample surveys use randomization to get unbiased data.
• But they do so in different ways and for different purposes:
  – Sample surveys try to estimate population parameters, so the sample needs to be as representative of the population as possible.
  – Experiments try to assess the effects of treatments, and experimental units are not always drawn randomly from a population.

Control Treatments
• Often, we want to compare a situation involving a specific treatment to the status quo situation.
• A baseline (“business as usual”) measurement is called a control treatment, and the experimental units to which it is applied are called the control group.

Blinding
• When we know what treatment was assigned, it’s difficult not to let that knowledge influence our assessment of the response, even when we try to be careful.
• In order to avoid the bias that might result from knowing what treatment was assigned, we use blinding.
• There are two main classes of individuals who can affect the outcome of the experiment:
  – those who could influence the results (subjects, treatment administrators, technicians)
  – those who evaluate the results (judges, treating physicians, etc.)

Blinding
• When all individuals in either one of these classes are blinded, an experiment is said to be single-blind.
  – Single-Blind: An experiment is said to be single-blind if the subjects of the experiment do not know which treatment group they have been assigned to, or if those who evaluate the results of the experiment do not know how subjects have been allocated to treatment groups.
• When everyone in both classes is blinded, the experiment is called double-blind.
  – Double-Blind: An experiment is said to be double-blind if neither the subjects nor the evaluators know how the subjects have been allocated to treatment groups.

Placebos
• Often simply applying any treatment can induce an improvement.
• To separate out the effects of the treatment of interest, we can use a control treatment that mimics the treatment itself.
• A “fake” treatment that looks just like the treatment being tested is called a placebo.
  – Placebos are the best way to blind subjects from knowing whether they are receiving the treatment or not.

Placebos
• The placebo effect occurs when taking the sham treatment results in a change in the response variable.
  – This highlights both the importance of effective blinding and the importance of comparing treatments with a control.
• Placebo controls are so effective that you should use them as an essential tool for blinding whenever possible.

Designing an Experiment Step-By-Step
Completely Randomized Experiment (the ideal simple design)
• Goal – State what you want to know.
• Response – Specify the response variable.
• Treatments – Specify the factor levels and the treatments.

Designing an Experiment Step-By-Step
• Experimental units – Specify the experimental units.
• Experimental Design – Observe the 4 principles of experimental design:
  • Control – control any sources of variability you know of and can control.
  • Randomize – randomly assign experimental units to treatments, to equalize the effects of unknown or uncontrollable sources of variation. Specify how the random numbers needed for randomization will be obtained.
  • Replicate – place sufficient experimental units in each treatment group.
  • Block – if required, group similar individuals together.

Designing an Experiment Step-By-Step
• Specify any other experiment details
  – Give enough details so that another experimenter could exactly replicate your experiment.
  – Specify how to measure the response.

Randomized Comparative Experiment Example:
• Researchers believe that diuretics may be as effective in reducing a person’s blood pressure as the conventional drug (drug A), which is much more expensive and has more unwanted side effects. Design a randomized comparative experiment to test this hypothesis.

Randomized Comparative Experiment Example:
• Explanatory Variable – Type of medication
  – Treatments: Diuretic, Drug A
• Response Variable – Change in blood pressure

Randomized Comparative Experiment Example:

Randomized Comparative Experiment Your Turn:
• Can chest pain be relieved by drilling holes in the heart? Since 1980, surgeons have been using a laser procedure to drill holes in the heart. Many patients report a lasting and dramatic decrease in chest pain. Is the relief due to the procedure or is it a placebo effect?
• Design a randomized comparative experiment, using a group of 298 volunteers with severe chest pain, to test this procedure’s effectiveness.

Randomized Comparative Experiment Example

The Best Experiments…
• are usually:
  – randomized.
  – comparative.
  – double-blind.
  – placebo-controlled.

Other Experimental Designs
1. Block Design
2. Matched Pairs Design

Blocking
• When groups of experimental units are similar, it’s often a good idea to gather them together into blocks.
• Blocking isolates the variability due to the differences between the blocks so that we can see the differences due to the treatments more clearly.
• In effect, we are conducting two parallel experiments. We use blocks to reduce variability so that we can see the effect of the treatments. The blocks themselves are not treatments.
• When randomization occurs only within the blocks, we call the design a randomized block design.

Blocking
• Blocks are another form of control.
• Blocking is the same idea for experiments as stratifying is for sampling.
  – Both methods group together subjects that are similar and randomize within those groups as a way to remove unwanted variation.
  – We use blocks to reduce variability so we can see the effects of the factors; we’re not usually interested in studying the effects of the blocks themselves.

Block Design – Example:
• Suppose the researchers in our Diuretics vs. Drug A example have reason to believe that men and women respond differently to blood pressure medication. Then gender would be the blocking variable.
• Our goal is to be able to assess a cause-and-effect relationship between the treatment imposed and the response variable. Blocking reduces variability so that the differences we see can be attributed to the treatment that we imposed. Blocking is to experimental design as stratifying is to sampling design.

Block Design – Example:

Block Design – Your Turn:
• The progress of a type of cancer differs in women and men.
Design a clinical experiment to compare 3 different therapies for this cancer using a subject pool made up of 80 men and 60 women (140 total subjects).

Solution: Block By Gender
Total Subjects: 140
• Males (80) → Random Allocation
  – Group 1 (20): Treatment #1, Therapy 1
  – Group 2 (20): Treatment #2, Therapy 2
  – Group 3 (20): Treatment #3, Therapy 3
  – Group 4, control (20): Treatment #4, Placebo
  – Compare Cancer Progress
• Females (60) → Random Allocation
  – Group 1 (15): Treatment #1, Therapy 1
  – Group 2 (15): Treatment #2, Therapy 2
  – Group 3 (15): Treatment #3, Therapy 3
  – Group 4, control (15): Treatment #4, Placebo
  – Compare Cancer Progress

Experimental Design – Example:
• An ad for OptiGro plant fertilizer claims that with this product you will grow “juicier, tastier” tomatoes. You’d like to test this claim, and wonder whether you might be able to get by with half the specified dose.
• How can you set up an experiment, using 24 tomato plants from a garden store, to check out the claim?

Experimental Design – Example:
• Completely randomized experiment in one factor (three levels)

Experimental Design – Example:
• Suppose we wanted to use 18 tomato plants of the same variety for our experiment, but the garden store had only 12 plants left. So we drove down to the nursery and bought 6 more plants of that variety. We worry that the tomato plants from the two stores are different somehow, and, in fact, they don’t really look the same.
• How can we design the experiment so that the differences between the stores don’t mess up our attempts to see differences among fertilizer levels?

Experimental Design – Example:
• Randomized block design (block by store) in 1 factor (3 levels)

Adding More Factors
• It is often important to include multiple factors in the same experiment in order to examine what happens when the factor levels are applied in different combinations.

Experimental Design – Example:
• There are two kinds of gardeners. Some water frequently, making sure that the plants are never dry.
Others let Mother Nature take her course and leave the watering to her. The makers of OptiGro want to ensure that their product will work under a wide variety of watering conditions. Maybe we should include the amount of watering as part of our experiment.

Experimental Design – Example:
• Completely randomized experiment in two factors: fertilizer (3 levels) and watering (2 levels), giving 6 treatments

Matching
• In a retrospective or prospective study, subjects are sometimes paired because they are similar in ways not under study.
  – Matching subjects in this way can reduce variability in much the same way as blocking.

Matched Pairs Design
• A simple and common special type of block design.
• Two types – One Subject or Two Subjects
• Conditions
  – Compare only 2 treatments.
  – Each block consists of just 2 units, as closely matched as possible (two subjects).
  – Units are assigned at random to the treatments.
  – Each block may consist of one subject who gets both treatments one after the other.
  – Each subject serves as their own control.
  – The order of the treatments can influence the subject’s response, so the order is randomized for each subject.

Matched Pairs Design
• One Subject: A common form of matched pairs design uses just one subject who receives both treatments. The order in which the subject receives the treatments is randomized.
• Example:
  1) Cola taste test – Matched Pairs
    – Each subject compares two colas (Pepsi/Coke) and picks the one they prefer.
    – The order in which they taste the colas is randomized.

Matched Pairs Design
• Example:
  2) A researcher believes that students are able to concentrate better while listening to classical music. To test this theory she plans to record the time it takes a student to complete a puzzle maze while listening to classical music and the time it takes him/her to complete another puzzle of the same difficulty level in a quiet room.
Because there is so much variability in problem-solving abilities among students, a matched pairs design will be used to reduce this variability so that any difference recorded can be attributed to the conditions under which the student completed the puzzle.
Design – Each student will complete a puzzle under each of the conditions. A coin will be flipped to determine whether the task will be done in a quiet room first or while listening to classical music. The difference in the time it takes to complete each puzzle (Quiet − Music) is recorded for each student.

Matched Pairs Design
• Two Subjects: The two subjects are paired based on common characteristics that might affect the response variable. One subject from each pair is randomly assigned to each of the treatment groups. The response variable is then the difference in the response to the two treatments for each pair.

Matched Pairs Design
• Example: Marathon runners are matched by weight, physical build, and running times. They are asked to test the design of a new running shoe compared to the manufacturer’s old design for durability through a race. A coin is tossed to determine which runner in each pair will wear the new design. After the marathon the difference in wear pattern for each pair of runners is then measured and recorded.

Confounding
• An experiment is said to be confounded if we cannot separate the effect of a factor or treatment (explanatory variable) from the effects of other influences (confounding variables) on the response variable.
• Example: When the levels of one factor are associated with the levels of another factor, we say that these two factors are confounded.
• When we have confounded factors, we cannot separate out the effects of one factor from the effects of the other factor.
• In the lab, we try to avoid confounding by rigorously controlling the environment of the experiment so that nothing except the experimental treatment influences the response.
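
Sketch: Why Confounded Factors Cannot Be Separated
• A small Python sketch can make this concrete, using the earlier tomato setting in which plants come from two stores. Everything numeric here is an assumption for illustration only: the base yield, the additive effect sizes, and the function names are hypothetical and are not data from any study. The point is just that when fertilizer level lines up exactly with store, the observed difference mixes both effects, while a design balanced within each store recovers the fertilizer effect.

```python
from statistics import mean

# Hypothetical additive effects, chosen only for illustration.
BASE_YIELD = 10.0
FERTILIZER_EFFECT = {"none": 0.0, "full": 2.0}
STORE_EFFECT = {"garden store": 0.0, "nursery": 3.0}


def yield_of(store, fertilizer):
    """Assumed response: base yield plus a store effect plus a fertilizer effect."""
    return BASE_YIELD + STORE_EFFECT[store] + FERTILIZER_EFFECT[fertilizer]


def observed_difference(design):
    """Mean yield with full fertilizer minus mean yield with none."""
    full = [yield_of(s, f) for s, f in design if f == "full"]
    none = [yield_of(s, f) for s, f in design if f == "none"]
    return mean(full) - mean(none)


# Confounded design: fertilizer level is perfectly associated with store.
confounded = [("nursery", "full")] * 6 + [("garden store", "none")] * 6

# Balanced design: each store contributes equally to both fertilizer levels.
balanced = (
    [("nursery", "full")] * 3 + [("nursery", "none")] * 3
    + [("garden store", "full")] * 3 + [("garden store", "none")] * 3
)

print(observed_difference(confounded))  # 5.0: fertilizer and store effects mixed together
print(observed_difference(balanced))    # 2.0: the assumed fertilizer effect alone
```

• In the confounded design there is no way to tell from the data how much of the 5.0 comes from the fertilizer and how much from the store; balancing within store is the same idea as the randomized block design used in the tomato example.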

Lurking or Confounding
• A lurking variable creates an association between two other variables that tempts us to think that one may cause the other.
  – This can happen in a regression analysis or an observational study.
  – A lurking variable is usually thought of as a prior cause of both y and x that makes it appear that x may be causing y.

Lurking or Confounding
• Confounding can arise in experiments when some other variable associated with a factor has an effect on the response variable.
  – Since the experimenter assigns treatments (at random) to subjects rather than just observing them, a confounding variable can’t be thought of as causing that assignment.
• A confounding variable, then, is associated in a noncausal way with a factor and affects the response.
  – Because of the confounding, we find that we can’t tell whether any effect we see was caused by our factor or by the confounding factor (or by both working together).

What Can Go Wrong?
• Don’t give up just because you can’t run an experiment.
  – If we can’t perform an experiment, often an observational study is a good choice.
• Beware of confounding.
  – Use randomization whenever possible to ensure that the factors not in your experiment are not confounded with your treatment levels.
  – Be alert to confounding that cannot be avoided, and report it along with your results.

What Can Go Wrong?
• Bad things can happen even to good experiments.
  – Protect yourself by recording additional information.
• Don’t spend your entire budget on the first run.
  – Try a small pilot experiment before running the full-scale experiment.
  – You may learn some things that will help you make the full-scale experiment better.

What have we learned?
• We can recognize sample surveys, observational studies, and randomized comparative experiments.
  – These methods collect data in different ways and lead us to different conclusions.
• We can identify retrospective and prospective observational studies and understand the advantages and disadvantages of each.
• Only well-designed experiments can allow us to reach cause-and-effect conclusions.
  – We manipulate levels of treatments to see if the factor we have identified produces changes in our response variable.

What have we learned?
• We know the principles of experimental design:
  – Identify and control as many other sources of variability as possible so we can be sure that the variation in the response variable can be attributed to our factor.
  – Try to equalize the many possible sources of variability that cannot be identified by randomly assigning experimental units to treatments.
  – Replicate the experiment on as many subjects as possible.
  – Control the sources of variability we can, and consider blocking to reduce variability from sources we recognize but cannot control.

What have we learned?
• We’ve learned the value of having a control group and of using blinding and placebo controls.
• We can recognize problems posed by confounding variables in experiments and lurking variables in observational studies.

Assignment
• Ch-13, pg. 312 – 316: #5 – 15 odd, 21 – 25 odd, 29, 33, 41
• Read Ch-14, pg. 324 – 337