Chapter 3 Obtaining Useful Evidence

Research design is the part of the statistical process in which planning is done so that conclusions can be drawn with confidence and supported under scrutiny. The research design process begins with the research question(s). Consider the following three questions.

1. Do people believe government money should be spent to build sports arenas for professional teams?
2. Is there a difference between the prices that home daycares and childcare centers charge?
3. Does removing invasive plants in spring lead to a reduction in the percent of invasive plants in the summer?

The first question illustrates research that would be appropriate for finding out something about a population or testing a hypothesis about a population. A survey could be used to get public opinion.

The second question compares groups based on a characteristic of the group, in this case the type of daycare. The objective is to determine whether the groups differ in cost.

The third question seeks to determine whether an intervention has an effect on some outcome. The goal is to establish a cause-and-effect relationship, since such a relationship could lead to better decision-making.

Showing causation (a cause-and-effect relationship) is a highly desirable outcome for many research questions. In a causal relationship, providing a treatment produces a particular outcome, while not providing the treatment means that outcome is not produced. Thus, simply showing that a certain response occurred when a treatment was provided does not prove that the treatment caused the response; it must also be shown that the response does not occur without the treatment.

A cause-and-effect relationship implies the existence of two variables, and one of those variables must happen before the other. The variable that happens first is the independent or explanatory variable; the variable that follows is the dependent or response variable. The values that the explanatory variable can have are called levels.

Consider education research. How can we maximize results?
To answer this question, we would need to know how to measure results, that is, the response variable. We also need a theory of which variable should be changed to affect the response variable, that is, the explanatory variable. To decide on an explanatory variable, generate a list of possible variables that might affect the response variable. These can be grouped into characteristics of students, teachers, and the school.

Students | Teachers | School

Pick some of these to demonstrate the concept of levels.
Discuss confounding or latent variables.

Consider two educational theories.
Theory 1: Students who are more successful in a prerequisite class will be more successful in the current class (e.g., success in Math 98 leads to success in Math 146).
Theory 2: Students will be more successful with teachers who give regular unannounced quizzes (e.g., weekly or daily) than with teachers who don't give quizzes.

What are possible response variables for these two theories? What are the explanatory variables for these theories?

Proving causation is challenging. Some research designs can show causation if properly done, while others cannot.

Research Designs
Observational study: The goal is to understand the population or differences between populations. Units are randomly selected. Cannot show causation.
Observational experiment: The goal is to determine the effect of an intervention. No random assignment is possible. Might be able to show causation.
Manipulative experiment: The goal is to determine the effect of an intervention. Random assignment of units to groups facilitates showing causation.

One goal of all research is to avoid bias, a systematic prejudice in one direction.

Approval for research: Institutional Review Board (IRB).

Observational Studies
The goal of observational studies is to understand certain characteristics of a population or to test a hypothesis about those characteristics.
Census
Random selection
One population, or a comparison between multiple populations (grouped based on a characteristic they possess that cannot be changed, either because change is impossible or because it would be unethical. For example, it is not possible to change a person's race, age group, or income level, and it would be unethical to change a person's use of drugs or smoking.)

Surveys
Constructing a survey is more difficult than it may first appear if the goal is to get unbiased data.
1. Leading questions:
A. Do you believe the minimum wage should be raised so that a person working full time is not still living below the poverty line?
B. Do you believe the minimum wage should be raised, kept the same, or lowered?
2. Sequence of questions:
A. With which political party do you most closely associate? (Republican, Democrat, Tea Party, Libertarian, Socialist, Green Party, Independent) Do you believe that humans are causing the climate to change?
B. Do you believe that humans are causing the climate to change? With which political party do you most closely associate? (Republican, Democrat, Tea Party, Libertarian, Socialist, Green Party, Independent)

Experiments
While the goal of an observational study is to understand characteristics of a population, the goal of an experiment is to assess the effect of an intervention or treatment. To show causation, an experiment requires a comparison between those receiving the treatment and those not receiving it. The latter group is called a control group.

The term control is used in two different ways in experimental design. The first is in the term "control group," which is a group that does not receive the treatment. The second is that an experiment can be controlled for a latent variable by making sure that people or units with that variable are randomly assigned across the different levels of the explanatory variable.

The concept of random assignment is critical for experiments.
Because of potential latent variables that could affect the response variable, experimental units are randomly assigned to different groups.

Manipulative experiments are the most effective design for establishing causation. In a manipulative experiment, the researcher controls the level of the explanatory variable. This is done through the random assignment of experimental units to the different levels of the explanatory variable. A related design, called a quasi-experiment, is used when whole groups of subjects, rather than individual subjects, are assigned a level of treatment.

A well-designed experiment should have both internal and external validity. Internal validity means that the design is strong enough to show a cause-and-effect relationship. External validity means that the results achieved in the experiment would also be obtained if the treatment were applied to the entire population.[1]

[1] http://www.socialresearchmethods.net/kb/external.php

Human Experiments
Medical experiments on humans can be more complicated than other types of experiments because of mind/body interactions.

Factors that affect improvement
Subject:
Severity of the problem
Natural history of the problem (where the subject is in the cycle of symptom variation at the time of treatment)
Ability to accurately quantify the state of the condition (e.g., pain on a scale of 1 to 10)
Belief that the treatment will help (psychosomatic effect)
Desire to please the doctor
Placebo effect
Researcher:
Implications of the success or failure of the treatment (financial, reputation)
Funding source for the research
Appropriate assessment of the patient's condition
Quality and effectiveness of laboratory research
Knowledge of which subjects receive the treatment

Medical treatments are often tested with manipulative experiments. The response variable is some measure of improvement (e.g., reduced pain or symptoms). The explanatory variable is whether a person receives the medical treatment.
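Random assignment of subjects to the treatment and placebo levels can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: `randomly_assign` shuffles the subjects and deals them out to the levels in turn, and `block_randomize` does the same separately within each group of subjects that share a suspected latent variable, which is one way to "control for" that variable as described earlier. Both function names, the smoking-status example, and the subject IDs are hypothetical.

```python
import random

def randomly_assign(subjects, levels=("treatment", "placebo"), seed=None):
    """Shuffle the subjects, then deal them out to the levels in turn,
    so group sizes differ by at most one."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    groups = {level: [] for level in levels}
    for i, subject in enumerate(shuffled):
        groups[levels[i % len(levels)]].append(subject)
    return groups

def block_randomize(subjects, block_of, levels=("treatment", "placebo"), seed=None):
    """Randomly assign subjects separately within each block (subjects that
    share a value of a suspected latent variable), so that variable is spread
    across every level of the explanatory variable."""
    rng = random.Random(seed)
    blocks = {}
    for subject in subjects:
        blocks.setdefault(block_of(subject), []).append(subject)
    groups = {level: [] for level in levels}
    for members in blocks.values():
        rng.shuffle(members)  # members is our own list, not the caller's
        for i, subject in enumerate(members):
            groups[levels[i % len(levels)]].append(subject)
    return groups

if __name__ == "__main__":
    # Hypothetical subjects: (id, smoking status); smoking is the suspected latent variable.
    subjects = [(i, "smoker" if i <= 8 else "nonsmoker") for i in range(1, 21)]
    print(randomly_assign([s[0] for s in subjects], seed=1))
    print(block_randomize(subjects, block_of=lambda s: s[1], seed=1))
```

With `block_randomize`, the 8 hypothetical smokers are always split 4 and 4 and the 12 nonsmokers 6 and 6, whereas simple random assignment could, by chance, put most of the smokers in one group.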
To account for potential placebo or psychosomatic effects, the researchers randomly assign some subjects to the placebo level and others to the treatment level.
Placebo and psychosomatic effects
Double-blind experiments
Bias from breaking the blind (80% know which group they are in)
Active placebo

Quality assurance/quality control (QA/QC) is important both for the research protocol and for the laboratory where tests will be done.

Observational experiments (also known as ex post facto studies)
The objective is similar to that of a manipulative experiment in that the purpose is to determine the effect of an intervention. However, in these experiments the researcher cannot randomly assign subjects to different groups, but can only measure the level of the explanatory variable and record the response variable, thus gathering the data as if it were an observational study.
1. Does creating a marine reserve improve the abundance and variety of fish?
2. Does flooding farm fields in winter to create habitat for migratory birds improve the quality of the farmland? (http://mynorthwest.com/11/321774/Farmers-find-flooded-fieldscan-help-birds-crops)
3. Does installing a camera at an intersection reduce the number of people who run a red light?

Before-after-control-impact (BACI) experiment design
One way of establishing causation with experiments of this magnitude is with a before-after-control-impact (BACI) design. For example, Seattle is attempting to drill a tunnel under the city to replace the Alaskan Way Viaduct. To determine the impact on major roads through the city, a BACI study could be constructed in which traffic on a major alternative route to the tunnel is monitored before and after the tunnel opens. During the same time period, traffic on main north-south roads above and below the tunnel could be monitored as a control group.

Significance and Causation
Showing an effect and proving causation are not the same thing.
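In the BACI traffic example, showing an effect means comparing changes rather than raw levels: the before-to-after change at the impact site minus the before-to-after change at the control site (a calculation also known as a difference in differences). A minimal Python sketch, assuming each site and period is summarized by the mean of its measurements; the function name `baci_effect` and all traffic counts below are hypothetical, not real Seattle data.

```python
from statistics import mean

def baci_effect(control_before, control_after, impact_before, impact_after):
    """BACI effect estimate: the change in the mean at the impact site minus
    the change in the mean at the control site over the same period.
    Subtracting the control change removes shifts that affected both sites."""
    impact_change = mean(impact_after) - mean(impact_before)
    control_change = mean(control_after) - mean(control_before)
    return impact_change - control_change

if __name__ == "__main__":
    # Made-up daily vehicle counts, for illustration only.
    control_before = [100, 110, 90]   # north-south control roads, before opening
    control_after = [105, 115, 95]    # north-south control roads, after opening
    impact_before = [200, 210, 190]   # alternative route, before opening
    impact_after = [170, 180, 160]    # alternative route, after opening
    print(baci_effect(control_before, control_after, impact_before, impact_after))
```

With these made-up numbers, the alternative route changes by 170 - 200 = -30 while the control roads change by 105 - 100 = 5, so the estimated effect is -30 - 5 = -35 vehicles per day. A nonzero estimate shows an effect; by itself it does not establish that the tunnel caused it.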
Observational experiments are not effective for proving causation. While showing an effect is necessary to establish causation, it is not sufficient. Causation means that changes in the explanatory variable cause changes in the response variable. Epidemiologists use three criteria to establish causation with observational experiments:
1. The effect exists in many different times and places, reducing the chance of confounding.
2. A logical or scientific explanation exists for why the explanatory variable causes changes in the response variable.
3. No different explanation exists.[2]

[2] Aliaga, Martha, and Brenda Gunderson. Interactive Statistics. Upper Saddle River, NJ: Pearson Prentice Hall, 2006. Print.

For each new project, the initial research design work can be confusing. Clarity begins with asking the right specific questions and is further improved by determining the appropriate response variable, type of research design, and so on. In this text we will use a research design table to help organize our thoughts. When options are presented, circle the appropriate choice. Think of the potential latent variables yourself; they are not usually included in the story.

Research Design Table
Research Question:
Type of Research: Observational Study / Observational Experiment / Manipulative Experiment
What is the response variable?
What is the parameter that will be calculated? Mean / Proportion / Correlation
List potential latent variables.
Grouping/explanatory Variable 1 (if present):    Levels:
Grouping/explanatory Variable 2 (if present):    Levels:

Some of the examples below contain underlined words; others do not. The purpose of underlining is to help you identify the key words in the story. Ultimately, you need to identify these parts without them being underlined.

Example 1. What is the average number of months a cell phone is used before being discarded?
A cell phone company wants to know the normal life expectancy of a cell phone for its customers.
They review their records to see the amount of time a number was associated with the same phone and person. They determine the average number of months a person uses a cell phone before discarding it.

Example 2. Will requiring a college success class lead to an improvement in the 3-year graduation rate of Pierce students?
College success classes were required for all new students at Pierce in Fall 2014. The data recorded will be whether a student has graduated by the end of their third year. The proportion of students who graduate will be compared for two time periods: before the policy was implemented and after.

Example 3. Ground beef is labeled with its percent fat, such as 4% fat (lean) or 20% fat (not lean). The leaner the meat, the more expensive it is per pound, so it is important to know whether packages marked as leaner really are leaner. To find out, packages of ground beef will be purchased and fried, and all the fat will be collected and weighed. The percent fat of each package will be calculated as the weight of the fat divided by the weight of the ground beef before cooking, multiplied by 100. Is the average percentage of fat lower in the packages labeled as leaner? The average fat percentage of the lean packages will be compared to the average fat percentage of the not-lean packages.

Example 4. Does the divorce of a child's parents while the child is in 9th or 10th grade have a negative impact on the child's academic success?
To determine the impact of a divorce on a child in 9th or 10th grade, the mean GPA of students before a divorce is compared to the mean GPA after the divorce. A control group of children whose parents were not divorced is compared to the impact group of children whose parents were divorced.

Example 5. Is there a relationship between a person's BMI and percent body fat?
BMI (Body Mass Index) is used to determine obesity in people. However, it is based only on height and weight, not on the percent of body fat a person has.
Consequently, it is natural to wonder whether there is a relationship between BMI and percent body fat. The assumption is that percent body fat influences BMI.

Extra Examples
Example 6. Is there a relationship between the amount of weight a person can squat lift and the height the person can jump?
Randomly selected adults are tested to measure the amount of weight they can squat lift and their vertical jump height. The expectation is that greater leg strength corresponds to higher jumps.

Example 7. What proportion of students who plan to take time off from school actually return to school?
A survey was done of students who left Pierce College more than 5 years ago without completing their degree. The former students were asked whether they had returned to school (either at Pierce or somewhere else). The proportion of students who returned was calculated.

Complete In-class Activity – Design Tables – Page 275

Sampling
Observational studies and some observational experiments require random sampling from a population. The next step in the research design process is to determine how a sample will be taken from the population so that it is representative of the population. The objective is to avoid bias. Bias can result from who is asked, who can't be asked, and how they are asked.

Another source of bias comes from using data that are not independent. Data are independent when knowing one datum gives no indication of the value of another. Time series data are a special concern in this regard. Time series data used for a BACI design or a correlation study must be sampled from years that are far enough apart that the data do not have serial dependence.

Probability Sampling Methods
Simple random: Randomly select units from all units in the population.
Complete In-class Activities – SRS on Calculator – Page 277
Stratified: Define strata, then take a simple random sample from each stratum separately.
Systematic: Randomly select 1 value from 1 to k, then add k repeatedly.
Cluster: Define the clusters, then take a simple random sample of the clusters and use all the units in each selected cluster.

Demonstrate on the book example (unit number and response):

Cluster 1: 1 Yes, 2 Yes, 3 No, 4 Yes, 5 No, 6 Yes, 7 No, 8 Yes, 9 No, 10 No, 11 No, 12 Yes, 13 Yes, 14 Yes, 15 No, 16 No, 17 Yes, 18 Yes, 19 No, 20 No
West Side of Harbor
Cluster 2: 21 No, 22 No, 23 No, 24 No, 25 No, 26 Yes, 27 No, 28 No, 29 No, 30 No, 31 No, 32 No, 33 No, 34 No, 35 Yes, 36 Yes, 37 Yes, 38 Yes, 39 No, 40 No
Cluster 3: 41 Yes, 42 No, 43 No, 44 Yes, 45 No, 46 Yes, 47 No, 48 No, 49 No, 50 No, 51 Yes, 52 Yes, 53 Yes, 54 Yes, 55 No, 56 No, 57 No, 58 No, 59 Yes, 60 No
Cluster 4: 61 No, 62 No, 63 No, 64 No, 65 No, 66 No, 67 Yes, 68 No, 69 No, 70 Yes, 71 No, 72 Yes, 73 Yes, 74 Yes, 75 Yes, 76 Yes, 77 No, 78 Yes, 79 No, 80 No
Cluster 5: 81 No, 82 Yes, 83 No, 84 No, 85 Yes, 86 Yes, 87 No, 88 No, 89 Yes, 90 No, 91 No, 92 Yes, 93 No, 94 Yes, 95 No, 96 No, 97 Yes, 98 No, 99 No, 100 No
East Side of Harbor
Cluster 6: 101 No, 102 No, 103 Yes, 104 Yes, 105 Yes, 106 No, 107 No, 108 Yes, 109 No, 110 No, 111 No, 112 Yes, 113 No, 114 No, 115 No, 116 No, 117 Yes, 118 Yes, 119 Yes, 120 Yes
Cluster 7: 121 No, 122 Yes, 123 No, 124 No, 125 Yes, 126 No, 127 Yes, 128 No, 129 Yes, 130 No, 131 No, 132 Yes, 133 No, 134 Yes, 135 No, 136 No, 137 Yes, 138 No, 139 Yes, 140 Yes

Complete and submit In-class Activities – Compare and Contrast Sampling Methods – Page 279
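The four probability sampling methods can be tried on the harbor data above with a short Python sketch. It encodes each cluster's 20 Yes/No responses in unit order and draws one sample of roughly 20 units with each method. It is a sketch under stated assumptions, not the book's procedure. Because the extracted layout does not make clear which clusters lie on which side of the harbor, the sketch uses the seven clusters themselves as strata for the stratified sample. All function and variable names are hypothetical.

```python
import random

# Harbor example data, one string per cluster of 20 consecutive units
# ('Y' = Yes, 'N' = No): cluster 1 = units 1-20, ..., cluster 7 = units 121-140.
CLUSTER_STRINGS = [
    "YYNYNYNYNN" "NYYYNNYYNN",  # cluster 1
    "NNNNNYNNNN" "NNNNYYYYNN",  # cluster 2
    "YNNYNYNNNN" "YYYYNNNNYN",  # cluster 3
    "NNNNNNYNNY" "NYYYYYNYNN",  # cluster 4
    "NYNNYYNNYN" "NYNYNNYNNN",  # cluster 5
    "NNYYYNNYNN" "NYNNNNYYYY",  # cluster 6
    "NYNNYNYNYN" "NYNYNNYNYY",  # cluster 7
]
CLUSTER_SIZE = 20
assert all(len(s) == CLUSTER_SIZE for s in CLUSTER_STRINGS)

# POPULATION maps unit number -> True for a Yes response.
POPULATION = {
    c * CLUSTER_SIZE + j + 1: ch == "Y"
    for c, s in enumerate(CLUSTER_STRINGS)
    for j, ch in enumerate(s)
}
UNITS = sorted(POPULATION)  # 1, 2, ..., 140

def cluster_of(unit):
    """Cluster number (1-7) that a unit belongs to."""
    return (unit - 1) // CLUSTER_SIZE + 1

# Assumption for illustration: the strata are the seven clusters themselves.
STRATA = {}
for u in UNITS:
    STRATA.setdefault(cluster_of(u), []).append(u)

def proportion_yes(sample):
    return sum(POPULATION[u] for u in sample) / len(sample)

def simple_random_sample(n, rng):
    """Every set of n units is equally likely to be chosen."""
    return rng.sample(UNITS, n)

def stratified_sample(n_per_stratum, rng):
    """A separate simple random sample inside each stratum."""
    sample = []
    for members in STRATA.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def systematic_sample(k, rng):
    """Random start from 1 to k, then every k-th unit (units are numbered 1..N)."""
    start = rng.randint(1, k)
    return list(range(start, len(UNITS) + 1, k))

def cluster_sample(n_clusters, rng):
    """Simple random sample of whole clusters; keep every unit in each chosen cluster."""
    chosen = rng.sample(sorted(STRATA), n_clusters)
    return [u for u in UNITS if cluster_of(u) in chosen]

if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so a run can be repeated
    print("population:", proportion_yes(UNITS))
    print("simple random, n = 20:", proportion_yes(simple_random_sample(20, rng)))
    print("stratified, 3 per stratum:", proportion_yes(stratified_sample(3, rng)))
    print("systematic, k = 7:", proportion_yes(systematic_sample(7, rng)))
    print("cluster, 1 cluster:", proportion_yes(cluster_sample(1, rng)))
```

The population contains 56 Yes responses out of 140 units, a proportion of 0.4. Each method draws about 20 units (21 for the stratified sample), so rerunning with different seeds shows how much each method's sample proportion varies around 0.4. In particular, a one-cluster sample can range from 5/20 (cluster 2) to 10/20 (cluster 1).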