Course Introduction
Summer 2019
STATS 101B Introduction to Design and Analysis of Experiment
Maria Cha

Syllabus
• It answers many of your administrative questions. Please read it thoroughly.
• It may be updated from time to time, but the exam dates and the general outline of the course will not change.

Academic Integrity
• When you work on any graded assignment, you may discuss it with your classmates, but you must write your own assignment for submission. If the graders find identical assignments containing exactly the same code or comments, the case may be reported to the Dean of Students.
• Let's just be proud BRUINS!!
• No grade in any class is more important than your physical and mental well-being and your integrity. Take time to rest, eat, exercise, sleep, hang out with friends, speak to a counselor, or do whatever it takes to take care of yourself.

Chapter 1

Types of research studies
• We conduct studies to gather information and draw conclusions. The type of conclusion we can draw depends on the study method used.
• Two types of conclusions from a research study:
  • Correlation (association)
  • Causation (cause-and-effect relationship)
• Two types of study methods:
  • Observational study
  • Experimental study

Observational Studies
• An observational study measures characteristics of a population by studying individuals in a sample, but does not attempt to manipulate or influence the variables of interest.
• A survey is a good example of an observational study. Visit the Pew Research Center to find many reports based on observational studies: http://www.pewresearch.org/
• Observational studies are valuable for discovering trends and possible associations.
• However, it is NOT possible for an observational study alone to demonstrate a causal relationship. Why?
• Example: Suppose we observe that a kid is violent (A) and also happens to watch a lot of violent TV shows (B). Possible cause-and-effect scenarios:
  • He could be violent because he is learning the behavior from TV (B causes A).
  • He could be watching violent TV because he likes violence (A causes B).
  • He could be experiencing a mental health issue that leads to both (C causes A and B).

• In observational studies, confounding variables may exist. A confounding variable is an outside variable that is related to both the explanatory (independent) variable and the response (dependent) variable, so it can create or distort an apparent relationship between them.
• Thus, an observational study may reveal an association between two variables, but it cannot establish a cause-and-effect relationship.
• Association vs. causation:
  • Does the number of pirates cause global warming?
  • Do Mexican lemon imports prevent highway deaths?

Experimental Studies
• A designed experiment applies a treatment to individuals (referred to as experimental units or subjects) and attempts to isolate the effect of the treatment on a response variable.
• There must be at least one treatment variable to manipulate and one response variable to measure.
• The response variable is observed and compared across groups of subjects that have been treated differently.
• An experiment can show a causal relationship, but not always.

Observational vs. Experimental
• Exercise: Use two different study methods to answer the same research question.
• Research question: Does exercise prevent colds?
• Briefly design and summarize your plans for the two study methods:
  1. Observational study
  2. Experimental study

• One possible design for each method:
  1. Observational study
    • Randomly select a sample of subjects.
    • Record data for each subject on the amount of exercise and the number of colds in the last year.
    • Compare people who exercise with people who do not exercise.
  2. Experimental study
    • Obtain a group of study participants (often volunteers).
    • Manipulation: randomly assign the participants to the treatment group (exercise) and the control group (no exercise).
    • After a set amount of time, record the amount of exercise and the number of colds for each person.
    • Compare the people in the two groups.

• In experiments, treatments are assigned to the groups at random; in observational studies they are not.
• Random assignment to treatment and control groups helps equalize the groups with respect to any confounding variables, so any systematic difference in the response variable can be attributed to the explanatory variable (the treatment).

• Exercise: Read the article from Science Daily (https://www.sciencedaily.com/releases/2008/07/080707081834.htm).
  • What type of study was conducted in the article?
  • Do you agree with the title of the article, "PTSD Causes Early Death From Heart Disease, Study Suggests"?

Design of experiment
• Go back to the claim that an experiment can show a causal relationship, but not always. Why not always?
• In the earlier example, we designed the experiment in the following order:
  • Obtain a group of study participants (often volunteers).
  • Manipulation: randomly assign the participants to the treatment group (exercise) and the control group (no exercise).
  • After a set amount of time, record the amount of exercise and the number of colds for each person.
  • Compare the people in the two groups.
• Can we conclude a cause-and-effect relationship from this experiment?

Strategy of the experiment
• To understand cause-and-effect relationships in a system or process, you need to conduct experiments.
• Thus, you must deliberately change the input variables to the system and observe the changes in the system output that these input changes produce.
• Each run of an experiment is called a test.
• An experiment can be defined as a series of runs or tests in which purposeful changes are made to the input variables of a process or system so that we may observe and identify the reasons for changes in the output response.

Some approaches
1. Best-guess approach
  • Select an arbitrary combination of factor levels, test it, and see what the outcome is.
  • Then switch the levels of one or two (or more) factors based on the previous result and repeat.
  • This often seems to work well because experimenters typically have a lot of technical knowledge and practical experience with the system.
  • Pitfalls:
    • If the guesses do not work well, it can take a long time with no guarantee of success.
    • If the first guess produces an acceptable result, the experimenter might stop testing, even though a better combination may exist.
2. One-factor-at-a-time (OFAT) approach
  • This method consists of selecting a starting point (a baseline set of levels) for each factor, then successively varying each factor over its range with the other factors held constant at the baseline level.
  • In other words, the experimenter looks at how the response variable is affected by varying each factor while all other factors are held constant.
  • A major pitfall: it fails to consider any possible interaction between the factors. An interaction exists when the effect of one factor depends on the level of another factor.
3. Factorial design approach
  • Factors are varied together, instead of one at a time.
  • This is the correct approach to dealing with several factors.
  • It is a very important concept, and we will discuss it extensively throughout this course.
  • This design makes the most efficient use of the experimental data.
  • With four or more factors, it is often unnecessary to run every combination of levels. A design that uses only a subset of the runs is called a fractional factorial experiment.
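The OFAT pitfall and the factorial alternative can be illustrated with a small Python sketch. It rests on stated assumptions: the response function is a hypothetical, noise-free model in coded units (-1 = low, +1 = high) with an A*B interaction term, and the helper names (`response`, `ofat`, `full_factorial`, `effect`, `half_fraction_2_4`) are ours, not from any library. The half fraction uses the common generator D = ABC.

```python
from itertools import product


# Hypothetical, noise-free response in coded units (-1 = low, +1 = high).
# The 6*a*b term is an interaction: the effect of A depends on the level of B.
def response(a: int, b: int) -> int:
    return 60 + 2 * a + 2 * b + 6 * a * b


def ofat(baseline=(-1, -1)):
    """One factor at a time: vary A with B at its baseline level and keep the
    better level of A, then vary B with A held at that level."""
    _, b_start = baseline
    best_a = max((-1, 1), key=lambda lvl: response(lvl, b_start))
    best_b = max((-1, 1), key=lambda lvl: response(best_a, lvl))
    return (best_a, best_b), response(best_a, best_b)


def full_factorial(k: int):
    """All 2^k combinations of k two-level factors."""
    return list(product((-1, 1), repeat=k))


def effect(runs, ys, contrast):
    """Average response where contrast(run) == +1 minus average where it is -1."""
    plus = [y for run, y in zip(runs, ys) if contrast(run) == 1]
    minus = [y for run, y in zip(runs, ys) if contrast(run) == -1]
    return sum(plus) / len(plus) - sum(minus) / len(minus)


def half_fraction_2_4():
    """2^(4-1) fractional factorial: full factorial in A, B, C and D = A*B*C."""
    return [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]


if __name__ == "__main__":
    print("OFAT settles on", ofat())  # ((-1, -1), 62)

    runs = full_factorial(2)
    ys = [response(a, b) for a, b in runs]
    print("Best factorial run:", max(zip(runs, ys), key=lambda t: t[1]))  # ((1, 1), 70)
    print("A effect:", effect(runs, ys, lambda r: r[0]))  # 4.0
    print("B effect:", effect(runs, ys, lambda r: r[1]))  # 4.0
    print("AB interaction:", effect(runs, ys, lambda r: r[0] * r[1]))  # 12.0

    print("Half fraction uses", len(half_fraction_2_4()), "of", 2 ** 4, "runs")
```

Under this assumed model, OFAT stops at (-1, -1) with response 62 because each single-factor change from the baseline looks worse. The four factorial runs reveal (+1, +1) with response 70, and the large AB estimate is exactly the information OFAT never collects.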
Principles of experimental design
• "Statistical" design of experiments: the process of planning the experiment so that appropriate data will be collected and analyzed by statistical methods, resulting in valid and objective conclusions.
• Two aspects of any experimental problem:
  1. The design of the experiment
  2. The statistical analysis of the data: the method of analysis depends directly on the design employed.
• Three basic principles of experimental design:
  1. Randomization
  2. Replication
  3. Blocking

Randomization
• Statistical methods require that the observations (or errors) be independently distributed random variables. Randomization usually makes this assumption valid.
• The allocation of experimental units to treatments is randomly determined, which prevents subjective assignment.
• The order in which the individual runs of the experiment are performed is randomly determined.
• Randomization can help "average out" the effects of irrelevant or unknown factors that may be present.
• Computer software assists with randomization. For example, a random number generator can be used to randomize the order of runs.

Replication
• Replication means an independent repeat run of each factor combination. The number of replicates is usually the number of observations in a sample (per treatment).
• Two properties:
  1. It allows the experimenter to obtain an estimate of the experimental error, which is the basic unit of measurement for determining whether observed differences in the data are really statistically significant.
  2. If the sample mean $\bar{y}$ is used to estimate the true mean response for one of the factor levels, replication allows a more precise estimate of that parameter: with $n$ replicates and error variance $\sigma^2$, the variance of the sample mean is $\sigma^2_{\bar{y}} = \sigma^2/n$.
• Replication helps attain a more reliable estimate of the effect of each treatment.
• A single replicate might not give us enough information to form a conclusion.
• Difference from repetition (repeated measures):
  • Replication: the treatment is applied to different (multiple) experimental units.
  • Repeated measures (repetition): the treatment is applied to the same unit, and the response is measured on that unit multiple times.
• Replication reflects sources of variability both between runs and, potentially, within runs.

Blocking
• Blocking is a design technique for dealing with nuisance factors.
• Nuisance factors are factors that may influence the experimental response but in which we are not interested.
• A block is a group of homogeneous (similar) units.
• For example, suppose we want to know whether caffeine really does cause higher memory retention. We suspect that people of similar ages might see similar effects from caffeine, so we can block by age, putting people of similar ages in the same group, e.g., young adults, middle-aged adults, and senior citizens.
• Blocking a nuisance factor can increase power, i.e., our ability to detect real effects.
• Randomization is performed within each block.
• For blocking to be effective, the units should be arranged so that the within-block variation is much smaller than the between-block variation.
• In general, block what you can and randomize what you cannot.

Guideline for Designing Experiments
1. Recognition and statement of the problem
  • In practice, this is sometimes not simple.
2. Selection of the response variable
  • Make sure that the variable really provides useful information.
  • Responses may be discrete or continuous. Continuous responses are generally preferable.
  • The experimenter must decide how the response will be measured.
3. Choice of factors, levels, and ranges
  • A factor is a variable that is studied in the experiment.
  • Different levels (settings) are chosen for each factor.
  • A treatment is a combination of factor levels.
  • Design factors vs. nuisance factors (factors of interest vs. factors of no interest)
  • Factors may be quantitative or qualitative.
4. Choice of experimental design
  • Consideration of sample size (number of replicates)
  • Selection of a suitable run order for the experimental trials
  • Determination of whether blocking or other randomization restrictions are involved
5. Performing the experiment
  • Monitor the process carefully to ensure that everything is done according to plan.
  • Errors in experimental procedure at this stage will usually destroy experimental validity.
6. Statistical analysis of the data
  • Graphs, models, hypothesis tests, diagnostics
7. Conclusions and recommendations
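Step 4 (number of replicates, run order, and blocking), together with the randomization, replication, and blocking principles, can be sketched in Python for the caffeine example. This is a minimal sketch under assumptions: the treatment labels, the three age blocks, two replicates per treatment per block, the seeds, and the helper names `rcbd_plan` and `simulated_sd_of_mean` are hypothetical choices for illustration. The plan is a randomized complete block design: every treatment appears equally often in every block, and the treatment order is shuffled separately within each block. The second helper checks by simulation that the standard deviation of a sample mean shrinks like sigma / sqrt(n).

```python
import random
import statistics
from collections import Counter


def rcbd_plan(treatments, blocks, reps_per_block, seed=None):
    """Run plan for a randomized complete block design.

    Replication: each treatment appears reps_per_block times in every block.
    Blocking + randomization: treatment order is shuffled independently
    within each block, never across blocks.
    """
    rng = random.Random(seed)  # seeded generator so the plan can be reproduced
    plan = []
    for block in blocks:
        labels = [t for t in treatments for _ in range(reps_per_block)]
        rng.shuffle(labels)  # random run order within this block
        for run, trt in enumerate(labels, start=1):
            plan.append({"block": block, "run": run, "treatment": trt})
    return plan


def simulated_sd_of_mean(sigma, n, trials=20_000, seed=0):
    """Empirical standard deviation of the mean of n normal observations.

    Theory says it should be close to sigma / sqrt(n), which is why more
    replicates give a more precise estimate of a treatment mean.
    """
    rng = random.Random(seed)
    means = [statistics.fmean([rng.gauss(0.0, sigma) for _ in range(n)])
             for _ in range(trials)]
    return statistics.stdev(means)


if __name__ == "__main__":
    plan = rcbd_plan(["caffeine", "placebo"],
                     ["young adult", "middle-aged", "senior"],
                     reps_per_block=2, seed=101)
    for row in plan:
        print(row)
    # By construction, each block contains each treatment exactly twice.
    print(Counter((row["block"], row["treatment"]) for row in plan))

    for n in (1, 4, 16):
        print(n, round(simulated_sd_of_mean(2.0, n), 3), "vs theory", 2.0 / n ** 0.5)
```

Seeding the generator makes a plan reproducible; dropping the seed gives a fresh randomization. Runs are numbered within each block, so each block can be carried out as its own session.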