Stats101B Chapter 1

Course Introduction
Summer 2019
STATS 101B Introduction to Design and Analysis of Experiment
Maria Cha
Syllabus
• It will answer many of your administrative questions.
Please read thoroughly.
• It may be updated from time to time, but the exam dates
and the general outline of the course will not change.
Academic Integrity
• When you work on graded assignments, you may discuss
them with your classmates, but you must write your own
assignments for submission. If the graders find identical
assignments that contain exactly the same code or
comments, the case can be reported to the Dean of
Students.
• Let’s just be proud BRUINS!!
• No grade in any class is more important than your
physical and mental well-being and your integrity.
Take time to rest, eat, exercise, sleep, hang out with
friends, speak to a counselor, or whatever it takes to
take care of yourself.
Chapter 1
Types of research studies
• We do studies to gather information and draw
conclusions. The type of conclusion we draw
depends on the study method used.
• Two types of conclusions from a research study:
• Correlation (association)
• Causation (cause and effect relationship)
• Two types of study methods:
• Observational study
• Experimental study
Observational Studies
• This study method measures the characteristics of a
population by studying individuals in a sample, but does not
attempt to manipulate or influence the variables of interest.
• A survey is a good example of an observational study.
Visit the Pew Research Center to find many reports based on
observational studies: http://www.pewresearch.org/
• Observational studies are valuable for discovering trends
and possible associations.
• However, it is NOT possible for observational studies to
demonstrate a causal relationship.
Observational Studies
• It is NOT possible for observational studies to demonstrate
a causal relationship. – Why?
• Example: Suppose that we observe that a kid is violent (A)
and happens to watch a lot of violent TV shows (B):
• Possible scenarios for the cause-and-effect relationship
among the events:
• He could be violent because he is learning the behavior
(B causes A)
• He could be watching violent TV because he likes
violence (A causes B)
• He could be experiencing a mental health issue (A and
B are caused by C)
Observational Studies
• In observational studies, confounding variables may
exist. A confounding variable is an outside variable that
is related to both the explanatory (independent) variable
and the response (dependent) variable, so it can distort
the apparent relationship between them.
• Thus, an observational study may reveal an association
between two variables, but it cannot establish a
cause-and-effect relationship.
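A small simulation can make the confounding argument concrete. The sketch below is an illustration under assumed numbers, not data from any study: a hypothetical variable C (playing the role of the mental health issue in the earlier example) drives both A and B, while A and B have no effect on each other. With the coefficients chosen here, Var(A) = Var(B) = 0.8² + 0.6² = 1 and Cov(A, B) = 0.8² = 0.64, so the sample correlation should come out near 0.64 even though neither variable causes the other.

```python
import random

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simulate_confounding(n=5000, seed=1):
    """Generate A and B that share a hypothetical common cause C only."""
    rng = random.Random(seed)
    a_vals, b_vals = [], []
    for _ in range(n):
        c = rng.gauss(0, 1)              # hypothetical confounder C
        a = 0.8 * c + rng.gauss(0, 0.6)  # A depends only on C plus noise
        b = 0.8 * c + rng.gauss(0, 0.6)  # B depends only on C plus noise
        a_vals.append(a)
        b_vals.append(b)
    return pearson_r(a_vals, b_vals)

r = simulate_confounding()
print(f"correlation between A and B: {r:.2f}")
```

An observational study that recorded only A and B would see a clear association, but nothing in the data would reveal that C is responsible for it.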
Observational Studies
• Association vs. causation
• Do more pirates cause global warming?
Observational Studies
• Association vs. causation
• Do Mexican lemon imports prevent highway deaths?
Experimental Studies
• A designed experiment applies a treatment to individuals
(referred to as experimental units or subjects) and
attempts to isolate the effects of the treatment on a
response variable.
• There must be at least one treatment variable to
manipulate and one response variable to measure.
• The response variable is observed and compared for the
different groups of subjects who have been treated
differently.
• It is possible to show a causal relationship with an
experiment. But, not always.
Observational vs. Experimental
• Exercise: Use the two different study methods to
answer the same research question.
• Research question: Does exercise prevent colds?
• Briefly design and summarize your plans for the two
study methods:
• 1. Observational study
• 2. Experimental study
Observational vs. Experimental
• One possible design for the research using each method:
• 1. Observational study
• Randomly select a sample of subjects.
• Record, for each subject, the amount of exercise and
the number of colds in the past year.
• Compare people who exercise with people
who do not exercise.
• 2. Experimental study
• Obtain a group of study participants (often volunteers).
• Manipulation: randomly assign the participants to the
treatment (exercise) and control groups (no exercise).
• After a set amount of time, record the amount of exercise
and the number of colds for each person.
• Compare the people in the two groups.
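The manipulation step, randomly assigning participants to the exercise and no-exercise groups, can be sketched with Python's standard `random` module. The volunteer IDs and the even 50/50 split are assumptions for illustration only.

```python
import random

def randomly_assign(participants, seed=None):
    """Randomly split participants into treatment (exercise) and
    control (no exercise) groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(participants)   # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"exercise": shuffled[:half], "no_exercise": shuffled[half:]}

volunteers = [f"P{i:02d}" for i in range(1, 21)]   # 20 hypothetical volunteers
groups = randomly_assign(volunteers, seed=2019)
print("exercise:   ", groups["exercise"])
print("no exercise:", groups["no_exercise"])
```

Passing a fixed seed makes the assignment reproducible, which is useful for documenting how the groups were formed.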
Observational vs. Experimental
• In experiments, the treatments are assigned to the
different groups at random, while in observational
studies they are not.
• Random assignment to treatment and control groups
in an experiment helps equalize the groups with
respect to any confounding variables, so any difference
in the response variable can be attributed to the
explanatory variable.
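The balancing effect of random assignment can be checked with a quick simulation. This sketch assumes a hypothetical population with ages from 18 to 80 and assumes that, when people choose for themselves, those under 40 exercise with probability 0.8 and older people with probability 0.3. Age then acts as a potential confounder: the self-selected groups differ substantially in average age, while the randomly assigned groups end up with nearly the same average age.

```python
import random
from statistics import mean

rng = random.Random(7)
# Hypothetical population: 2000 people aged 18 to 80.
ages = [rng.randint(18, 80) for _ in range(2000)]

# Self-selection (observational study): younger people are assumed
# more likely to exercise.
self_exercise, self_none = [], []
for age in ages:
    p_exercise = 0.8 if age < 40 else 0.3   # assumed selection probabilities
    if rng.random() < p_exercise:
        self_exercise.append(age)
    else:
        self_none.append(age)

# Random assignment (experiment): group membership ignores age entirely.
shuffled = ages[:]
rng.shuffle(shuffled)
rand_exercise, rand_none = shuffled[:1000], shuffled[1000:]

gap_self = mean(self_none) - mean(self_exercise)
gap_rand = mean(rand_none) - mean(rand_exercise)
print(f"mean age gap, self-selected groups: {gap_self:.1f} years")
print(f"mean age gap, randomly assigned:    {gap_rand:.1f} years")
```

The same reasoning applies to confounders nobody measured: random assignment tends to balance them too, which is why it supports causal conclusions.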
Observational vs. Experimental
• Exercise: Read the article from Science Daily
(https://www.sciencedaily.com/releases/2008/07/080707081834.htm).
• What type of study was conducted in the
article?
• Do you agree with the title of the article: “PTSD
Causes Early Death From Heart Disease, Study
Suggests”?
Design of experiment
• Go back to the statement “It is possible to show a
causal relationship with an experiment. But, not
always.” – why not always?
• In the earlier example, we designed the experiment
in the following order:
• Obtain a group of study participants (often volunteers).
• Manipulation: randomly assign the participants to the
treatment (exercise) and control groups (no exercise).
• After a set amount of time, record the amount of exercise and
the number of colds for each person.
• Compare the people in the two groups.
• Can we conclude a cause-and-effect relationship
from this experiment?
Strategy of the experiment
• To understand cause-and-effect relationships in a system or
process, you need to conduct experiments.
• Thus, you must deliberately change the input variables to
the system and observe the changes in the system output
that these input changes produce.
• Each experimental run is called a test.
• An experiment can be defined as a series of runs or tests in
which purposeful changes are made to the input variables
of a process or system so that we may observe and identify
the reasons for changes that may be observed in the output
response.
Some approaches
• 1. Best guess approach
• Select an arbitrary combination of factor levels, test it, and
see what the outcome is.
• Then switch the levels of one or two (or more) factors
based on the previous combination and repeat.
• This often works reasonably well because experimenters
typically have a lot of knowledge and practical experience
of the system.
• Pitfalls:
• If the guesses do not work well, it can take a long
time, with no guarantee of success.
• If the first guess produces an acceptable result, the
experimenter might stop testing, even though a better
combination may exist.
Some approaches
• 2. One-factor-at-a-time (OFAT) approach
• This method consists of selecting a starting point (or
baseline set of levels) for each factor and then successively
varying each factor over its range, with the other factors
held constant at the baseline level.
• In other words, the experimenter looks at how the
response variable is affected by varying each factor with
all other factors held constant.
• A major pitfall: it fails to consider any possible interactions
between the factors.
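To see how OFAT can miss an interaction, consider a hypothetical, noise-free response for two factors A and B, each at a low (-1) and high (+1) coded level. The response function below is invented for illustration: it contains a strong A × B interaction, so changing either factor alone from the baseline makes things worse, while changing both together gives the best result. OFAT starting from the baseline (-1, -1) never tries that combination; a factorial design, which runs every combination, does.

```python
from itertools import product

def response(a, b):
    """Hypothetical noise-free response with a strong A x B interaction.
    a and b are coded factor levels: -1 (low) or +1 (high)."""
    return 50 + 5 * a + 5 * b + 10 * a * b

levels = (-1, +1)

# OFAT: start at the baseline and vary one factor at a time,
# keeping the best level found before moving on to the next factor.
best = {"a": -1, "b": -1}
for factor in ("a", "b"):
    trials = []
    for level in levels:
        setting = dict(best, **{factor: level})
        trials.append((response(**setting), level))
    best[factor] = max(trials)[1]
ofat_best = (best["a"], best["b"])

# Factorial: evaluate every combination of levels.
factorial_best = max(product(levels, repeat=2), key=lambda ab: response(*ab))

print("OFAT picks", ofat_best, "->", response(*ofat_best))
print("Factorial picks", factorial_best, "->", response(*factorial_best))
```

Here OFAT settles on (-1, -1) with a response of 50, while the factorial run finds (+1, +1) with a response of 70.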
Some approaches
• 3. Factorial design approach
• Factors are varied together, instead of one at a time.
• This is the correct approach to dealing with several
factors.
• This is a very important concept, and we will discuss it
extensively throughout this course.
• This design makes the most efficient use of experimental
data.
• If there are four or more factors, we can run only a
subset of all the factor combinations. This is called a
fractional factorial experiment.
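A two-level factorial design is simply every combination of the factor levels, so `itertools.product` can generate it. The sketch below builds the full 2^4 design (16 runs) for four factors and an 8-run half fraction. The half fraction uses the generator D = ABC, which is one common textbook choice and is assumed here for illustration; other fractions are possible.

```python
from itertools import product

def full_factorial(k):
    """All 2**k runs of a two-level design, with levels coded -1/+1."""
    return [list(run) for run in product((-1, +1), repeat=k)]

def half_fraction_2_4():
    """A 2^(4-1) half fraction: a full 2^3 design in A, B, C, with the
    level of D in each run set by the generator D = ABC."""
    return [[a, b, c, a * b * c] for a, b, c in product((-1, +1), repeat=3)]

full = full_factorial(4)
half = half_fraction_2_4()
print(len(full), "runs in the full 2^4 design")
print(len(half), "runs in the 2^(4-1) half fraction")
for run in half:
    print(run)
```

Each factor still appears at its high and low level equally often in the half fraction, but with half the runs; the price is that some effects become confounded with each other.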
Principles of experimental design
• “Statistical” design of experiments: the process of planning
the experiment so that appropriate data will be collected and
analyzed by statistical methods, resulting in valid and objective
conclusions.
• Two aspects of any experimental problem:
• 1. The design of the experiment
• 2. Statistical analysis of the data: the method of analysis
depends directly on the design employed.
• Three basic principles of experimental design:
• 1. Randomization
• 2. Replication
• 3. Blocking
Randomization
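• Statistical methods require that the observations (or errors)
be independently distributed random variables. Randomization
usually makes this assumption valid.
• The allocation of units to treatments is randomly
determined, which prevents subjective assignment.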
• The order in which the individual runs of the
experiment are to be performed is randomly
determined.
• Randomization can help “average out” the effects of
extraneous or unknown factors that may be present.
• Computer software assists with randomization. For
example, a random number generator can be used to
randomize the order of runs.
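For example, a random number generator can shuffle the run order of a simple experiment. The sketch assumes a hypothetical experiment with three treatments (T1, T2, T3) and four replicates of each, for 12 runs in total; the seed is fixed only so the randomized order can be reproduced and recorded.

```python
import random

# Hypothetical example: 3 treatments, 4 replicates each -> 12 runs.
treatments = ["T1", "T2", "T3"]
replicates = 4
runs = [t for t in treatments for _ in range(replicates)]

rng = random.Random(101)   # fixed seed so the run order can be reproduced
run_order = runs[:]
rng.shuffle(run_order)

for position, treatment in enumerate(run_order, start=1):
    print(f"run {position:2d}: {treatment}")
```

Running the treatments in this shuffled order, rather than all T1 runs first, keeps drifts over time (warm-up, fatigue, changing conditions) from lining up with a single treatment.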
Replication
• Replication means an independent repeat run of each
factor combination. The number of replicates is usually the
number of observations in a sample.
• Two properties:
• 1. This allows the experimenter to obtain an estimate of
experimental error, which is a basic unit of measurement
to determine whether observed differences in the data
are really statistically significant.
• 2. If the sample mean $\bar{y}$ is used to estimate the true
mean response for one of the factor levels, then this
allows for a more precise estimate of the parameter.
• Replication helps to attain a more reliable estimate of the
effect of each treatment.
• A single replicate might not give us enough information to
form a conclusion.
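Both properties can be illustrated by simulation. Assuming independent observations with true mean 20 and error standard deviation σ = 4 (both values invented for this sketch), the sample standard deviation s of the replicates estimates experimental error, and the standard error of $\bar{y}$, σ/√n, shrinks as the number of replicates n grows: quadrupling n halves it. Note that s cannot be computed from a single replicate at all, which is one reason a single replicate gives too little information.

```python
import random
from statistics import mean, stdev

TRUE_MEAN, SIGMA = 20.0, 4.0   # assumed true mean response and error SD

def replicate(n, rng):
    """Simulate n independent replicate observations at one factor level."""
    return [rng.gauss(TRUE_MEAN, SIGMA) for _ in range(n)]

rng = random.Random(42)
for n in (2, 5, 20, 80):
    y = replicate(n, rng)
    s = stdev(y)                # estimate of experimental error (needs n >= 2)
    se = s / n ** 0.5           # estimated standard error of the sample mean
    print(f"n={n:3d}  ybar={mean(y):6.2f}  s={s:5.2f}  se(ybar)={se:5.2f}")
```

With more replicates, ybar settles closer to the true mean and the estimated standard error gets smaller, though any single simulated row can fluctuate.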
Replication
• Difference from repetition (repeated measures):
• Replication: the treatment is applied to different
(multiple) observations or units.
• Repeated measures or repetitions: the treatment is
applied to the same observations or units multiple
times.
• Replication reflects sources of variability both
between runs and potentially within runs.
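The difference between replication and repeated measures shows up in how much variability each one captures. In this sketch, every hypothetical experimental unit has its own true level (assumed unit-to-unit SD of 5) and each measurement adds a little noise (assumed measurement SD of 1). Measuring 30 different units reflects both sources of variability, while measuring one unit 30 times reflects only the measurement noise, so it understates the variability an experimenter needs to account for.

```python
import random
from statistics import stdev

rng = random.Random(3)
UNIT_SD, MEASURE_SD = 5.0, 1.0   # assumed between-unit and measurement SDs

def new_unit():
    """True level of a new hypothetical experimental unit."""
    return 50 + rng.gauss(0, UNIT_SD)

def measure(unit_level):
    """One noisy measurement of a unit."""
    return unit_level + rng.gauss(0, MEASURE_SD)

# Replication: 30 different units, each measured once.
replicated = [measure(new_unit()) for _ in range(30)]

# Repeated measures: one unit, measured 30 times.
one_unit = new_unit()
repeated = [measure(one_unit) for _ in range(30)]

print(f"SD across replicated units:  {stdev(replicated):.2f}")
print(f"SD across repeated measures: {stdev(repeated):.2f}")
```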
Blocking
• Blocking is a design technique used to deal with nuisance
factors.
• Nuisance factors are factors that may influence the
experimental response but are not of interest.
• A block is a group of homogeneous (or like) units.
• For example, suppose we want to know whether caffeine
really causes higher memory retention. We suspect that people
of similar ages might see similar effects from caffeine. Hence
we can block these people by putting those with similar ages
in the same group, e.g., young adults, middle-aged adults,
and senior citizens.
Blocking
• Blocking a nuisance factor can lead to an increase in power,
i.e., our ability to detect real effects.
• Randomization is performed within each block.
• For blocking to be effective, the units should be arranged so
that the within-block variation is much smaller than the
between-block variation.
• In general, block what you can and randomize what you
cannot.
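Randomizing within blocks can be sketched for the caffeine example. The subject IDs, the four subjects per age block, and the two treatments (caffeine and placebo) are assumptions for illustration. Treatments are shuffled separately inside each block, so each age group receives each treatment equally often and age differences cannot pile up in one treatment group.

```python
import random

# Hypothetical subjects already grouped into age blocks.
blocks = {
    "young adults": ["Y1", "Y2", "Y3", "Y4"],
    "middle-aged": ["M1", "M2", "M3", "M4"],
    "senior citizens": ["S1", "S2", "S3", "S4"],
}
treatments = ["caffeine", "placebo"]

def assign_within_blocks(blocks, treatments, seed=None):
    """Randomize treatment assignment separately inside each block, so every
    block receives each treatment equally often."""
    rng = random.Random(seed)
    assignment = {}
    for block, subjects in blocks.items():
        if len(subjects) % len(treatments) != 0:
            raise ValueError(f"block {block!r} cannot be split evenly")
        per_treatment = len(subjects) // len(treatments)
        labels = [t for t in treatments for _ in range(per_treatment)]
        rng.shuffle(labels)          # random allocation within this block only
        assignment[block] = dict(zip(subjects, labels))
    return assignment

for block, plan in assign_within_blocks(blocks, treatments, seed=5).items():
    print(block, plan)
```

Age (the nuisance factor) is handled by blocking, and everything else about the subjects is handled by the randomization within each block.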
Guideline for Designing Experiments
1. Recognition of and statement of the problem
• Sometimes, in practice, this is not simple.
2. Selection of the response variable
• Make sure that the variable really provides useful
information.
• Responses may be discrete or continuous. Continuous
responses are generally preferable.
• The experimenter must decide how the response is
measured.
Guideline for Designing Experiments
3. Choice of factors, levels, and ranges
• A factor is a variable that is studied in the experiment.
• Different levels or settings are determined for each
factor.
• A treatment is a combination of factor levels.
• Design factors vs nuisance factors (factors of interest vs
factors of no interest)
• Factors may be quantitative or qualitative.
4. Choice of experimental design
• consideration of sample size (number of replicates)
• selection of a suitable run order for the experimental
trials
• determination of whether or not blocking or other
randomization restrictions are involved
Guideline for Designing Experiments
5. Performing the experiment
• monitor the process carefully to ensure that everything is
being done according to plan
• errors in experimental procedure at this stage will
usually destroy experimental validity
6. Statistical analysis of the data
• graphs, models, hypothesis tests, diagnostics
7. Conclusions and recommendations