ANOVA & PRINCIPLES OF EXPERIMENTAL DESIGN
A regression analysis of observational data has some limitations. In particular, establishing a
cause-and-effect relationship between an independent variable x and the response y is difficult
because the values of the relevant independent variables (both those in the model and those
omitted from it) are not controlled, which allows for confounding factors. Recall that
experimental data are data collected with the values of the x's
set in advance of observing y (i.e., the values of the x's are controlled). With experimental data,
we usually select the x's so that we can compare the mean responses, E(y), for several different
combinations of the x values.
The procedure for selecting sample data with the x's set in advance is called the design of the
experiment. The statistical procedure for comparing the population means is called an analysis
of variance. The objective of this handout is to introduce some key aspects of experimental
design. The analysis of the data from such experiments using an analysis of variance procedure
is the topic of the current chapter.
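The comparison of population means that an analysis of variance performs can be sketched directly. Below is a minimal one-way ANOVA sketch in Python, using hypothetical exam scores for three treatments; the function name and data are illustrative, not from the handout. It partitions the total variation into a between-treatment component and a within-treatment (error) component, then forms the F statistic used to compare the treatment means.

```python
# Minimal one-way ANOVA sketch (hypothetical data): partitions variation
# into between-treatment and within-treatment components and forms F.

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)                      # number of treatments
    n = sum(len(g) for g in groups)      # total observations
    grand_mean = sum(sum(g) for g in groups) / n

    # Between-treatment sum of squares
    sst = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-treatment (error) sum of squares
    sse = sum(sum((y - sum(g) / len(g)) ** 2 for y in g) for g in groups)

    df_between, df_within = k - 1, n - k
    f_stat = (sst / df_between) / (sse / df_within)
    return f_stat, df_between, df_within

# Hypothetical exam scores under three teaching methods
method_a = [75, 80, 72, 78]
method_b = [85, 88, 90, 84]
method_c = [70, 68, 74, 71]
f, df1, df2 = one_way_anova([method_a, method_b, method_c])
print(f"F = {f:.2f} with ({df1}, {df2}) degrees of freedom")
```

A large F (here far above 1) suggests the between-treatment variation exceeds what chance alone would produce, i.e., that the treatment means differ.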
The study of experimental design originated in England and, in its early years, was associated
solely with agricultural experimentation. In agriculture, the need to save time and money led to a
study of ways to obtain more information using smaller samples. This was called Design of
Experiments. Similar motivations led to its subsequent acceptance and wide use in all fields of
scientific experimentation.
We will call the process of collecting sample data an experiment and the (dependent) variable to
be measured, the response y. The planning of the sampling procedure is called the design of
the experiment. The object upon which the response measurement y is taken is called an
experimental unit. The independent variables, quantitative or qualitative, that are related to the
response variable, y, are called factors. The value (that is, the setting) assumed by a factor in
an experiment is called a level. The combinations of levels of the factors for which the response
will be observed are called treatments.
EXAMPLE 1.
A designed experiment. A marketing study is conducted to investigate the effect of brand and
shelf location on weekly coffee sales. Coffee sales are recorded for each of two brands (brand A
and brand B) at each of three shelf locations (bottom, middle, and top). The 2 x 3 = 6
combinations of brand and shelf location were varied each week for a period of 18 weeks.
Below is a layout of the design. For this experiment identify
a. the experimental unit
b. the response, y
c. the factors
d. the factor levels
e. the treatments
FIGURE 1.
Layout for the designed experiment of Example 1.
Solution
a. Since the data will be collected each week for a period of 18 weeks, the
experimental units are weeks.
b. The variable of interest, i.e., the response, is y = weekly coffee
sales. Note that weekly coffee sales are a quantitative variable.
c. Since we are interested in investigating the effect of brand and
shelf location on sales, brand and shelf location are the factors.
Note that both factors are qualitative variables, although, in
general, they may be quantitative or qualitative.
d. For this experiment, brand is measured at two levels (A and B)
and shelf location at three levels (bottom, middle, and top).
e. Since coffee sales are recorded for each of the six brand-shelf location
combinations (brand A, bottom), (brand A, middle), (brand A, top), (brand
B, bottom), (brand B, middle), and (brand B, top), the experiment involves
six treatments (see Figure 1). The term treatments is used to describe the factor-
level combinations to be included in an experiment because many experiments
involve "treating" or doing something to alter the nature of the experimental unit.
Thus, we might view the six brand-shelf location combinations as treatments on
the experimental units in the marketing study involving coffee sales.
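The six treatments of Example 1 are simply the cross product of the two factors' levels. A short Python sketch makes this concrete (the level names are taken from the example):

```python
# Enumerating the treatments of Example 1: each treatment is one
# brand-shelf-location combination, the cross product of the factor levels.
from itertools import product

brands = ["A", "B"]                      # factor 1: two levels
locations = ["bottom", "middle", "top"]  # factor 2: three levels

treatments = list(product(brands, locations))
print(len(treatments))  # 2 x 3 = 6 treatments
for brand, location in treatments:
    print(f"(brand {brand}, {location})")
```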
Now that we understand some of the terminology, it is helpful to think of the design of an
experiment in four steps.
STEP 1 Select the factors to be included in the experiment, and identify the parameters that are
the object of the study. Usually, the target parameters are the population means associated with
the factor-level combinations (i.e., treatments).
STEP 2 Choose the treatments (the factor level combinations to be included in the experiment).
STEP 3 Determine the number of observations (sample size) to be made for each treatment.
[This will usually depend on the standard error(s) that you desire.]
STEP 4 Plan how the treatments will be assigned to the experimental units. That is, decide on
which design to use.
By following these steps, you can control the quantity of information in an experiment. We shall
explain how this is done in the next section.
Generally, in an experiment we control which experimental units receive which values of X
(treatments), and if the experimental units were similar before the experiment and different after,
we can infer a cause-and-effect relationship. To illustrate, suppose 30 similar students are assigned
to three teaching methods (10 to each). If, after the experiment (students being taught by the
different methods), students make higher exam grades under one teaching method, we can infer
that the teaching methods are affecting the grades.
Definition
A completely randomized design ( CRD ) to compare p treatments is one in which the treatments
are randomly assigned to the experimental units.
e.g., 30 students are randomly assigned to the three teaching methods.
Advantage: Easy design.
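The random assignment in a CRD can be sketched in a few lines of Python. This is an illustration under assumed details (student IDs 1-30, three unnamed methods); the handout specifies only the counts:

```python
# Completely randomized design sketch: 30 students (hypothetical IDs)
# are randomly assigned to 3 teaching methods, 10 per method.
import random

random.seed(1)                 # fixed seed so the assignment is reproducible
students = list(range(1, 31))  # the 30 experimental units
random.shuffle(students)       # randomize the order of the units

# Slice the shuffled list into three groups of 10, one per treatment
methods = {f"method_{m}": students[10 * m:10 * (m + 1)] for m in range(3)}
for method, group in methods.items():
    print(method, sorted(group))
```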
Problem with CRD:
If students are not similar, then we have too much randomness in the values of Y caused by the
lack of similarity.
Example: If the students vary too much in IQ, background, etc. then we might not be able to
detect differences in teaching methods.
Possible remedy. Have all thirty students take an IQ test at the start of the experiment; divide
the students into ten groups of three. The top three are the ones with the highest IQ score; the
next group of three has the next highest IQ scores, etc. Within each group of three, we randomly
assign them to the teaching methods. All teaching methods have similar students but all IQ
levels are covered in the experiment. This is called a Randomized Block Design (RBD)
Definition: A Randomized Block Design to compare p treatments involves b blocks, each block
containing p relatively homogeneous experimental units. The p treatments are randomly
assigned to the experimental units within each block, with one experimental unit assigned per
treatment.
Or
Definition: Randomized Block Design: N experimental units are divided into b blocks. Each
block has similar experimental units but each block is different from other blocks. Within each
block, one experimental unit is randomly assigned to a treatment (value of X).
Example: 30 students are divided into 10 groups. Each group contains students with similar IQ
scores, but the groups differ from one another in IQ. Within each group, one student is randomly
assigned to each teaching method.
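The blocking procedure above (rank by IQ, form blocks of three, randomize within each block) can be sketched in Python. The IQ scores and method names below are hypothetical, introduced only for illustration:

```python
# Randomized block design sketch: students are ranked by IQ (hypothetical
# scores), split into 10 blocks of 3 similar students, and within each
# block the 3 teaching methods are assigned in random order.
import random

random.seed(2)
# Hypothetical (student_id -> iq) scores for 30 students
iq_scores = {sid: random.randint(85, 145) for sid in range(1, 31)}

# Rank students from highest to lowest IQ, then form 10 blocks of 3
ranked = sorted(iq_scores, key=iq_scores.get, reverse=True)
blocks = [ranked[3 * b:3 * (b + 1)] for b in range(10)]

methods = ["lecture", "discussion", "self_study"]  # hypothetical names
assignment = {}
for block in blocks:
    order = random.sample(methods, k=3)  # random order within the block
    for student, method in zip(block, order):
        assignment[student] = method
```

Each block now contains all three treatments applied to students of comparable IQ, so differences within a block are attributable to the teaching methods rather than to IQ.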
Definition: A Factorial Design is a completely randomized design with more than one X variable.
Example: We wish to study the effect of teaching method and whether the students get to use
computers or not.
Factor 1 = teaching method
Factor 2 = computer or not.
We now have six combinations. The students are now randomly assigned to the six
combinations. We also watch out for factor interaction.
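Crossing the two factors and randomly assigning students to the resulting combinations can be sketched as follows; the three method names are assumed for illustration (the handout does not name them):

```python
# Factorial design sketch for the teaching example: crossing 3 teaching
# methods (hypothetical names) with computer use gives 3 x 2 = 6
# treatments, and students are randomly assigned to them as in a CRD.
import random
from itertools import product

random.seed(3)
methods = ["lecture", "discussion", "self_study"]  # factor 1 (assumed levels)
computer = ["computer", "no_computer"]             # factor 2

treatments = list(product(methods, computer))      # the 6 factor-level combos
students = list(range(1, 31))
random.shuffle(students)

# 30 units / 6 treatments = 5 students per treatment
groups = {t: students[5 * i:5 * (i + 1)] for i, t in enumerate(treatments)}
print(len(treatments))  # 6
```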
The Importance of Randomization
All the basic designs presented in this chapter involve randomization of some sort. In a
completely randomized design and a basic factorial experiment, the treatments are randomly
assigned to the experimental units. In a randomized block design, the blocks are randomly
selected and the treatments within each block are assigned in random order. Why randomize?
The answer is related to the assumptions we make about the random error ε in the linear model.
Experimenters rarely know all of the important variables in a process, nor do they know the true
functional form of the model. Hence, the functional form chosen to fit the true relation is only an
approximation, and the variables included in the experiment form only a subset of the total. The
random error ε is thus a composite error caused by the failure to include all of the important
factors as well as the error in approximating the function.
Although many unmeasured and important independent variables affecting the response y do
not vary in a completely random manner during the conduct of a designed experiment, we hope
their behavior is such that their cumulative effect varies in a random manner and satisfies the
assumptions upon which our inferential procedures are based. The randomization in a
designed experiment has the effect of randomly assigning these error effects to the treatments
and assists in satisfying the assumptions on ε.