Observational Studies and Experimentation*

* From An Introduction to Statistical Analysis for Business and Industry by Michael Stuart,
Section 11.4, pp. 356-359. Copyright © 2003.

The vast majority of sample surveys implemented nowadays may be classified under the broad
heading of observational studies. Measurements are made on a range of variables across a range
of individuals (the sample). Given the values of the variables that happen to have been
observed, relationships between variables will frequently emerge. In fact, in a sensibly planned
survey, such relationships will be expected and the purpose of the survey is to quantify them. For
example, in researching attitudes to and interest in using a range of new mobile services,
attention may focus on whether individuals are likely to purchase one or other of the proposed
new mobile services and, if so, how much they would be prepared to pay for such services.
Answers to such questions may influence decision making about introducing new products.
Given a positive decision, when it comes to deciding on marketing strategies for such a new
product, the relationships of such variables with other variables that might be thought to
influence them, for example income, age and leisure activities, will become important. However,
there is an assumption being made here that the so-called response and explanatory variables are
in a cause-effect relationship. This means that changes in the values of the explanatory variables,
income, age, leisure activities etc., actually cause changes in the response variables, rather than
being merely statistically or numerically related to the observed changes in the responses.
Such assumptions, while essential if progress is to be made in business decision-making, are
fallible in the absence of properly designed and controlled experiments. Recall the discussion of
"Simpson's paradox" in section 7.4, pages 20-21, where a simplistic analysis suggested that there
was a relationship between loan size and default rate but where the relationship disappeared
when the influence of a third variable, loan type, was taken into account. While the paradox was
resolved in that situation, because the so-called "lurking variable" had in fact been observed,
there is no guarantee in an observational study that a "lurking variable" has not been overlooked
and, therefore, no guarantee that an observed statistical relationship corresponds to cause and
effect. It cannot be stressed enough that statistical relationships, of themselves, do not imply
cause and effect. It must also be said that the temptation to infer cause and effect relationships
using business judgement is fraught with danger. As one commentator put it:
"The justification sometimes advanced that a multiple regression analysis on
observational data can be relied upon if there is an adequate theoretical
background is utterly specious and disregards the unlimited capability of the
human intellect for producing plausible explanations by the carload lot."
(K.A. Brownlee, Statistical Theory and Methodology in Science and Engineering,
Wiley, 1965, page 454).
To ensure as far as possible that an observed relationship does correspond to cause and effect
requires a properly designed experiment in a properly controlled environment. The simplest
form of experiment involves studying the effect of a single change. In the experiment
described in Section 1.9, the change was from one version of a process to another. A classic
experiment in retail marketing compares sales of goods in large retail stores using either
mid-aisle or end-aisle displays. Drug manufacturers conduct extensive trials to evaluate new drug
treatments. The simplest of these involves comparison of a new treatment with a "placebo", that
is, a version of the treatment that omits the active drug ingredient.
In all these examples, two key ingredients are the factor (or design variable), changes in which
may bring about a desired effect, and the experimental unit to which the factor is applied. In the
examples described, the factors (and their levels) were
process (old or new),
display location (mid-aisle or end-aisle),
drug (present or absent),
respectively.
The experimental unit in the case of the process change experiment described in Section 1.9 was
a working day, chosen primarily for convenience. It could be argued that shorter time periods
could have been chosen as experimental units, with a view to minimising variation within units,
thus allowing possible variation between units associated with the process change to be more
evident. However, economic and logistical considerations conspire against this.
In the retail marketing example, the experimental unit could be a time period such as a week,
with the factor levels, mid-aisle and end-aisle display, alternating in successive pairs of weeks in
a design similar to that used in the process change experiment. Ideally, the alternating pattern is
chosen at random, with a view to reducing the effect of any other systematic pattern of variation
that might be present without our knowledge. To achieve maximum homogeneity of
experimental units, the experiment should be run entirely within a single store, necessarily over
several weeks.
Alternatively, in a chain of retail stores, a store could constitute an experimental unit with the
entire experiment being run in one week using several stores. However, there are likely to be
considerable differences between stores, other than a possible factor effect difference, thus
making it difficult to detect a factor effect if present. It may be possible to reduce such
differences by pairing stores with known similarities, then randomly allocating factor levels to
stores within pairs to minimise the possibility of unknown systematic differences between stores
affecting the result of the experiment¹.
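As an illustration only (this is not taken from the book), the within-pair randomisation might be
carried out with a short script along the following lines; the store names and pairings are
invented for the sketch.

import random

# Illustrative sketch: randomly allocate display location within matched
# pairs of stores. Store names and pairings are hypothetical.
paired_stores = [("Store A1", "Store A2"),
                 ("Store B1", "Store B2"),
                 ("Store C1", "Store C2")]
levels = ("mid-aisle", "end-aisle")

random.seed(2003)  # fixed seed only so the illustration is reproducible

allocation = {}
for store_x, store_y in paired_stores:
    # A fair coin toss within each pair: one store of the pair gets the
    # mid-aisle display, the other the end-aisle display, so systematic
    # differences between pairs cannot favour either level.
    first, second = random.sample(levels, 2)
    allocation[store_x] = first
    allocation[store_y] = second

for store, level in sorted(allocation.items()):
    print(f"{store}: {level} display")

Because each pair contributes one store at each level, sales comparisons can be made within
pairs, in the same spirit as the randomised block designs discussed below.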
In the drug testing example, the obvious experimental unit is the individual patient. The reason
for administering an inactive form of the drug to those not receiving the active form is to allow
for the well-established placebo effect whereby, in many cases, patients show some improvement
when they think they have received a treatment, even when the treatment is inactive. Thus the
placebo is administered so that all patients think that they are receiving the treatment.
However, there is another actor involved, the doctor who prescribes the treatment for the patient,
and there is a major ethical problem in that the doctor is required to prescribe a placebo for some
patients who might well benefit from the real treatment. Another problem is that, in evaluating
the result, the doctor making the diagnosis may be influenced by knowledge of which patient got
which treatment. To overcome these difficulties, treatments are allocated randomly to patients
and identified with the patients' names in such a way that the doctor involved does not know
which patient gets which treatment. Such experiments, in which neither patient nor doctor
knows who got which treatment, are referred to as double blind experiments. They have the
effect of sharing the ethical problem with the whole research team and avoiding possible doctor
bias, either in prescription or subsequent evaluation.
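To illustrate the mechanics of blinding (a sketch, not the book's procedure), a coded allocation
list might be generated along these lines; the patient labels and group sizes are hypothetical.

import random

# Illustrative sketch of a blinded allocation list. Only the coordinator
# retains the key linking each patient to active drug or placebo; doctor
# and patient see only identical-looking, patient-labelled packs.
patients = [f"patient_{i:02d}" for i in range(1, 9)]
treatments = ["active"] * 4 + ["placebo"] * 4   # equal group sizes assumed

random.seed(11)
random.shuffle(treatments)                      # random allocation to patients

allocation_key = dict(zip(patients, treatments))  # held only by the coordinator

for patient in patients:
    print(f"{patient}: pack prepared (contents withheld from doctor and patient)")

In practice such a key is typically held by the trial coordinators and consulted only at the
analysis stage.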
¹ There is a third possibility in which the effects of differences over time and differences
between stores can both be minimised. This involves the use of what is called a Latin square
design. This is not pursued here. Details may be found in the Supplements and Extensions page of
the book's website.
Randomised blocks
These three apparently simple examples illustrate the care that is needed in designing satisfactory
experiments so as to achieve the level of control necessary to be able to infer a cause and effect
relationship; there are many more complex issues that arise in different circumstances. Two
basic principles that emerge are those of blocking and randomisation. When there are known
differences between experimental units, it makes sense to group the units into blocks² that are as
similar to each other (homogeneous) as possible. In the examples discussed above, blocks
consist of pairs. If, in the process change experiment, there were six versions of the process to be
compared, it would be sensible to form blocks of the six days in a week and apply each version
of the process on one of the days. In order to minimise the effect of any other systematic pattern
of variation that might exist unknown to the experimenter, versions of the process are allocated
to days of the week at random.
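A sketch of how such a layout might be generated (assuming, purely for illustration, a six-day
working week and versions labelled V1 to V6):

import random

# Illustrative sketch of a randomised block layout: each week is a block and,
# within each week, the six process versions are run in a freshly randomised
# order across the six working days.
versions = ["V1", "V2", "V3", "V4", "V5", "V6"]
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
n_weeks = 4   # number of replicate weeks; in practice set by resources

random.seed(1)
for week in range(1, n_weeks + 1):
    order = random.sample(versions, len(versions))   # random order within the block
    schedule = ", ".join(f"{day}: {version}" for day, version in zip(days, order))
    print(f"Week {week} -> {schedule}")

The loop over weeks anticipates the replication discussed next.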
Replication
This whole scheme may then be replicated, that is, repeated over a number of weeks. In theory,
the number of replications is chosen so as to give desirable power, in the sense discussed in
Section 5.3; the more replications, the better the chances of detecting real effects of the
experimental changes. Replication also makes it more likely that random allocation of factor
levels to experimental units will deliver the protection it promises. In practice, more often than
not, the number of replications is determined by available resources.
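The effect of replication on power can be seen in a rough simulation. This is only a sketch, not
the calculation of Section 5.3; the effect size, standard deviation, significance criterion and
block numbers are assumed purely for illustration.

import random
import statistics

# Illustrative simulation: estimate the chance of detecting a true factor
# effect from the within-block differences, for various numbers of blocks.
# Assumed values: true effect 0.5, standard deviation of the within-block
# differences 1.0, and a crude two-sided criterion |t| > 2.
def estimated_power(n_blocks, effect=0.5, sigma=1.0, n_sims=2000):
    detections = 0
    for _ in range(n_sims):
        diffs = [effect + random.gauss(0.0, sigma) for _ in range(n_blocks)]
        mean = statistics.mean(diffs)
        se = statistics.stdev(diffs) / n_blocks ** 0.5
        if abs(mean / se) > 2.0:
            detections += 1
    return detections / n_sims

random.seed(7)
for n in (4, 8, 16, 32):
    print(f"{n:2d} blocks: estimated power approximately {estimated_power(n):.2f}")

As expected, the estimated power increases with the number of blocks, illustrating the point that
more replications improve the chances of detecting real effects.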
The randomised block experimental design is a basic form of design that may be elaborated on in
many ways. The design allows valid comparisons between levels of the experimental factor
within each block, which may be combined across blocks. In this way, systematic differences
between blocks do not interfere in the assessment of factor effects, while randomisation gives
some protection against unknown sources of systematic variation.
² The term originated in agricultural experimentation, where neighbouring experimental plots of
land were grouped into relatively homogeneous blocks.