Hertog, Introduction to the experimental method

Experiments
One of the most basic yet most powerful research methods is the experiment. Most researchers
would agree that an experiment is the strongest test of a causal relationship. A number of
physical sciences (for example, physics and chemistry) are heavily dependent upon the
experimental method to develop and test theory. Use of experiments in social science is far more
controversial, for reasons discussed below.
Crucial features of the experiment are the manipulation of an independent variable by the
researcher, the subsequent careful measurement of a dependent variable, and control over the
conditions under which the manipulation and measurement occur.
In essence, the experiment is a sophisticated form of "trying things out." That is, the
experimenter is saying, essentially, "Let's try this and see what happens." The goal is to
determine whether manipulating one thing (the 'independent variable') affects something else
(the 'dependent variable').
Here's an example. Let's say I took two young boys and had one play "Where in the World is
Carmen Sandiego" every day for two hours for a year. The other had (nearly) identical school
experiences as the first except that he played "Vice City." After a year, I could have someone
observe them on the school playground over a two-week period and mark down every act of
aggression they engaged in. If, as predicted, our "Vice City" subject displayed more aggression,
we might conclude that exposure to that video game leads to aggression among kids.
In the example above, I manipulated exposure to violent video games (two hours per day of
Carmen Sandiego v. Vice City), measured 'aggression' (observed and scored aggressive
playground behavior) and controlled experience in school (though I could not control what
happened at home). As you will see below, this example represents a 'field experiment' where the
level of control is relatively low but the environment for the study is less artificial than that found
in laboratory experiments.
Now, before I hear a chorus of "wait a minute, there's a lot of other things that might have
caused the difference," I will acknowledge that the experiment I just outlined has a ton of holes
in it. That is, this particular design has a number of weaknesses.
Clearly, a sample of two kids is inadequate. Even if I increased the sample size greatly,
however, there would still be a chance that the difference in aggression was due to something
other than the game-playing. Much of the art of constructing a good experiment is aimed at
eliminating "third variable" explanations for the differences between subjects that get the
different "treatments."
A common explanation for finding differences among subjects after the experimental treatment
is that they were different before the treatment. In our example above, that would mean that
perhaps the kid that played "Vice City" was naturally more aggressive and exposure to the
game had nothing to do with his behavior in the schoolyard. Frankly, if you are only going to
use two kids in your experiment, you can't really eliminate that explanation. To reduce the
likelihood that differences between groups on the dependent variable were the result of
differences that existed before the manipulation, researchers increase the number of subjects. If
a researcher wanted to carry out my experiment, she would randomly assign several kids to
play each of the games and then compare the aggression exhibited by those who played one
game and those who played the other. In most cases, that would mean comparing the average
aggression score for the Carmen Sandiego players with the average score for the Vice City
players. Although this doesn't eliminate the possibility that the Vice City kids were naturally
more aggressive, it makes it far less likely. We can use statistics to determine how likely it is
that differences among the subjects prior to the experimental treatment can explain the
differences found after the treatment.
Usually we say that we are determining how likely it is that the results occurred by chance.
Common thresholds accepted in the field of communication studies are 1 in 20 (the .05 level,
corresponding to 95% confidence) and 1 in 100 (the .01 level, or 99% confidence). What that
means is that if we found a difference between the two groups that would occur by chance only
one time in 100, for example, we would conclude that the difference we found was caused by
the treatment and not by differences existing prior to the experiment.
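This "how likely by chance" question can be sketched with a simple permutation test: pool everyone's scores, re-shuffle them into two groups many times, and count how often a difference as large as the observed one turns up. The aggression scores below are invented for illustration, not real data.

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate the probability that a mean difference at least as large as
    the observed one would arise if group labels were assigned by chance."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # re-deal subjects into two groups at random
        a, b = pooled[:n_a], pooled[n_a:]
        if abs(sum(a) / len(a) - sum(b) / len(b)) >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_permutations

# Hypothetical playground-aggression counts for the two game conditions.
carmen = [2, 3, 1, 2, 2, 3, 1, 2]
vice = [5, 6, 4, 7, 5, 6, 5, 4]
p = permutation_p_value(carmen, vice)
```

If p comes out below the chosen threshold (say, .05, the "1 in 20" level), we conclude that the observed difference is unlikely to reflect pre-existing differences alone.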
To avoid having something other than chance decide which kinds of people end up in each of
your groups, we 'randomly assign subjects to treatments.' That means we use some sort of
random process to determine which group each subject is assigned to. A common method is
going down the list of names and flipping a coin for each one, with heads putting the subject in
the experimental group and tails putting him in the control group. If you let subjects choose for
themselves, friends might decide to be in the same group, etc. When subjects are 'randomly
assigned to groups' there shouldn't be any tendency for people who are alike in some way to
receive a particular treatment. In our example, randomly choosing which video game each kid
plays should eliminate any bias in assigning aggressive kids to one or the other of the games.
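The coin-flip procedure above can be written out as a short script; the subject names here are made up for illustration.

```python
import random

def randomly_assign(subjects, seed=None):
    """Assign each subject to a group by a simulated coin flip:
    heads -> experimental group, tails -> control group."""
    rng = random.Random(seed)
    groups = {"experimental": [], "control": []}
    for subject in subjects:
        heads = rng.random() < 0.5  # fair coin
        groups["experimental" if heads else "control"].append(subject)
    return groups

# Hypothetical subject list.
kids = ["Ana", "Ben", "Cleo", "Dev", "Eli", "Fay"]
groups = randomly_assign(kids, seed=42)
```

Because the flip alone decides membership, there is no systematic tendency for similar subjects to cluster in one group, although the two groups may end up unequal in size.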
Third variables (Confounds)
Another explanation for differences in results among groups would be if something other than
the treatment varied among the groups. If the parents of the kids who played Vice City kept
coming into the room and shaking their heads, muttering to themselves, and drumming their
fingers on the dresser and the Carmen Sandiego players were left alone, then the different
results for the groups might be due to parental behavior rather than to gameplay.
The number of different experiences that groups can have during an experiment is practically
limitless. Trying to eliminate these "third variables" is a major part of classic 'laboratory'
experiments. The (unobtainable) goal is to see that the only thing that is different for the groups
is the differing levels of the independent variable they are exposed to. In our example, we
would want everything the kids experienced to be exactly the same except for the game they
played. This is simply impossible. However, we can try to reduce as much as practical the
amount of variance among groups that is not due to the manipulation. If we were running a
short-duration experiment with a single manipulation, we could come a lot closer to the
ideal. For example, if we were interested in the impact of background color on the emotional
impact of dramas, we could show one group of subjects a TV episode with a green background
and another group could watch the same episode with a red background. If the groups were
exposed in the same room, at similar times of day, with the same experimenter giving
instructions, the same outcome measures, etc. then there is little reason to expect that anything
other than the background color would explain differences in emotionality exhibited on the
outcome measure. Even in this scenario, however, it is not possible to eliminate all the subtle
differences that the subjects experience--and even minor differences can affect subjects in
important ways.
In order to limit the impact of third variables the experimenter exerts a number of forms of
control--means by which she reduces or eliminates variation in subjects' experience stemming
from sources other than the experimental manipulation. These actions are taken to isolate the
impact of the manipulation on the dependent variable. More specifically, the goal is to answer
the question, “How much change in the independent variable leads to how much (if any)
change in the dependent variable?”
Vocabulary:
Independent variable (IV): this is what the experimenter manipulates. Because it takes
different values, it is by definition a variable. The IV is usually considered the 'cause' in a
relationship. IV's of interest to telecommunications researchers can be a lot of things--the level
of violence in a video game, the features on an iPod, the amount of time necessary to load a
program, the physical attractiveness of a lead character, and on and on.
Factor: the operation representing one independent variable. It consists of all treatments
representing a single independent variable.
Treatment: a level of the factor a group in an experiment is exposed to. ‘Treatment groups’
are exposed to some level of the independent variable that is greater than zero. A control group
could be said to have either a) received no treatment or else b) to have received a zero-level
treatment—that is, the independent variable of interest is not present in the treatment a control
group receives.
Manipulation: the actions taken by the experimenter to present different treatments to the
subject groups. For example, the experimenter could expose two groups of subjects to different
episodes of a television show, could vary the volume of music different groups are exposed to,
could tell one group of subjects that the main character was very religious and tell another
group he was an atheist before exposing each to an action film, or any number of such
manipulations.
Dependent variable (DV): this is what the experimenter checks for change. It is the "effect"
variable. It is called the dependent variable because its level is 'dependent' upon the level of the
IV the subject is exposed to. Typical DVs studied in telecommunications research include level
of aggression, sales, satisfaction with a telecommunications service, time required to master a
new piece of technology, audience size, and so on.
Subjects: These are the people who are involved in the experiment and measured on the
dependent variable. Some are exposed to a non-zero level of the treatment ('experimental
groups'). Others may only be measured but not exposed, or may be exposed to a zero-level
treatment, serving as a 'control group.'
Control group: This group is measured on the DV, but is not exposed to a non-zero level of
the independent variable. This may be accomplished in two ways: the subjects in this group
may 1) be measured on the DV only, without exposure to the conditions in the lab, etc., or 2)
be exposed to the conditions experienced by the treatment groups but with a zero-level treatment.
For example, to determine the impact of exposure to nudity in television programming a
treatment group may be shown a television program that included a significant amount of
nudity while a control group watched a television show of the same length but lacking nudity.
People often react to being involved in an experiment even if they are not exposed to any
experimental manipulation. For example, people who are brought into a room and talked to by
a woman in a white lab coat may answer questions about their political choices differently than
they would have had they never had that experience. If you want to determine the impact of a
political campaign commercial, you would want to remove the contamination of your estimate
that came from the effect of being brought into a room and directed to watch a commercial,
being asked questions by a woman in a white coat, etc. That is, you want to isolate the effect of
exposure to the commercial from the effect of everything else in the experiment. To do that,
you take one group through the experimental procedures but don't show them the ad ('control
group'), and another group goes through the experimental procedures and sees the ad
('treatment group' or 'experimental group'). The difference between these two groups on the DV
should reflect only the impact of the political commercial.
Experimental/treatment group: This is the group of subjects that is exposed to a non-zero
level of the independent variable. If there are multiple manipulations or multiple levels of
treatment, then there will be multiple treatment groups.
Random assignment: Often called 'randomization,' this is the process whereby subjects are
assigned to groups entirely by chance. One way to randomly assign your subjects would be to
flip a coin for each subject and assign all those who get heads to the experimental group while
all those who get tails are assigned to the control group. There are other ways to randomly
assign subjects to conditions, but the result should be that all subjects have an even chance of
ending up in any of the experimental or control groups you have.
NOTE VERY CAREFULLY: Random assignment is NOT the same thing as random sampling.
Most experiment subject pools are not at all a random sampling of the population you are trying
to study. Random assignment simply means that among the subjects you were able to recruit to
your experiment (your 'subject pool'), the group each subject is assigned to is a matter of
chance.
Control: all the methods used by the researcher to limit or eliminate the impact on the DV of
anything other than the IV. A variety of procedures are employed for this purpose. Laboratory
experiments control the environment under which the subjects are exposed to the manipulation
by blocking out external sounds, maintaining constant lighting, using the same people to
explain the procedures to different groups of subjects, using scripted greetings and instructions,
and on and on. Using a control group is a form of control. If subjects are measured on
characteristics thought to affect the DV (say, for example, on gender or on intensity of religious
belief) and then their score on the DV is adjusted using statistical methods, we call this
statistical control.
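As a rough sketch of statistical control, the snippet below adjusts DV scores for one measured covariate by removing its least-squares fit; the function name and data are hypothetical, and real studies would typically use an ANCOVA or regression package rather than hand-rolled arithmetic.

```python
def adjust_for_covariate(dv_scores, covariate):
    """Remove the linear (least-squares) effect of a covariate from DV scores,
    so remaining differences are not attributable to that covariate."""
    n = len(dv_scores)
    mean_x = sum(covariate) / n
    mean_y = sum(dv_scores) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, dv_scores))
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    slope = sxy / sxx  # least-squares slope of DV on covariate
    return [y - slope * (x - mean_x) for x, y in zip(covariate, dv_scores)]

# Hypothetical data: aggression scores that rise with subjects' ages.
ages = [8, 9, 10, 11]
aggression = [2, 4, 6, 8]
adjusted = adjust_for_covariate(aggression, ages)
# After adjustment, the age-driven spread in the scores is removed.
```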
Experimental design
Although the discussion up to this point has been based upon the simplest forms of
experiments, there are a wide range of 'experimental designs' that may include multiple groups,
multiple treatment levels and/or controls, multiple waves of treatment, and multiple measures.
The various features allow the researcher to apply statistical controls, increase power and study
additional variables.
To represent various types of experimental designs, we use X's, O's and R's. The X represents a
treatment, the O represents an observation (measurement) and the R represents
randomization/random assignment.
Randomized after-only design:
R    X    O
R         O
The after-only design has several advantages. First, it is simple to carry out. That is, you only
need two groups and only need expose one of the two to your treatment. Because of the
random assignment of subjects to groups you can assume relatively equal groups at the outset
so that differences between the groups on the observation should be due to the treatment. On
the down side, you have no direct evidence that the groups were equivalent at the outset. If you
were unlucky, pure chance could have concentrated subjects who were alike in some way that
affects the observation score into one of the two groups. To strengthen the design, you could include
multiple treatment levels (X1, X2 and X3), as below:
R    X1    O
R    X2    O
R    X3    O
R          O
Randomized before-after design:
R    O1    X    O2
R    O1         O2
In the before-after design subjects are measured prior to (O1) and after (O2) exposure to the
treatment. Subjects in the control group are measured twice as well. This allows for an
estimate of how much change in the treatment group occurred, which is a big advantage of the
method, but it also introduces the possibility that exposure to the first observation will change
the effect of the treatment. For example, if we wanted to determine the effects of exposure to
beauty ads on self-image we could randomly assign subjects to groups, administer a
questionnaire about self-image to all the subjects, expose the treatment group to the ads and
then re-measure both groups. The problem would be that respondents who answered questions
about self-image prior to exposure might be more sensitive to the beauty ads than they would
have been otherwise. We would then have generated a different outcome than if we had not
sensitized our treatment group. The control group, likewise, might have been sensitized to the
topic and thought about it so that subsequent measures were affected.
A second advantage of having a before measure is that it can be used to identify variables that
might affect subject performance, such as gender, age, or psychological characteristics. Usually
this would be accomplished by using a different set of measures prior to the treatment than those
used afterward. If the measures are different, there is less concern about priming the subjects to
certain topics or parts of the treatment, but the advantage of comparing before and after
measures for change scores is lost.
Blocking
This is where you separate subjects into groups of similar people (say, blocking into male and
female groups) and then draw subjects for experimental and control groups separately from the
blocks. For example, you might split your sample by gender when studying the effects of video
game violence. Once you have one group of women and one of men, you could randomly
choose an equal number of women to join the experimental and the control groups. You would
repeat the process with the men so that in the end you would have the same male-female split in
the experimental group as in the control group.
The advantage is that by blocking you reduce the variance within the experimental and control
groups and thereby improve your statistical efficiency in testing for the impact of the
independent variable you are interested in.
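A blocked assignment can be sketched as follows; the block labels and subject IDs are hypothetical, and this sketch assumes each block has an even number of members so it splits cleanly between the two groups.

```python
import random

def blocked_assignment(subjects_by_block, seed=None):
    """Randomize to experimental/control separately within each block so the
    two groups end up with the same block composition (e.g., gender split)."""
    rng = random.Random(seed)
    groups = {"experimental": [], "control": []}
    for members in subjects_by_block.values():
        shuffled = list(members)
        rng.shuffle(shuffled)  # random order within the block
        half = len(shuffled) // 2
        groups["experimental"].extend(shuffled[:half])
        groups["control"].extend(shuffled[half:])
    return groups

# Hypothetical pool blocked by gender: each group gets two women and two men.
pool = {"women": ["W1", "W2", "W3", "W4"], "men": ["M1", "M2", "M3", "M4"]}
groups = blocked_assignment(pool, seed=7)
```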
Factorial designs
Factorial designs refer to those experiments in which more than one independent variable is
included in the design. For example, the experimenter may manipulate both the sound volume
and the color brightness of a film clip to determine their impact on memorability of the content.
The design not only allows the researcher to test for the effect of sound volume alone and color
brightness alone, but to see if the effect of sound volume is different at different levels of color
brightness (this is known as an interaction). If there are two levels of sound volume and two
levels of brightness tested, then this would be considered a 2X2 factorial design. If there were
three levels of volume and four levels of brightness it would be a 3X4 factorial design. Note
that the number of actual treatments grows quickly--there would need to be 4 groups in the first
design and 12 separate groups in the second design (lowest level of volume paired with lowest
level of brightness, lowest level of volume paired with next lowest level of brightness, and so
on till each pairing has been completed). It is generally thought that a cell size (each pairing of
factor levels generates one cell in a table) should not fall below 5 subjects. Otherwise, the
reliability of results for the cell is too low.
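The cell count for a factorial design is just the product of the numbers of factor levels, which can be sketched with the standard library; the factor names below are hypothetical.

```python
from itertools import product

def factorial_cells(*factors):
    """Return one cell per combination of factor levels (one treatment group each)."""
    return list(product(*factors))

# A 2x2 design: two volume levels crossed with two brightness levels.
cells_2x2 = factorial_cells(["low volume", "high volume"], ["dim", "bright"])
print(len(cells_2x2))  # 4 groups

# A 3x4 design needs 12 separate groups.
cells_3x4 = factorial_cells(["v1", "v2", "v3"], ["b1", "b2", "b3", "b4"])
print(len(cells_3x4))  # 12 groups
```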
Repeated measures
This is where you measure the same subjects multiple times. Often, the subject is exposed to
very different treatment levels and measured to see the effect of each one. You can treat
individual subjects as their own controls, which can be very efficient, allowing for the use of
fewer subjects. The main concern with this method is that the effect of an earlier treatment may
carry over and influence subjects' reactions to later treatments, leading to invalid conclusions.
Methodological critique of experiments
One of the main criticisms of the experimental method is its artificiality. To avoid
contamination of the results by third variables, experimenters create highly controlled
environments in which to carry out their studies. These studies are usually called 'laboratory
experiments' even if the highly controlled environments are not traditional 'laboratories.' These
controlled environments, however, are quite different from the ones we live and work in. For
example, when a subject is exposed to a TV show in the lab, the screen is of a certain size, the
seating arrangement is of a certain type, there are no kids running in and out of the room
screaming, etc. as one would likely find in many people's homes. The concern is that this very
artificiality leads to results on the DV that would not be found if the research were conducted in
the 'real world.' That is, while exposure to a Clint Eastwood flick in a laboratory setting could
encourage guys to confront an experimenter portrayed as cheating them out of compensation,
when they are outside the laboratory the movie might not have the same effect. Because we
don't spend very much time in laboratories, this represents a serious problem. One of the main
arguments against much of the research on exposure to violent television and aggressive
behavior is this very 'artificiality' of the experimental situation.
A second shortcoming of experiments, especially laboratory experiments, is that it is difficult
to study slow, gradual change in the dependent variable through this method. If change in the
dependent variable is inherently slow, it is not possible under most circumstances to keep
subjects in the study long enough to identify the ultimate effect of the independent variable.
When researchers' concern over generalizability of results is great enough to sacrifice the
advantages of the control available in laboratory experiments, they may manipulate an
independent variable in more natural environments and monitor the results on some criterion
measure. Experiments undertaken in more natural environments are often called naturalistic or
field experiments. When a marketer wants to test a new ad campaign or a cable company
wants to test the popularity of a new technology they will often 'test market' the new
campaign/service in a small portion of their geographic market in order to determine whether it
is worth investing large sums of money in the enterprise. Many such tests are carried out as
field experiments. While the company may lose a great deal of control by taking the research to
the field it gains greater generalizability of the results.
One example of field experiments is when anti-drug advertisements are aired in selected
markets (often called test communities or treatment communities) and surveys are carried out at
selected intervals to see whether changes in attitudes toward drug-taking have occurred. The
same surveys of drug-taking attitudes are carried out in communities that do not receive the
advertising, providing a 'control group' in such field studies. These are called "control
communities." Field experiments normally take place over longer time periods than do
laboratory experiments, and they are usually quite expensive.
Besides the expense, the loss of control that is a characteristic of field experiments is the main
downside of the method. Marketers may find that their competition learns of the experiment and
changes its pricing or promotional methods, screwing up the results of the test market. During
the testing of an anti-drug campaign a political leader in a control community may mount an
aggressive campaign to teach kids to "Say No to Drugs." After all the time and expense it may
be difficult to determine whether any positive results are due to the experimental manipulation
or to something else that happened concurrently with the campaign. However, when the stakes
are high and the concern is whether something is likely to work in the natural environment,
field experiments may be well worth the expense.
So how do you construct and carry out an experiment?
Based on your knowledge of the topic, usually supported by a literature review, you determine
the concepts you feel are the most appropriate to your study, then carefully explicate them. You
must determine how you will manipulate your independent variable. The manipulation must be
valid and should also be efficient and effective (the actual manipulation should be strong
enough to produce an effect but within the realm of possible real-world conditions). You also
need to come up with at least one dependent measure, and often more than one. Your decisions
on the manipulation of the independent variable and the measurement of the dependent variable
should reflect your understanding of likely third variables as well as careful analysis of the
efficiency, reliability and validity of the measures and the manipulations.
As an example, Bandura's classic experiment on exposure to mediated violence and aggression
among children included a manipulation where some kids were exposed to videos of research
assistants kicking Bobo dolls, throwing them in the air, hitting them with play hammers, yelling
at them, etc. Others did not see these videos. The dependent variable was careful observation of
their behavior when left alone in a room with a number of toys, one of which was a Bobo doll.
The amount of time spent kicking, hitting, or otherwise carrying out 'aggression' toward the
doll was one of the measures. Bandura found a strong relationship between children's
'aggressive' behavior and exposure to the 'aggressive' video.
Critique of Bandura's experiment centered on both the artificiality of the situation (noted
earlier) and the validity of the DV. A number of those who questioned Bandura's research
argued that kicking or hitting a Bobo doll is not a valid measure of aggression. Bobo dolls are
meant to be kicked and hit, and they are not people. They are inanimate toys. Questions relating
to the manipulation (saw the video v. didn't see the video) also have been voiced.
More problems with experiments stem from poorly designed manipulations or questionable
dependent measures than from anything else. These are the crucial choices you make in
designing experiments. Should you fail to anticipate problems with either the manipulation or
the dependent measures, your experiment will likely be worthless or even worse--misleading.
Once you have an effective manipulation and a strong measure of the dependent variable, you
need subjects. Usually, subjects are recruited from available populations--like this class. That
is, much research is carried out on students in lower-level undergraduate social science courses
(like psychology, communications, political science, research methods, etc.). Sometimes
studies are carried out on other available populations--visitors to Hollywood studios, for
example. Rarely, researchers attempt to draw samples that more closely reflect general
populations. That usually requires a money payment or some other relatively expensive
incentive but it may well generate more externally valid results.
Once you have a subject pool, you must carry out the experiment. You schedule times and
places when the treatments and controls are run and assign subjects to these occasions. If
possible you should randomly assign the subjects to treatments, but if that is not possible you
should do your best to limit the impact of bias in assignment.
As always, you should strive to see that the actual conduct of the experiment follows as closely
as possible the proposed experiment. Any deviations from plan should be noted and may end
up being a potential explanation for obtained results.
Once you have collected your data (DV measures) you analyze them using the appropriate
statistics. If the results support a hypothesis you outlined earlier, you conclude that empirical
support for your theory exists. If you did not state hypotheses prior to collecting data, you
interpret the results according to their statistical implications.