Experiments

One of the most basic yet most powerful research methods is the experiment. Most researchers would agree that an experiment is the strongest test of a causal relationship. A number of physical sciences (for example, physics and chemistry) are heavily dependent upon the experimental method to develop and test theory. Use of experiments in social science is far more controversial, for reasons discussed below. Crucial features of the experiment are the manipulation of an independent variable by the researcher, the subsequent careful measurement of a dependent variable, and control over the conditions under which the manipulation and measurement occur. In essence, the experiment is a sophisticated form of "trying things out." That is, the experimenter is saying, essentially, "Let's try this and see what happens." The goal is to determine whether manipulating one thing (the 'independent variable') affects something else (the 'dependent variable').

Here's an example. Let's say I took two young boys and had one play "Where in the World is Carmen Sandiego" every day for two hours for a year. The other had (nearly) identical school experiences to those of the first except that he played "Vice City." After a year, I could have someone observe them on the school playground over a two-week period and mark down every act of aggression they engaged in. If, as predicted, our "Vice City" subject displayed more aggression, we might conclude that exposure to that video game leads to aggression among kids.

In the example above, I manipulated exposure to violent video games (two hours per day of Carmen Sandiego v. Vice City), measured 'aggression' (observed and scored aggressive playground behavior), and controlled experience in school (though I could not control what happened at home).
As you will see below, this example represents a 'field experiment' where the level of control is relatively low but the environment for the study is less artificial than that found in laboratory experiments. Now, before I hear a chorus of "wait a minute, there's a lot of other things that might have caused the difference" I will acknowledge that the experiment I just outlined has a ton of holes in it. That is, there are a number of weaknesses to the particular research method I just outlined. Clearly, a sample of two kids is inadequate. Even if I increased the sample size greatly, however, there would still be a chance that the difference in aggression was due to something other than the game-playing. Much of the art of constructing a good experiment is aimed at eliminating "third variable" explanations for the differences between subjects that get the different "treatments." A common explanation for finding differences among subjects after the experimental treatment is that they were different before the treatment. In our example above, that would mean that perhaps the kid that played "Vice City" was naturally more aggressive and exposure to the game had nothing to do with his behavior in the schoolyard. Frankly, if you are only going to use two kids in your experiment, you can't really eliminate that explanation. To reduce the likelihood that differences between groups on the dependent variable were the result of differences that existed before the manipulation, researchers increase the number of subjects. If a researcher wanted to carry out my experiment, she would randomly assign several kids to play each of the games and then compare the aggression exhibited by those who played one game and those who played the other. In most cases, that would mean comparing the average aggression score for the Carmen Sandiego players with the average score for the Vice City players. 
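The random assignment step just described can be sketched in a few lines of code. This is an illustrative sketch only: the kids' names are invented, and the procedures shown (a coin flip per subject, and a shuffle-and-split) are two common ways of letting chance decide group membership.

```python
import random

# Hypothetical pool of kids recruited for the study (names invented).
kids = ["Ava", "Ben", "Cory", "Dana", "Eli", "Fay", "Gus", "Hana"]

random.seed(7)  # fixed seed only so the sketch is reproducible

# One simple random procedure: flip a coin for each kid.
carmen_group, vice_city_group = [], []
for kid in kids:
    if random.random() < 0.5:  # "heads"
        carmen_group.append(kid)
    else:
        vice_city_group.append(kid)

# Coin flips can leave the groups unequal in size; shuffling the list and
# splitting it in half is another random procedure that guarantees equal n.
shuffled = kids[:]
random.shuffle(shuffled)
half = len(shuffled) // 2
equal_carmen, equal_vice = shuffled[:half], shuffled[half:]

print("coin-flip groups:", carmen_group, "vs", vice_city_group)
print("equal-n groups:", equal_carmen, "vs", equal_vice)
```

Either way, which game a given kid ends up playing is a matter of chance alone, which is the whole point of the procedure.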
Although this doesn't eliminate the possibility that the Vice City kids were naturally more aggressive, it makes it far less likely. We can use statistics to determine how likely it is that differences among the subjects prior to the experimental treatment can explain the differences found after the treatment. Usually we say that we are determining how likely it is that the results occurred by chance. Common thresholds accepted in the field of communication studies are 1 in 20 (the .05 level, corresponding to 95% confidence) or 1 in 100 (the .01 level, corresponding to 99% confidence). What that means is that if we found a difference between the two groups that would occur by chance only one time in 100 (for example), we would conclude that the difference we found was caused by the treatment and not due to differences existing prior to the experiment. To avoid having something other than chance decide which kinds of people end up in each of your groups, we 'randomly assign subjects to treatments.' That means we use some sort of random process to determine which group each subject is assigned to. A common method is going down the list of names and flipping a coin for each one, with heads putting the subject in the experimental group and tails putting him in the control group. If you let subjects choose for themselves, friends might decide to be in the same group, etc. When subjects are 'randomly assigned to groups' there shouldn't be any tendency for people who are alike in some way to receive a particular treatment. In our example, randomly choosing which video game each kid plays should eliminate any bias in assigning aggressive kids to one or the other of the games.

Third variables (Confounds)

Another explanation for differences in results among groups would be if something other than the treatment varied among the groups.
If the parents of the kids who watched Vice City kept coming into the room and shaking their heads, muttering to themselves, and drumming their fingers on the dresser while the Carmen Sandiego players were left alone, then the different results for the groups might be due to parental behavior rather than to gameplay. The number of different experiences that groups can have during an experiment is practically limitless. Trying to eliminate these "third variables" is a major part of classic 'laboratory' experiments. The (unobtainable) goal is to see that the only thing that is different for the groups is the differing levels of the independent variable they are exposed to. In our example, we would want everything the kids experienced to be exactly the same except for the game they played. This is simply impossible. However, we can try to reduce as much as practical the amount of variance among groups that is not due to the manipulation. If we were running a short-duration experiment with a single manipulation, we could come a lot closer to the ideal. For example, if we were interested in the impact of background color on the emotional impact of dramas, we could show one group of subjects a TV episode with a green background while another group watched the same episode with a red background. If the groups were exposed in the same room, at similar times of day, with the same experimenter giving instructions, the same outcome measures, etc., then there is little reason to expect that anything other than the background color would explain differences in emotionality exhibited on the outcome measure. Even in this scenario, however, it is not possible to eliminate all the subtle differences that the subjects experience--and even minor differences can affect subjects in important ways.
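To see why a third variable matters so much, here is a small simulation (all numbers invented) that contrasts two versions of the video game study: one where prior aggressiveness is confounded with which game a kid plays, and one where a shuffle decides. It assumes a built-in "true" game effect of 1.0 for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# 200 simulated kids, each with an invented baseline aggressiveness score.
baseline = [random.gauss(5, 1) for _ in range(200)]
TRUE_GAME_EFFECT = 1.0  # the effect we deliberately build into the simulation

def mean(xs):
    return sum(xs) / len(xs)

# Confounded assignment: the naturally aggressive half plays the violent game,
# so prior aggressiveness gets mixed into the treatment/control comparison.
ranked = sorted(baseline)
calm_half, aggressive_half = ranked[:100], ranked[100:]
confounded_estimate = (mean([k + TRUE_GAME_EFFECT for k in aggressive_half])
                       - mean(calm_half))

# Random assignment: a shuffle decides who plays which game, so any baseline
# difference between the groups is due to chance alone.
shuffled = baseline[:]
random.shuffle(shuffled)
random_estimate = (mean([k + TRUE_GAME_EFFECT for k in shuffled[:100]])
                   - mean(shuffled[100:]))

print(f"true effect: {TRUE_GAME_EFFECT}")
print(f"estimate under confounded assignment: {confounded_estimate:.2f}")
print(f"estimate under random assignment: {random_estimate:.2f}")
```

Under the confounded assignment the estimated effect is inflated well beyond 1.0, because the group difference mixes the game effect with the pre-existing difference; under random assignment the estimate lands near the true value.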
In order to limit the impact of third variables the experimenter exerts a number of forms of control--means by which she reduces or eliminates variation in subjects' experience stemming from sources other than the experimental manipulation. These actions are taken to isolate the impact of the manipulation on the dependent variable. More specifically, the goal is to answer the question, "How much change in the independent variable leads to how much (if any) change in the dependent variable?"

Vocabulary:

Independent variable (IV): this is what the experimenter manipulates. Because it takes different values, it is by definition a variable. The IV is usually considered the 'cause' in a relationship. IVs of interest to telecommunications researchers can be a lot of things--the level of violence in a video game, the features on an iPod, the amount of time necessary to load a program, the physical attractiveness of a lead character, and on and on.

Factor: the operation representing one independent variable. It consists of all treatments representing a single independent variable.

Treatment: a level of the factor a group in an experiment is exposed to. 'Treatment groups' are exposed to some level of the independent variable that is greater than zero. A control group could be said either a) to have received no treatment or b) to have received a zero-level treatment--that is, the independent variable of interest is not present in the treatment a control group receives.

Manipulation: the actions taken by the experimenter to present different treatments to the subject groups. For example, the experimenter could expose two groups of subjects to different episodes of a television show, could vary the volume of music different groups are exposed to, could tell one group of subjects that the main character was very religious and tell another group he was an atheist before exposing each to an action film, or any number of such manipulations.
Dependent variable (DV): this is what the experimenter checks for change. It is the "effect" variable. It is called the dependent variable because its level is 'dependent' upon the level of the IV the subject is exposed to. Typical DVs studied in telecommunications research include level of aggression, sales, satisfaction with a telecommunications service, time required to master a new piece of technology, audience size, and so on.

Subjects: these are the people who are involved in the experiment and measured on the dependent variable. Some are exposed to a non-zero level of the treatment ('experimental groups'). Others may only be measured but not exposed, or may be exposed to a zero-level treatment, serving as a 'control group.'

Control group: this group is measured on the DV, but is not exposed to a non-zero level of the independent variable. This may be accomplished in two ways--the subjects in this group may 1) be measured on the DV only, without exposure to the conditions in the lab, etc., or 2) be exposed to the conditions experienced by the treatment groups but with a zero-level treatment. For example, to determine the impact of exposure to nudity in television programming, a treatment group may be shown a television program that includes a significant amount of nudity while a control group watches a television show of the same length but lacking nudity. People often react to being involved in an experiment even if they are not exposed to any experimental manipulation. For example, people who are brought into a room and talked to by a woman in a white lab coat may answer questions about their political choices differently than they would have had they never had that experience. If you want to determine the impact of a political campaign commercial, you would want to remove the contamination of your estimate that came from the effect of being brought into a room and directed to watch a commercial, being asked questions by a woman in a white coat, etc.
That is, you want to isolate the effect of exposure to the commercial from the effect of everything else in the experiment. To do that, you take one group through the experimental procedures but don't show them the ad ('control group'), while another group goes through the experimental procedures and sees the ad ('treatment group' or 'experimental group'). The difference between these two groups on the DV should reflect only the impact of the political commercial.

Experimental/treatment group: this is the group of subjects that is exposed to a non-zero level of the independent variable. If there are multiple manipulations or multiple levels of treatment, then there will be multiple treatment groups.

Random assignment: often called 'randomization,' this is the process whereby subjects are assigned to groups entirely by chance. One way to randomly assign your subjects would be to flip a coin for each subject and assign all those who get heads to the experimental group while all those who get tails are assigned to the control group. There are other ways to randomly assign subjects to conditions, but the result should be that all subjects have an equal chance of ending up in any of the experimental or control groups you have. NOTE VERY CAREFULLY: random assignment is NOT the same thing as random sampling. Most experiment subject pools are not at all a random sample of the population you are trying to study. Random assignment simply means that among the subjects you were able to recruit to your experiment (your 'subject pool'), the group each subject is assigned to is a matter of chance.

Control: all the methods used by the researcher to limit or eliminate the impact on the DV of anything other than the IV. A variety of procedures are employed for this purpose.
Laboratory experiments control the environment under which the subjects are exposed to the manipulation by blocking out external sounds, maintaining constant lighting, using the same people to explain the procedures to different groups of subjects, using scripted greetings and instructions, and on and on. Using a control group is a form of control. If subjects are measured on characteristics thought to affect the DV (say, for example, gender or intensity of religious belief) and their score on the DV is then adjusted using statistical methods, we call this statistical control.

Experimental design

Although the discussion up to this point has been based upon the simplest forms of experiments, there is a wide range of 'experimental designs' that may include multiple groups, multiple treatment levels and/or controls, multiple waves of treatment, and multiple measures. These various features allow the researcher to apply statistical controls, increase power, and study additional variables. To represent various types of experimental designs, we use X's, O's and R's. The X represents a treatment, the O represents an observation (measurement) and the R represents randomization/random assignment.

Randomized after-only design:

R X O
R   O

The after-only design has several advantages. First, it is simple to carry out. That is, you only need two groups and need expose only one of the two to your treatment. Because of the random assignment of subjects to groups you can assume relatively equal groups at the outset, so that differences between the groups on the observation should be due to the treatment. On the down side, you have no assurance that the groups were equivalent at the outset. If you were unlucky, pure chance could have landed subjects that were alike in ways that would affect their score on the observation in the two groups.
To strengthen the design, you could include multiple treatment levels (X1, X2 and X3), as below:

R X1 O
R X2 O
R X3 O
R    O

Randomized before-after design:

R O1 X O2
R O1   O2

In the before-after design subjects are measured prior to (O1) and after (O2) exposure to the treatment. Subjects in the control group are measured twice as well. This allows for an estimate of how much change occurred in the treatment group, which is a big advantage of the method, but it also introduces the possibility that exposure to the first observation will change the effect of the treatment. For example, if we wanted to determine the effects of exposure to beauty ads on self-image, we could randomly assign subjects to groups, administer a questionnaire about self-image to all the subjects, expose the treatment group to the ads, and then re-measure both groups. The problem would be that respondents who answered questions about self-image prior to exposure might be more sensitive to the beauty ads than they would have been otherwise. We would then have generated a different outcome than if we had not sensitized our treatment group. The control group, likewise, might have been sensitized to the topic and thought about it so that subsequent measures were affected. A second advantage of having a before measure is that it can be used to identify variables that might affect subject performance--gender, age, psychological variables, and so on. Usually this would be accomplished by using a different set of measures prior to the treatment than is used subsequently. If the measures are different, there is less concern over priming the subjects to certain topics or parts of the treatment, but the advantage of comparing before and after measures for change scores is lost.

Blocking

This is where you separate subjects into groups of similar people (say, blocking into male and female groups) and then draw subjects for experimental and control groups separately from the blocks.
For example, you might split your sample by gender when studying the effects of video game violence. Once you have one group of women and one of men, you could randomly choose an equal number of women to join the experimental and the control groups. You would repeat the process with the men so that in the end you would have the same male-female split in the experimental group as in the control group. The advantage is that by blocking you reduce the variance within the experimental and control groups and thereby improve your statistical efficiency in testing for the impact of the independent variable you are interested in.

Factorial designs

Factorial designs refer to those experiments where more than one independent variable is included in the design. For example, the experimenter may manipulate both the sound volume and the color brightness of a film clip to determine their impact on memorability of the content. The design not only allows the researcher to test for the effect of sound volume alone and color brightness alone, but also to see if the effect of sound volume is different at different levels of color brightness (this is known as an interaction). If there are two levels of sound volume and two levels of brightness tested, then this would be considered a 2X2 factorial design. If there were three levels of volume and four levels of brightness it would be a 3X4 factorial design. Note that the number of actual treatments grows quickly--there would need to be 4 groups in the first design and 12 separate groups in the second (lowest level of volume paired with lowest level of brightness, lowest level of volume paired with next-lowest level of brightness, and so on until each pairing has been completed). It is generally thought that cell size (each pairing of factor levels generates one cell in a table) should not fall below 5 subjects. Otherwise, the reliability of results for the cell is too low.
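To make the cell counting concrete, here is a minimal sketch that enumerates the cells of the 3X4 volume-by-brightness design; the level labels are invented for illustration.

```python
from itertools import product

# Hypothetical levels for the two factors in the example.
volume_levels = ["low", "medium", "high"]               # 3 levels
brightness_levels = ["dim", "soft", "bright", "vivid"]  # 4 levels

# Each pairing of one level from each factor is one cell,
# i.e., one separate treatment group.
cells = list(product(volume_levels, brightness_levels))

print(f"{len(volume_levels)}X{len(brightness_levels)} factorial design "
      f"-> {len(cells)} cells")
for volume, brightness in cells:
    print(f"  {volume} volume / {brightness} screen")

# At the conventional minimum of 5 subjects per cell:
print("minimum total subjects:", 5 * len(cells))
```

The 3X4 design yields 12 cells, so at 5 subjects per cell you would need at least 60 subjects before even considering statistical power.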
Repeated measures

This is where you measure the same subjects multiple times. Often, the subject is exposed to very different treatment levels and measured to see the effect of each one. You can treat individual subjects as their own controls, which can be very efficient, allowing for the use of fewer subjects. The main concern with this method is that the effect of an earlier treatment could influence subject reaction to any new treatment, leading to invalid conclusions.

Methodological critique of experiments

One of the main criticisms of the experimental method is its artificiality. To avoid contamination of the results by third variables, experimenters create highly controlled environments in which to carry out their studies. These studies are usually called 'laboratory experiments' even if the highly controlled environments are not traditional 'laboratories.' These controlled environments, however, are quite different from the ones we live and work in. For example, when a subject is exposed to a TV show in the lab, the screen is of a certain size, the seating arrangement is of a certain type, and there are no kids running in and out of the room screaming, as one would likely find in many people's homes. The concern is that this very artificiality leads to results on the DV that would not be found if the research were conducted in the 'real world.' That is, while exposure to a Clint Eastwood flick in a laboratory setting could encourage guys to confront an experimenter portrayed as cheating them out of compensation, outside the laboratory the movie might not have the same effect. Because we don't spend very much time in laboratories, this represents a serious problem. One of the main arguments against much of the research on exposure to violent television and aggressive behavior is this very 'artificiality' of the experimental situation.
A second shortcoming of experiments, especially laboratory experiments, is that it is difficult to study slow, gradual change in the dependent variable through this method. If change in the dependent variable is inherently slow, it is not possible under most circumstances to keep subjects in the study long enough to identify the ultimate effect of the independent variable. When researchers' concern over the generalizability of results is great enough to justify sacrificing the advantages of the control available in laboratory experiments, they may manipulate an independent variable in more natural environments and monitor the results on some criterion measure. Experiments undertaken in more natural environments are often called naturalistic or field experiments. When a marketer wants to test a new ad campaign or a cable company wants to test the popularity of a new technology, they will often 'test market' the new campaign or service in a small portion of their geographic market in order to determine whether it is worth investing large sums of money in the enterprise. Many such tests are carried out as field experiments. While the company may lose a great deal of control by taking the research to the field, it gains greater generalizability of the results. One example of field experiments is when anti-drug advertisements are aired in selected markets (often called test communities or treatment communities) and surveys are carried out at selected intervals to see whether changes in attitudes toward drug-taking have occurred. The same surveys of drug-taking attitudes are carried out in communities that do not receive the advertising, providing a 'control group' in such field studies. These are called "control communities." Field experiments normally take place over longer time periods than do laboratory experiments, and they are usually quite expensive. Besides the expense, the loss of control that is characteristic of field experiments is the main downside of the method.
Marketers find that their competition learns of the experiment and changes its pricing or promotional methods, screwing up the results of the test market. During the testing of an anti-drug campaign a political leader in a control community may mount an aggressive campaign to teach kids to "Say No to Drugs." After all the time and expense it may be difficult to determine whether any positive results are due to the experimental manipulation or to something else that happened concurrently with the campaign. However, when the stakes are high and the concern is whether something is likely to work in the natural environment, field experiments may be well worth the expense.

So how do you construct and carry out an experiment? Based on your knowledge of the topic, usually supported by a literature review, you determine the concepts you feel are the most appropriate to your study, then carefully explicate them. You must determine how you will manipulate your independent variable. The manipulation must be valid and should also be efficient and effective (the actual manipulation should be strong enough to produce an effect but within the realm of possible real-world conditions). You also need to come up with at least one dependent measure, and often more than one. Your decisions on the manipulation of the independent variable and the measurement of the dependent variable should reflect your understanding of likely third variables as well as careful analysis of the efficiency, reliability and validity of the measures and the manipulations. As an example, Bandura's classic experiment on exposure to mediated violence and aggression among children included a manipulation where some kids were exposed to videos of research assistants kicking Bobo dolls, throwing them in the air, hitting them with play hammers, yelling at them, etc. Others did not see these videos.
The dependent variable was measured by careful observation of the children's behavior when left alone in a room with a number of toys, one of which was a Bobo doll. The amount of time spent kicking, hitting, or otherwise carrying out 'aggression' toward the doll was one of the measures. Bandura found a strong relationship between children's 'aggressive' behavior and exposure to the 'aggressive' video. Critique of Bandura's experiment centered on both the artificiality of the situation (noted earlier) and the validity of the DV. A number of those who questioned Bandura's research argued that kicking or hitting a Bobo doll is not a valid measure of aggression. Bobo dolls are meant to be kicked and hit, and they are not people. They are inanimate toys. Questions relating to the manipulation (saw the video v. didn't see the video) have also been raised. More problems with experiments stem from poorly designed manipulations or questionable dependent measures than from anything else. These are the crucial choices you make in designing experiments. Should you fail to anticipate problems with either the manipulation or the dependent measures, your experiment will likely be worthless or, even worse, misleading. Once you have an effective manipulation and a strong measure of the dependent variable, you need subjects. Usually, subjects are recruited from available populations--like this class. That is, much research is carried out on students in lower-level undergraduate social science courses (like psychology, communications, political science, research methods, etc.). Sometimes studies are carried out on other available populations--visitors to Hollywood studios, for example. Rarely, researchers attempt to draw samples that more closely reflect general populations. That usually requires a monetary payment or some other relatively expensive incentive, but it may well generate more externally valid results. Once you have a subject pool, you must carry out the experiment.
You schedule times and places when the treatments and controls are run and assign subjects to these occasions. If possible you should randomly assign the subjects to treatments, but if that is not possible you should do your best to limit the impact of bias in assignment. As always, you should strive to see that the actual conduct of the experiment follows as closely as possible the proposed experiment. Any deviations from plan should be noted and may end up being a potential explanation for obtained results. Once you have collected your data (DV measures) you analyze them using the appropriate statistics. If the results support a hypothesis you outlined earlier, you conclude that empirical support for your theory exists. If you did not state hypotheses prior to collecting data, you interpret the results according to their statistical implications.
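As an illustrative sketch of that final analysis step, assume we have invented aggression scores for the two groups in the video game example. A simple permutation test then estimates how likely a difference at least as large as the observed one would be under chance alone, which is the "1 in 20" / "1 in 100" logic discussed earlier; the scores below are made up for illustration.

```python
import random

# Invented aggression scores (acts observed over two weeks), for illustration.
carmen_scores = [3, 5, 2, 4, 6, 3, 4, 5]
vice_city_scores = [7, 6, 8, 5, 9, 7, 6, 8]

def mean(xs):
    return sum(xs) / len(xs)

observed_diff = mean(vice_city_scores) - mean(carmen_scores)

# Permutation test: if the game made no difference, the group labels are
# arbitrary, so reshuffle all scores many times and count how often a
# difference at least as large as the observed one arises by chance.
random.seed(42)
pooled = carmen_scores + vice_city_scores
n = len(carmen_scores)
trials = 10_000
extreme = 0
for _ in range(trials):
    random.shuffle(pooled)
    if mean(pooled[n:]) - mean(pooled[:n]) >= observed_diff:
        extreme += 1

p_value = extreme / trials
print(f"observed difference: {observed_diff:.2f}, estimated p: {p_value:.4f}")
```

With these invented numbers the estimated p falls well below the .05 threshold, so we would reject the chance explanation and attribute the difference to the treatment; with messier real data, the same procedure tells you whether your result clears the 1-in-20 or 1-in-100 bar.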