Topic Material #4

Designing the Study
This chapter includes information on the following topics:
Types of Experiments
Variables
Experimental Definitions
Design Types
Selecting Measures
Design Obstacles and Threats
Types of Experiments
Although the term experiment is used most of the time, not all research projects are true
experiments. For a project to qualify as an actual experiment, there must be an independent
variable that the experimenter is manipulating, and a dependent variable that is measured as the
changes in the independent variable occur. Studies that do not involve the manipulation of one
variable to study changes in another are referred to as quasi-experimental. Most of the research that takes place at the undergraduate level is actually quasi-experimental. The most common study of this type involves distributing questionnaire packets composed of several scales or measures, and then examining the outcomes of those measures relative to each other.
Although this type of study is not a true experiment, its contribution to the science and research
literature is valid. Quasi-experimental designs often serve as the basis for more complex, truly experimental designs.
There really are a lot of benefits to quasi-experimental designs, particularly from the
viewpoint of an undergrad. First of all, a quasi-experiment is simpler. Everything about it is
simpler: research design, selecting measures, recruiting participants, administering the items,
coding and tracking responses, and data analysis. There are simply fewer major mistakes that
have to be avoided. This makes it a great way to get your feet wet, and a great place to start
while you get familiar with statistics and the writing aspect of research. Second, quasi-experiments allow for more independent work. A faculty advisor is much more likely to let an undergrad work independently if there is going to be minimal risk to participants and a less complicated research design than if the student wants to bring people into a lab and manipulate
various aspects of the environment. This is not to say that a student is not capable of completing
a true experiment, or that no professor will be willing to advise such a project. Nonetheless, the
experience of completing a quasi-experiment before taking on a real experiment independently
provides a boost in ability that is invaluable in the execution of a true experiment.
Variables
In every research project, there are at least two, and sometimes three, types of variables
used. The three types of variables are independent, dependent, and organismic. The independent
variable is often referred to as the IV, and can include condition of assignment, time of day that
participants complete the study, or the order in which stimuli are presented to participants.
Independent variables are those that the experimenter manipulates intentionally. The dependent
variable, also known as the DV, is the variable that is being measured. Most research involves a
manipulation of the IV in order to study changes in the DV. The third type of variable is
organismic or subject variables. These variables are similar to the independent variables, except
that the experimenter cannot manipulate them. Organismic variables can be used to divide
participants into groups to allow for comparison of a DV relative to the organismic variable
instead of an IV. Gender, age, eye color, height, and weight are all examples of organismic, or
subject, variables.
In addition to being familiar with the various types of variables, it is necessary to
understand the levels of measurement of the variables you are using. There are three levels of
measurement: nominal, ordinal, and scale. Nominal variables are those measured in terms of
categories of membership; none is necessarily better than the others; they are just different
categories. For instance, gender is a nominal variable: female is not greater than or less than
male, it is just a different category. Other nominal variables include political and religious
affiliations, race or ethnicity, and family composition.
Ordinal variables are similar to nominal variables in that they are measured in terms of
membership. The difference between nominal and ordinal variables is that ordinal indicates a
specific order, or a ranking of the classifications. The categories freshman, sophomore, junior,
and senior are all categories of membership. Because they can be ordered in terms of least
advanced to most advanced, they constitute an ordinal variable instead of a nominal one. Military
rank is another ordinal variable: each military rank is a category (so it seems to be nominal), but
there is a certain order to them, which makes them ordinal. The key feature of ordinal variables
is that they indicate a ranked order.
The third type of variable measurement is scale, which can be broken down further into
interval and ratio. Interval scale indicates that the magnitude of something is being measured,
that the spaces between any two sets of consecutive points are equal, but there is no true zero
point. Temperature is an example of an interval measure: zero degrees does not mean an absence of temperature, as we experience when winter temperatures dip below zero. Ratio scale is also a measure of
quantity or magnitude, has equal distances between points, but does have an absolute zero point.
Age, number of course credits completed, dollar amount of annual income, and number of siblings or
children are all ratio scale variables.
A third classification of variables is whether they are continuous or discontinuous.
Continuous implies a scale along which an individual can fall at any point. For instance, if you
measure the height of participants in inches, and do not round to the nearest inch, it would be
possible to have any number of inches or fractions of inches imaginable. This is a continuous
measure. A discontinuous variable implies that within the range of possible values, an individual
cannot fall into any spot between points. Nominal and ordinal data are both types of
discontinuous variables. Calculating the age of participants in whole days is an example of a
scale variable that is also discontinuous: if you are measuring age in whole days, it is not
possible to have half a day or an eighth of a day. This inability to fall between measurement
points makes it discontinuous.
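To make these distinctions concrete, here is a minimal Python sketch with invented participant values; the point is that each level of measurement supports different summaries.

```python
from collections import Counter

# Hypothetical participant data, one value per participant (for illustration only).
major = ["psych", "bio", "psych", "math"]     # nominal: unordered categories
year = [1, 3, 2, 4]                           # ordinal: ranked (freshman..senior)
age_days = [6935, 7300, 7021, 7689]           # ratio scale, but discontinuous (whole days only)
height_in = [64.25, 70.5, 68.0, 71.75]        # ratio scale and continuous

print(Counter(major).most_common(1))   # the mode is valid for nominal data
print(sorted(year))                    # ordering is valid for ordinal data
print(sum(age_days) / len(age_days))   # a mean requires interval or ratio data
```

Notice that a mean of `major` would be meaningless, and that `age_days` can never take a value like 7021.5, which is exactly what makes it discontinuous.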
Experimental Definitions
In the English language, there are several words that have multiple meanings.
Unfortunately, the same problem exists in research. To ensure that your audience understands
exactly what you are talking about, you must provide definitions of your constructs and
variables. Constructs are the major concepts you are looking at. Even though it might seem
obvious that everyone in the world is familiar with a concept, you must define it in your study.
You do not have to define things so simple that the average elementary school student knows
what they are, like age, gender, race, or religious affiliation. You do need to define everything
else in terms of how it is relevant to your study.
Independent and dependent variables also must be defined for your audience. The
definition of an independent variable is the experimental operational definition, and identifies
the exact manipulation of the independent variable that occurred. The operational definition is
used to define the dependent variable in terms of what it is (according to you), how it is being
observed, and how it is being measured. Writing good definitions can be more difficult than it
sounds. The purpose of defining variables is to convey to the audience the precise way you are
defining, manipulating, and measuring the variables under consideration.
Design Types
There are three types of designs that cover most experimental designs: within-subjects,
between-subjects, and mixed. Each of these names indicates the way that comparisons and
analyses take place. Within-subjects studies rely on one group of participants to complete all the
measures, and comparisons are made between those measures for the same group of subjects.
Correlational studies are often within-subjects designs: all participants might complete the same
surveys, and then correlations between responses to those measures are calculated to identify a
relationship between variables. This type of design might be used to determine that there is a
relationship between two variables, say yearly income and open-mindedness. You might be able
to identify a relationship between these two variables, but keep in mind that you cannot
determine a cause-and-effect relationship, just that a relationship exists. You do not know for
sure whether having more money causes the participants to be more or less open-minded, or if
how open-minded an individual is affects how much money they will be able to earn. This is the problem of bidirectionality: not knowing whether one variable causes the other, or if they merely coexist. A final consideration is the effect of a third variable. Suppose that the previous example is a real study, and you have identified a relationship between yearly income and open-mindedness. It might be that the two of these are not as related as you think; maybe they are both closely related to a third variable, such as level of education or age, that causes them to appear related even when they really are not.
When considering relationships, it is also important to understand the direction of the
relationship. Correlations can be positive or negative. A positive relationship is one in which as
one variable increases, so does the other. Income level increasing as age increases is a positive
relationship. Negative relationships are those in which, as one variable goes up or down, the
other variable moves in the opposite direction. If you find that creative ability decreases as age
increases, you have identified a negative relationship.
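The direction of a relationship can be illustrated with a short Python sketch. The numbers below are invented, and `pearson_r` is a bare-bones Pearson correlation written out for illustration, not a substitute for a statistics package:

```python
# Pearson's r computed from its definition: covariance divided by the
# product of the standard deviations. Its sign gives the direction.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Invented values matching the examples in the text.
age = [25, 35, 45, 55, 65]
income = [30, 42, 55, 61, 70]      # rises with age -> positive r
creativity = [80, 74, 69, 61, 55]  # falls with age -> negative r

print(pearson_r(age, income) > 0)      # True: positive relationship
print(pearson_r(age, creativity) < 0)  # True: negative relationship
```

Keep in mind that a strong r in either direction still says nothing about which variable, if either, causes the other.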
Between-subjects designs, sometimes referred to as group comparison designs, are
studies that compare responses between two groups of participants. If your research design
includes the assignment of participants to different conditions so you can compare the responses
between the groups, you have a between-subjects design.
Mixed designs are studies designed to compare responses between groups as well as
within the group. Suppose you have three conditions and participants are randomly assigned to
one of the conditions. If all of your participants complete the same measures and you look for
relationships within those constructs measured, you are using within-subjects design elements. If,
in the same study, you look at differences between two or more groups, say between men and
women, you are using between-subjects comparisons. This means the design is mixed.
Regardless of the data you collect, there is always some way to evaluate it through both between-subjects and within-subjects comparisons. This does not mean that every study has a mixed design. The classification of your study as between-subjects, within-subjects, or mixed should be based on your hypotheses. Let your primary objectives guide the development of your experimental design. It is usually more satisfying to design your study to fit your hypothesis than to alter your hypothesis to fit the design you have developed.
Selecting Measures
When it comes to selecting the measures to use in a research project, you have two basic
choices. You can either use a measure that has been developed and statistically supported by
another researcher, or you can attempt to create your own measure. There are a lot of advantages
to using someone else’s measure: it already exists, so all you have to do is copy it, score it, and
cite it. In general, measures that have made it into the field’s top journals have been through
extensive statistical procedures to verify that they are good measures. So why would you choose
to not use someone else’s measure? Just because someone meant to create a survey that would
measure a particular construct does not mean they succeeded at doing so. It might also be that
there is something about their sample that made the measure work, but it might not be as useful
to you. In most situations, you would not know this until you have collected the data.
Creating your own measure, however glamorous it might sound, is a lot of work. If you
want to create a questionnaire, you have to come up with a list of questions that might be
relevant to what you want the survey to measure. Once your data is collected, you must analyze
the survey to determine if there is anything good in it. It is common to find that a survey meant
to measure one thing really measures several different things, called factors. This is good, as it
usually means the scale contains relevant subscales, but trying to figure out what those factors
might be can be time consuming and tricky. When coming up with a list of questions to include
in your measure, there is some controversy over how many questions you have to start with.
Some researchers believe strongly that you must have a couple hundred questions to have any
prayer at a decent measure. Other researchers, however, figure that if you can come up with 20
questions, and what you get is a solid measure or a couple distinct factors, that’s all you need. If
you are new at research and are thinking about creating your own measure, talk to your advisor.
If they are going to be helping you through the process, it isn’t a bad idea to do it their way the
first time around.
In general, it is less work to use someone else’s measure. The amount of time you have to
complete the project, your confidence in your ability to learn more complicated analysis
procedures, and not finding a measure you are really satisfied with are the key factors that
influence whether or not researchers try to create new measures of their own. If you decide to
create your own measure, be patient with yourself. You are not likely to produce a wonderful,
amazing, ground-breaking questionnaire the first time around, but you might surprise yourself
with something that actually works.
When you are deciding between measures to figure out which ones best fit your study,
consider how feasible they are. If you want to measure many different constructs in one study, it
might be better to select shorter measures; if you are looking at just one or two specific concepts,
length is not quite as important. Also, be aware that while many measures are available free of
charge, there are some that you do have to pay for. Additionally, just because an article reports a
new measure that sounds just perfect for you does not mean that the measure itself or a scoring
key is available in print. You might need to contact the author and ask for a copy and permission
to use it for your study. If you are on a tight time schedule, this may not be feasible. Choose
measures that work best for the situation in which you find yourself as you attempt to conduct
each study. If you find a measure you just have to try but it is not feasible for your present
research, get a copy and file it away. You always have the option to work it into a future project.
There are as many different scales used by measures as there are measures. The most
common type of scale used for personality or behavior assessment is called a Likert scale. A
Likert scale usually has two endpoints with points between, and participants are asked to rate their agreement or disagreement with a statement by indicating where on the scale their response falls. Likert
scales can have either an even or an odd number of values. With an odd number, there is a
midpoint; with an even number of values, there is no midpoint. Controversy about which method
is better is ongoing. At this point, it really comes down to personal preference. If you are using a
scale developed by someone else, use the same scale they used. If you change the scale, you have
changed the measure, and then you really have no results comparable to those previously found
using that measure. However, if you are creating your own measure, the choice of whether to
have an odd number or even number of points is up to you. The major arguments for having an
even set of points are as follows: when you are asking questions that are highly emotional or
might make a participant feel guilty for their honest response, having a midpoint on the scale
provides a safe “neutral” zone where participants can sit and not have to actually respond to the
items. If there is an even number of points on the scale, the participant must choose a side of the
issue. Even if they select one of the two middle numbers on the scale, they have chosen a side. It
is theorized that this method leads to more honest answers on controversial topics.
The argument in support of using a Likert scale with an odd number of points is just as
compelling, however. When a midpoint is available, it does not force participants to take sides on
an issue they honestly have not ever considered. There are topics out there that are not relevant to
everyone, and a midpoint allows people to express this. As a basic rule of thumb, if you want to
ask questions that are not guilt- or emotion-provoking, go ahead and use a midpoint. But if you allow a neutral zone in the midst of guilt- or emotion-provoking items, you might find that none of your participants appear to have feelings on controversial issues. A few examples of topics that are more
likely to provoke guilt or emotional reactions include prejudice (toward any group), stereotypes
(again, of any group), political agendas, and self-report of aggression.
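A short sketch, with hypothetical response options and responses, shows the difference between odd- and even-point scales, and the usual sum-based scoring:

```python
# Invented response options for illustration.
odd_points = [1, 2, 3, 4, 5]      # 3 is the "neutral" midpoint
even_points = [1, 2, 3, 4, 5, 6]  # no midpoint: even 3 and 4 lean to a side

def midpoint(points):
    """Return the neutral midpoint, or None if the scale has none."""
    return points[len(points) // 2] if len(points) % 2 == 1 else None

print(midpoint(odd_points))   # 3
print(midpoint(even_points))  # None

# Scoring a Likert measure is typically a sum (or mean) of item responses.
responses = [4, 5, 3, 4]      # one hypothetical participant, four items
print(sum(responses))         # total score: 16
```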
With all studies, researchers must be sensitive to the possibility of socially desirable
responding and the extent to which their topic might encourage it to happen. Socially desirable
responding is the tendency of participants to report what they think is more socially accepted or
expected of them. This is a real concern when addressing sensitive or volatile issues. The more
controversial an issue is, the more likely participants are to monitor their responses. A tricky
thing about socially desirable responding is that it does not always occur because of intent.
Often, participants are not really conscious of the fact that they are doing it. When you are
measuring topics that might encourage participants to provide socially desirable responses,
include one of the many measures designed to detect the degree to which the participant is
responding desirably. There are a number of social desirability scales available for free, and they
range in complexity from being very short and simple to being more complex and measuring
whether desirable responding is occurring consciously or subconsciously. If you decide that your
study merits the use of a desirable responding measure, include it in the battery of questionnaires
so that it follows immediately after the most provocative measure, as this is where desirable
responding is the most likely to be an issue worth considering.
Regardless of what measures you decide to use, or what topic you are studying, you need
to be aware of the different ways that a survey measure can be evaluated. Namely, you should
understand the reliability and validity of the scale. A measure that is reliable can be depended on
to produce consistent results. A common way to evaluate reliability is to look at what is called
test-retest reliability. If a measure is reliable by the test-retest method, a group of participants
should be able to complete the same survey at two different times and yield very similar results.
Instruments that meet this requirement are often thought of as measuring traits; they measure a
trait that is largely immune to circumstance and other factors, and remains fairly constant most of
the time. The other type of reliability to know about is interitem reliability. Interitem reliability
measures the extent to which items intended to measure the same concept actually do so. This
type of reliability is often examined using correlations and Cronbach’s alpha.
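As an illustration, Cronbach's alpha can be computed directly from its definition. The responses below are invented, and `cronbach_alpha` is a plain implementation of the standard formula, sketched here for clarity rather than as a replacement for a statistics package:

```python
def variance(xs):
    """Sample variance (n - 1 in the denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(rows):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(rows[0])                              # number of items
    items = list(zip(*rows))                      # one column per item
    item_vars = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

responses = [  # 4 hypothetical participants x 3 items on a 1-5 scale
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
]
print(round(cronbach_alpha(responses), 2))
```

Because the three items here rise and fall together across participants, alpha comes out high, which is what interitem consistency means in practice.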
Validity is a little more complicated than reliability, but only because there are numerous
types of validity. Validity is the extent to which a measure actually measures what we intend for
it to measure. Validity can be either internal (knowing for a fact that the changes observed were
the result of the independent variable), or external (how well the findings can be generalized to
other sample groups and situations). Other types of validity to consider when evaluating possible
measures include concurrent validity, content validity, face validity, and predictive validity.
Concurrent validity is the strength of the correlation between scores on the measure you are examining and scores on established measures known and trusted to assess the same concept. In a sense, concurrent validity is a measure of how one scale compares to another.
Content validity is how well the items included in the measure actually measure the different
elements of what it is designed to measure. Suicide is a very complex concept to study. A
measure that only gathers information about how a person would commit suicide if they were to
attempt suicide leaves out a lot of other elements. Such a measure should also include items
intended to assess desire to commit suicide, how often an individual contemplates suicide, and
many other issues relevant to the topic of suicide. In short, for a measure to be high in content validity, it must cover the full range of concepts, narrow or broad, that make up what it is attempting to measure. Face validity is how obvious a method of manipulation or measurement
is. If you use a yardstick to measure a football field, your method of measurement is very
obvious, and therefore has high face validity. Predictive validity, like face validity, is fairly
simple. Predictive validity is the degree to which the measure predicts behavior. This is of
special concern with some topics, as what people report doing or having done is not necessarily
what happened. There are so many reasons why a person might misrepresent their behaviors or intentions that it is not realistic to attempt to evaluate all of the possible motivations. Measures that are high in predictive validity are those that have managed to bridge the gap between what people say they do and what they actually do.
Design Obstacles and Threats
There are a number of problems that can threaten the validity of a study. To produce
valid results, researchers must check their designs for these obstacles and threats before
conducting the study. Research with a lot of validity issues is of little or no use in most instances.
Some of the most common threats to validity are confounding, order effects, maturation,
mortality, and participant history. Confounding is when a relationship that is found between two
variables is invalid because the change in the dependent variable might actually be due to a
variable other than the manipulated independent variable. Relationships exist everywhere, especially in research; one goal of the researcher is to explain relationships, and this cannot be done if the relationship might be due to several different variables.
Order effects are changes in response due to the order in which stimuli are presented.
Suppose an experimenter wanted to measure hostility and sexism, was using a negative scenario
in the measurement of sexism, and always measured hostility second. The responses obtained on
the measure of hostility are likely to have been influenced by the scenario presented with the
sexism measure. This is an order effect: anything that affects the responses of participants based on the order in which materials are presented. The easiest way to account for this issue is to counterbalance stimuli, which means systematically varying the order in which the stimuli are presented. In
the example above, half of the participants would complete the measures with the sexism
measure first, and the other half would complete the hostility measure first. For true counterbalancing to occur, the participants must be randomly assigned to one of the order conditions. The more measures being used, the greater the number of possible orders.
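The bookkeeping behind counterbalancing is easy to sketch. The measure names come from the example above; the assignment step is illustrative, not a full experimental design:

```python
import itertools
import random

# With m measures there are m! (m factorial) possible presentation orders.
measures = ["sexism", "hostility"]
orders = list(itertools.permutations(measures))
print(len(orders))  # 2 measures -> 2 possible orders

# Adding a third measure multiplies the number of orders.
print(len(list(itertools.permutations(["a", "b", "c"]))))  # 3 measures -> 6 orders

# Each participant is randomly assigned to one order condition.
participant_order = random.choice(orders)
print(participant_order in orders)  # True
```

This is why fully counterbalancing many measures quickly becomes impractical: five measures already produce 120 order conditions.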
History is the threat that is created by the individual experiences of each participant.
Someone who wants to study prejudice toward a minority group is going to get very different
results if that group has recently been accused of terrorist acts against the participant’s country
than if the targeted minority group has never been attributed with attempting to injure the
participant’s group. An experimenter who wants to evaluate the math abilities of college students
would have a history threat to worry about if the participant group were all third- or fourth- year
math majors. Testing threat is a type of history threat in which a participant’s having completed
the same or another survey in the past influences how they respond to the current one.
Maturation is one threat to internal validity that we cannot stop, but is also one we can
easily work around. Maturation is the change in observed behavior (or survey response) that
comes as a result of psychological or physical changes that the participant experiences during the
experiment. Unless you are asking participants to be involved for only a few minutes to complete really low-stress tasks, this is something to be concerned with. In most research that students
conduct, this can be virtually eliminated by taking two precautionary steps. The first thing we
can do to help avoid maturation is to carefully select measures that get the information we want
without gathering a lot of miscellaneous information. This means participation will take less time
and allow fewer chances for changes in the participant’s physical state. The second measure we
can take to help curb maturation is counterbalancing; presenting the stimuli in varied orders when possible will help to balance out cases where psychological maturation during participation would otherwise skew results.
Subject mortality is the rate at which participants drop out of different conditions of a
study. Inconsistent subject mortality across conditions is a threat to the internal validity of the
design. There are several things you can do to help prevent subject mortality from becoming a
problem. First, design all conditions to take approximately the same amount of time and effort,
and to be relatively equal in the level of stress they induce. The more similar these factors, the
more similar the dropout rate will be.
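A quick sketch with invented counts shows how unequal mortality across conditions can be spotted:

```python
# Hypothetical enrollment and completion counts for two conditions.
started = {"control": 40, "treatment": 40}
finished = {"control": 36, "treatment": 22}

# Dropout rate = proportion of participants who did not finish.
dropout = {cond: 1 - finished[cond] / started[cond] for cond in started}
for cond, rate in dropout.items():
    print(f"{cond}: {rate:.0%} dropped out")

# Very unequal rates (here 10% vs 45%) suggest the conditions differ in
# time, effort, or stress, which threatens internal validity.
```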