Non-Experimental Survey Designs
Survey designs are one of the most popular non-experimental, quantitative designs. The general
purpose of this design is to describe some important characteristic(s) of the population. This often
involves describing affective characteristics such as attitudes, beliefs, behaviors, preferences, or
values as well as more complicated traits such as self-esteem, efficacy, locus of control, or
personality. Information is collected from subjects through the use of questionnaires, hence the
term “survey” design. In the process of conducting a study using this design, quantitative data is
collected, analyzed, and interpreted to provide clear descriptions of the desired characteristics
being measured.
Types of Survey Designs
There are two types of survey designs: cross-sectional and longitudinal. The differences between
these involve the time at which data is collected from subjects and the analysis of this data. In a
cross-sectional design, subjects respond to a survey at a single point in time. For example, a
researcher would use this design to collect information about teachers’ attitudes toward a new
teacher evaluation system. It seems obvious that the data should be collected from all subjects at
the same point in time so that conclusions can be drawn about their collective attitudes. These
conclusions involve comparing the results to an absolute standard describing the characteristic
being measured (e.g., a very positive, favorable attitude; a negative, unfavorable attitude) or a
relative standard such as the scores of other groups (e.g., men held more positive attitudes than
did women, the attitudes of subjects in this study were significantly more positive than the
attitudes of subjects in a national sample). A longitudinal survey design collects data from
subjects in a single group over time. For example, such a design might survey teachers every
year over a five-year period. The analysis of data is focused on any changes over time for
the members of the single group. In other words, the analysis attempts to identify any trends in
the characteristic being examined over the time period of the study.
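As a rough illustration of the trend analysis a longitudinal design supports, the following Python sketch computes the yearly mean attitude score for a single group surveyed annually over five years; all data values are hypothetical.

```python
# Hypothetical yearly responses (coded 1-4) from one group of
# teachers surveyed each year over a five-year period.
yearly_responses = {
    2018: [3.1, 2.8, 3.4, 3.0, 2.9],
    2019: [3.2, 3.0, 3.5, 3.1, 3.0],
    2020: [3.4, 3.1, 3.6, 3.3, 3.2],
    2021: [3.5, 3.3, 3.7, 3.4, 3.3],
    2022: [3.6, 3.4, 3.8, 3.5, 3.4],
}

# The trend analysis: compare the group's mean score year by year.
for year in sorted(yearly_responses):
    scores = yearly_responses[year]
    print(f"{year}: mean = {sum(scores) / len(scores):.2f}")
```

A steady rise or fall in these yearly means is precisely the kind of trend such a design is intended to reveal.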
Issues Related to Survey Designs
There are five key issues related to survey designs (Oescher, 2012). These are a) sampling from
a population, b) designing instruments for data collection, c) collecting data and obtaining high
response rates, d) analyzing data, and e) interpreting the results from the analysis of the data.
The first issue is common across all non-experimental designs. Typically, sampling methods are
discussed in terms of a target population, an accessible population, and a sample. The first of these
represents a group that has at least one characteristic distinguishing it from other groups.
In the field of education, students exemplify such a group, as do teachers and principals. Rarely
do researchers have the resources to examine such general populations; they usually have
access to some subset of these populations. For example, a researcher might wish to examine all
students yet be able to collect data from only those students enrolled in one district. These
students are known as the “accessible” population, since the researcher can likely obtain data
from members of this population but not the larger target population. The third aspect of sampling
involves selecting those subjects with whom the researcher will actually interact. This group is
known as the sample; individuals are chosen based on one of several types of probability (e.g.,
random, stratified random) or non-probability (e.g., purposive, existing groups) sampling
procedures. Thanks to developments in electronic data collection, researchers
frequently use existing databases to identify all members of an accessible population and
distribute questionnaires to these individuals through e-mail or similar electronic means.
This technique is known, somewhat ironically, as comprehensive sampling, since no subset is
actually selected. The generalizability of the results is often enhanced tremendously by this technique.
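To make these sampling distinctions concrete, the sketch below contrasts simple random and stratified random sampling in Python; the population of 200 teachers and its two strata are assumptions invented for illustration.

```python
import random

# A hypothetical accessible population: 200 teachers in one district,
# 120 elementary and 80 secondary.
population = [
    {"id": f"teacher_{i}", "level": "elementary" if i < 120 else "secondary"}
    for i in range(200)
]

# Simple random sampling: every subject has an equal chance of selection.
simple_sample = random.sample(population, 20)

# Stratified random sampling: draw from each stratum in proportion to
# its size, so the sample mirrors the population's composition.
stratified_sample = []
for level in ("elementary", "secondary"):
    stratum = [p for p in population if p["level"] == level]
    n = round(20 * len(stratum) / len(population))  # 12 and 8, respectively
    stratified_sample.extend(random.sample(stratum, n))
```

A comprehensive sample, by contrast, would simply be `population` itself, with the questionnaire distributed to every member.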
The second issue related to survey designs is the development of an instrument for data
collection. Advice for constructing such instruments is readily available (e.g., McMillan &
Schumacher, 2011; Gay, Mills, & Airasian, 2012; Creswell, 2012; Anderson & Bourke, 2000).
Of critical importance is the evidence of content and construct validity as well as
reliability for such instruments. Usually a pilot test of an instrument is quite beneficial as it
produces data that allows one to reflect on the procedures that will be used to collect data, the
directions for the instrument, the clarity and presentation of the survey, and most importantly the
performance of the items.
The third issue for survey designs is collecting data in ways that help ensure high response rates.
This involves the design and presentation of the survey as well as the procedures used to
administer it. In years past, most surveys were administered by mail. This required obtaining
accurate addresses for subjects in the sample, mailing them a copy of the survey, providing a
self-addressed stamped envelope for returning the survey, and ensuring that all other matters
(e.g., length of the survey, clarity of the items, ease of response) were as simple as possible.
Once surveys were returned the responses for each subject had to be scanned or entered by
hand into an electronic data set. Often the entire process was quite expensive and time
consuming. Fortunately the use of electronic surveys has greatly facilitated data collection.
Websites exist that allow researchers to easily write and format their surveys. Databases with the
e-mail addresses of accessible populations exist, and the process of delivering a survey to the
members of these populations is quite easy and quick. Responses are made online and compiled
by the website administrators, and these are returned to the researcher in easily read data files
(e.g., Excel, SPSS). This method has greatly enhanced return rates by facilitating the delivery,
completion, and return of any survey.
The fourth issue is analyzing the data. This involves numerically coding the data and computing
scores. Suppose a researcher has used a four-point Likert response scale ranging from Strongly
Disagree to Strongly Agree. These end points are typically coded as a 1 and a 4 respectively.
The middle points of Disagree and Agree are coded as 2 and 3 respectively. Scores for
subscales or total scales can be computed in many ways depending on how the researcher
chooses to deal with missing data. One common way of doing so is to calculate the mean of the
non-missing items on the survey. Thus, a subject answering 10 items with responses of Agree to
each item would have a mean score of 3.00 (i.e., 30/10). A subject answering only 9 of the 10
items with responses of Agree would have a mean score of 3.00 also (i.e., 27/9). There are
nuances associated with such scoring approaches. One of the most important is the requirement
of a minimum number of responses needed to calculate a score. A second is the assumption that
the nature of the underlying scale being measured is continuous, not categorical.
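The scoring approach described above can be sketched in a few lines of Python. The response coding and the mean-of-non-missing-items rule come from the text; the minimum of 8 answered items is an assumption chosen for illustration.

```python
CODES = {"Strongly Disagree": 1, "Disagree": 2,
         "Agree": 3, "Strongly Agree": 4}

def scale_score(responses, min_items=8):
    """Mean of the non-missing items; None if too few were answered."""
    answered = [CODES[r] for r in responses if r is not None]
    if len(answered) < min_items:
        return None  # too much missing data to yield a trustworthy score
    return sum(answered) / len(answered)

# A subject answering 9 of 10 items, all Agree: 27 / 9 = 3.00
responses = ["Agree"] * 9 + [None]
print(scale_score(responses))  # 3.0
```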
The interpretation of scores is the fifth issue associated with the use of a survey design.
Interpretations of scores are complicated because the responses themselves are often confused
with the interpretations of them. To exemplify this, consider an item examining the value
placed upon the use of a whiteboard when teaching. The item could be worded, "I believe using
a whiteboard in my instruction is very important." A response of Strongly Agree reflects a high
value placed on the technology (i.e., the interpretation of the response), not a high level of
agreement (i.e., the response). In addition to the interpretations associated with individual items,
scores are often aggregated over many items. These aggregate scores are frequently interpreted
as ranges of scores rather than individual responses. For example, an average of 10 items on a
survey might be 3.40. What does this score mean? To answer this question the researcher must
realize that a four-point Likert scale results in a range of responses of 3 (i.e., 4 - 1 = 3). To map
this range onto the four response categories, one point of width must be absorbed, either by making
the two end categories one-half point wide or by making the two middle categories one-half point wide.
An example of the first approach is a response scale of 1.00-1.49 (i.e., approximately one-half a
point); 1.50-2.49; 2.50-3.49; and 3.50-4.00 (i.e., approximately one-half a point). An example of
the second response scale is 1.00-1.99; 2.00-2.49 (i.e., approximately one-half a point); 2.50-2.99
(i.e., approximately one-half a point); 3.00-4.00. The score of 3.40 discussed above would be
associated with a response of Agree on the first response scale (i.e., a positive, favorable
attitude) but a response of Strongly Agree on the second response scale (i.e., a very positive,
favorable attitude). The choice of a response scale is usually determined by the researcher based
on the area of the underlying continuum in which they want variation in scores. That is, a
researcher might choose the first response scale based on the desire to ensure the ends of the
response scale truly reflect more extreme interpretations of very positive or negative attitudes,
while a researcher less concerned with the end points representing extreme scores would choose
the second response scale.
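Both interpretation schemes can be expressed directly in code. The sketch below uses the cut points given above for a four-point scale; the function itself is purely illustrative.

```python
LABELS = ["Strongly Disagree", "Disagree", "Agree", "Strongly Agree"]

def interpret(score, scheme):
    """Map a mean score to a response label under either scheme."""
    if scheme == 1:
        cuts = [1.50, 2.50, 3.50]  # half-point categories at the ends
    else:
        cuts = [2.00, 2.50, 3.00]  # half-point categories in the middle
    for cut, label in zip(cuts, LABELS):
        if score < cut:
            return label
    return LABELS[-1]

print(interpret(3.40, scheme=1))  # Agree
print(interpret(3.40, scheme=2))  # Strongly Agree
```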
The interpretation of scores is further complicated by the type of comparisons made with the
scores. A criterion-referenced interpretation examines a score relative to the underlying
continuum of the characteristic being measured. For example, a score of 3.75 measuring the
value associated with using a whiteboard when teaching might be compared to the neutral point
of 3.00 on an underlying continuum of importance ranging from 1.00 to 5.00. Such a comparison
represents an “absolute” interpretation of this score. On the other hand, a norm-referenced
interpretation examines a score relative to the scores of a norming group chosen to represent a
large population. The same score of 3.75 could be considered a very high score because it falls
at the 95th percentile for subjects in the norming group. These
two perspectives on interpreting scores from surveys can complicate matters if the researcher is
not very clear about the type of comparison being made.
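A short sketch makes the contrast concrete. The neutral point and the score of 3.75 come from the example above; the norming-group scores are hypothetical, chosen so that 3.75 falls at the 95th percentile.

```python
score = 3.75

# Criterion-referenced ("absolute") interpretation: compare the score
# to the neutral point of 3.00 on the 1.00-5.00 continuum.
print("above neutral" if score > 3.00 else "at or below neutral")

# Norm-referenced interpretation: locate the score among the scores of
# a norming group (these 20 values are hypothetical).
norming_group = [2.1, 2.3, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.0, 3.1,
                 3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.6, 3.7, 3.9]
percentile = 100 * sum(s < score for s in norming_group) / len(norming_group)
print(f"percentile rank: {percentile:.0f}")  # 95: only one score is higher
```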