Non-Experimental Survey Designs

Survey designs are among the most popular non-experimental, quantitative designs. The general purpose of this design is to describe some important characteristic(s) of a population. This often involves describing affective characteristics such as attitudes, beliefs, behaviors, preferences, or values, as well as more complicated traits such as self-esteem, efficacy, locus of control, or personality. Information is collected from subjects through the use of questionnaires, hence the term “survey” design. In the process of conducting a study using this design, quantitative data is collected, analyzed, and interpreted to provide clear descriptions of the characteristics being measured.

Types of Survey Designs

There are two types of survey designs: cross-sectional and longitudinal. The differences between them involve the time at which data is collected from subjects and how that data is analyzed. In a cross-sectional design, subjects respond to a survey at a single point in time. For example, a researcher would use this design to collect information about teachers’ attitudes toward a new teacher evaluation system. Data should be collected from all subjects at the same point in time so that conclusions can be drawn about their collective attitudes. These conclusions involve comparing the results either to an absolute standard describing the characteristic being measured (e.g., a very positive, favorable attitude; a negative, unfavorable attitude) or to a relative standard such as the scores of other groups (e.g., men held more positive attitudes than did women; the attitudes of subjects in this study were significantly more positive than the attitudes of subjects in a national sample).

A longitudinal survey design collects data from a single group of subjects over time. For example, such a design might survey teachers every year over a five-year period. The analysis of the data focuses on any changes over time for the members of the single group. In other words, the analysis attempts to identify trends in the characteristic being examined over the time period of the study.

Issues Related to Survey Designs

There are five key issues related to survey designs (Oescher, 2012). These are a) sampling from a population, b) designing instruments for data collection, c) collecting data and obtaining high response rates, d) analyzing data, and e) interpreting the results of the analysis.

The first issue is common across all non-experimental designs. Typically, sampling methods are discussed in terms of a target population, an accessible population, and a sample. The target population is a group whose members share at least one characteristic that distinguishes them from other groups. In the field of education, students exemplify such a group, as do teachers and principals. Rarely do researchers have the resources to examine such general populations; they usually have access to some subset of them. For example, a researcher might wish to examine all students yet be able to collect data from only those students enrolled in one district. These students are known as the “accessible” population, since the researcher can likely obtain data from members of this population but not from the larger target population. The third aspect of sampling involves selecting those subjects with whom the researcher will actually interact.
This group is known as the sample; individuals are chosen using one of several probability (e.g., random, stratified random) or non-probability (e.g., purposive, existing groups) sampling procedures. Thanks to developments in electronic data collection methods, researchers frequently use existing databases to identify all members of an accessible population and distribute questionnaires to these individuals through e-mail or other similar electronic means. This technique is known, somewhat ironically, as comprehensive sampling, since the entire accessible population is surveyed rather than a subset of it. The generalizability of the results is often enhanced tremendously with this technique.

The second issue related to survey designs is the development of an instrument for data collection. Advice for constructing such instruments is readily available (e.g., Anderson & Bourke, 2000; Creswell, 2012; Gay, Mills, & Airasian, 2012; McMillan & Schumacher, 2011). Of critical importance is evidence of content and construct validity, as well as reliability, for such instruments. A pilot test of an instrument is usually quite beneficial, as it produces data that allows one to reflect on the procedures that will be used to collect data, the directions for the instrument, the clarity and presentation of the survey, and, most importantly, the performance of the items.

The third issue for survey designs is collecting data in ways that help ensure high response rates. This involves the design and presentation of the survey as well as the procedures used to administer it. In years past, most surveys were administered by mail. This required obtaining accurate addresses for subjects in the sample, mailing them a copy of the survey, providing a self-addressed stamped envelope for returning the survey, and ensuring that all other matters (e.g., the length of the survey, the clarity of the items, the ease of response) were as simple as possible. Once surveys were returned, the responses for each subject had to be scanned or entered by hand into an electronic data set. Often the entire process was quite expensive and time consuming. Fortunately, the use of electronic surveys has greatly facilitated data collection. Websites exist that allow researchers to easily write and format their surveys. Databases with the e-mail addresses of accessible populations exist, and the process of delivering a survey to the members of these populations is quick and easy. Responses are made online and compiled by the website administrators, who return them to the researcher in easily read data files (e.g., Excel, SPSS). This method has greatly enhanced return rates by facilitating the delivery, completion, and return of any survey.

The fourth issue is analyzing the data. This involves numerically coding the data and computing scores. Suppose a researcher has used a four-point Likert response scale ranging from Strongly Disagree to Strongly Agree. These end points are typically coded as 1 and 4, respectively; the middle points of Disagree and Agree are coded as 2 and 3, respectively. Scores for subscales or total scales can be computed in many ways depending on how the researcher chooses to deal with missing data. One common approach is to calculate the mean of the non-missing items on the survey. Thus, a subject answering 10 items with responses of Agree to each item would have a mean score of 3.00 (i.e., 30/10). A subject answering only 9 of the 10 items with responses of Agree would also have a mean score of 3.00 (i.e., 27/9).
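To make this coding and scoring step concrete, the following sketch (written in Python purely for illustration; the responses and the function name are hypothetical, not taken from any particular instrument) codes a four-point Likert scale and averages the non-missing items:

    # Hypothetical scoring sketch for a four-point Likert scale.
    CODES = {"Strongly Disagree": 1, "Disagree": 2, "Agree": 3, "Strongly Agree": 4}

    def mean_score(responses):
        """Return the mean of the non-missing items (None marks a skipped item)."""
        answered = [CODES[r] for r in responses if r is not None]
        if not answered:
            return None  # no items answered, so no score can be computed
        return sum(answered) / len(answered)

    # A subject answering all 10 items with Agree scores 30/10 = 3.00;
    # a subject answering only 9 of the 10 items with Agree scores 27/9 = 3.00.
    print(mean_score(["Agree"] * 10))          # 3.0
    print(mean_score(["Agree"] * 9 + [None]))  # 3.0

Note that averaging only the non-missing items, as here, treats the skipped item as if the subject had responded at his or her own average level; other missing-data rules would yield different scores.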
There are nuances associated with such scoring approaches. One of the most important is the requirement of a minimum number of responses needed to calculate a score. A second is the assumption that the underlying scale being measured is continuous, not categorical.

The interpretation of scores is the fifth issue associated with the use of a survey design. Interpretation is complicated because responses are often confused with the interpretations drawn from them. Consider, for example, an item examining the value placed upon the use of a White Board when teaching. The item could be worded, “I believe using a White Board in my instruction is very important.” A response of Strongly Agree reflects a high value placed on the technology (i.e., the interpretation of the response), not merely a high level of agreement (i.e., the response itself).

In addition to the interpretations associated with individual items, scores are often aggregated over many items. These aggregate scores are frequently interpreted in terms of ranges of scores rather than individual responses. For example, the average of 10 items on a survey might be 3.40. What does this score mean? To answer this question the researcher must realize that a four-point Likert scale results in a range of possible mean scores of 3 points (i.e., 4 − 1 = 3). To map this 3-point range onto the four-point response scale, one point must be absorbed, either by making the two end ranges of the scale one-half of a point wide or by making the two middle ranges one-half of a point wide. An example of the first approach is a response scale of 1.00-1.49 (i.e., approximately one-half of a point); 1.50-2.49; 2.50-3.49; and 3.50-4.00 (i.e., approximately one-half of a point). An example of the second approach is 1.00-1.99; 2.00-2.49 (i.e., approximately one-half of a point); 2.50-2.99 (i.e., approximately one-half of a point); and 3.00-4.00. The score of 3.40 discussed above would be associated with a response of Agree on the first response scale (i.e., a positive, favorable attitude) but with a response of Strongly Agree on the second (i.e., a very positive, favorable attitude); both schemes are illustrated in the sketch below. The choice of a response scale is usually determined by the researcher based on the area of the underlying continuum in which they want variation in scores. That is, a researcher might choose the first response scale to ensure that the ends of the scale truly reflect the more extreme interpretations of very positive or very negative attitudes, while a researcher less concerned with the end points representing extreme scores would choose the second.

The interpretation of scores is further complicated by the type of comparison made with the scores. A criterion-referenced interpretation examines a score relative to the underlying continuum of the characteristic being measured. For example, a score of 3.75 measuring the value associated with using a White Board when teaching might be compared to the neutral point of 3.00 on an underlying continuum of importance ranging from 1.00 to 5.00. Such a comparison represents an “absolute” interpretation of the score. On the other hand, a norm-referenced interpretation examines a score relative to the scores of some norming group that represents a large, representative population. The same score of 3.75 could be considered a very high score because it falls at the 95th percentile for subjects in the norming group.
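Returning to the cut-point example above, the following sketch (again in Python; the scheme names and the label function are illustrative, while the range boundaries are those described in the text) maps an aggregate mean such as 3.40 onto a response label under each scheme:

    # Two hypothetical cut-point schemes for interpreting a four-point Likert mean.
    # First scheme: the two end ranges are one-half of a point wide.
    END_RANGES_HALVED = [(1.00, 1.49, "Strongly Disagree"),
                         (1.50, 2.49, "Disagree"),
                         (2.50, 3.49, "Agree"),
                         (3.50, 4.00, "Strongly Agree")]

    # Second scheme: the two middle ranges are one-half of a point wide.
    MIDDLE_RANGES_HALVED = [(1.00, 1.99, "Strongly Disagree"),
                            (2.00, 2.49, "Disagree"),
                            (2.50, 2.99, "Agree"),
                            (3.00, 4.00, "Strongly Agree")]

    def label(score, scheme):
        """Return the response label whose range contains the mean score."""
        for low, high, name in scheme:
            if low <= score <= high:
                return name
        return None  # a mean falling between published boundaries needs a rounding rule

    print(label(3.40, END_RANGES_HALVED))     # Agree
    print(label(3.40, MIDDLE_RANGES_HALVED))  # Strongly Agree

The same mean of 3.40 is thus read as a positive attitude under the first scheme but as a very positive attitude under the second, which is why the chosen cut points should be reported along with the scores.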
These two perspectives on interpreting scores from surveys can complicate matters if the researcher is not very clear about the type of comparison being made.