Sample Design: Part 2

Slide 1
The goal of this lecture on sample design is to introduce you to basic sampling terminology, to discuss the various stages of selecting a sample, to introduce you to two basic types of samples (probability and non-probability samples), to explain each of the different types of those samples and when you might use them, to briefly introduce you to the notion of sampling error, and to close by discussing Internet and panel samples.

Slide 2
Now I’ll describe four different types of probability samples: simple random samples, systematic samples, stratified samples, and cluster samples. I’ll talk about Internet samples and panel samples later.

Slide 3
A simple random sample is the one you’re familiar with from stat courses. Each element within the population has an equal chance of selection. For a probability sample, the probability of each population element being selected is known. Knowing selection probabilities doesn’t make them equal; it just means they’re known. With a simple random sample, they’re known and equal.

Slide 4
The advantage of a simple random sample is that such a sample allows the strongest inferences to the general population. From a scientific or statistical standpoint, a simple random sample is the most preferred type of sample, but it is also more expensive to collect. A systematic sample is an easy and inexpensive way to draw the operational equivalent of a simple random sample. All that’s needed for a scientific sample is a sample frame that lists all population members. In contrast to probability samples, non-probability samples aren’t drawn from an exhaustive list of population members. Assume a sample of 100 persons must be drawn from a list of 1000 people. The sampling interval is 1000/100 = 10, so to draw a systematic sample, it’s necessary to find a random starting point between 1 and 10 and then select every tenth name from that starting point. For example, if the random starting point is the 7th name, then the 7th, 17th, 27th, and so on through the 997th persons are selected from the list.
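The systematic draw just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `systematic_sample` and the numbered list standing in for the 1000 names are hypothetical, and the seed is there only to make the run repeatable.

```python
import random

def systematic_sample(frame, n, rng=None):
    """Pick a random start within the first interval, then take every k-th element."""
    rng = rng or random.Random()
    k = len(frame) // n              # sampling interval (10 for 100 out of 1000)
    if k < 1:
        raise ValueError("sample size exceeds frame size")
    start = rng.randrange(k)         # 0-based index, i.e. the 1st..k-th name
    return [frame[start + i * k] for i in range(n)]

# Hypothetical sample frame: names replaced by list positions 1..1000.
frame = list(range(1, 1001))
sample = systematic_sample(frame, 100, random.Random(7))
```

If the random start happens to be the 7th name, the result is the 7th, 17th, 27th, and so on, as in the example above.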
Drawing names from a list systematically is an easy way to generate a random sample. This approach is useful when using a telephone directory as the sampling frame. (I’ll discuss problems with using directories ‘as is’ later.) Here, phone numbers rather than people make up the sample frame. For example, if a local sample of 400 people is needed, and the local residential telephone directory contains 200 pages with two columns of names and phone numbers per page (400 columns in all, so one number per column is needed), then the process is very straightforward. First, cut a piece of paper to a random length shorter than the length of a directory page. Then, align that piece of paper with the top of the page for each column. Select the first number in each column not blocked by the paper and add ‘1’ to it (so 646-5238 becomes 646-5239). Adding ‘1’ to a listed telephone number is the process most likely to create a working but operationally random phone number.

Slide 5
A quota sample is non-scientific because it lacks a sample frame. To fill a quota, interviewers screen people based on the key characteristic(s). Once enough people with those characteristics have participated, no more like them are solicited. A stratified sample accomplishes the same thing as a quota sample, except it works from a sample frame. For example, to stratify respondents into low, medium, and high income groups, a researcher would first construct a sample frame for each income group and then randomly select people from each frame. Like a quota sample, that process would guarantee a sufficient number of respondents from each group, but it would create a scientific sample because respondents in each stratum would be selected from a sample frame. Don’t confuse stratified samples with quota samples. The key difference is the use of a sample frame to select members or elements from each stratum.

Slide 6
The next four slides show a simple example of drawing a stratified sample. The population is 16 households.
Clearly, a census rather than a sample would be preferred here to avoid the likely random sampling error caused by too small a sample. However, this is a numeric example intended to illustrate a point. This slide shows the furniture expenditures in the last year of 16 households, which ranged from $60 for household #1 to $4500 for household #13. Total expenditures for all 16 households were $23,510, the mean per household is $1469.38, and the standard deviation is $1344.38. (Thus, roughly two-thirds of households spent within $1344.38 of the $1469.38 mean.)

Slide 7
This slide shows five randomly drawn households from that universe of 16 households. This sample of households spent a total of $8,020, for a mean of $1604. To estimate total sales for the population, it’s reasonable to multiply the mean of $1604 by the 16 households, which is $25,664. Although that estimate is close to the actual total from the previous slide ($23,510), it’s still off by about $2,150. Is there a way to generate a more accurate estimate? The answer, ‘yes’, is demonstrated on the next two slides.

Slide 8
Instead of merely sampling at random, we might apply some marketing theory to our sampling process. Marketing theory tells us that different households, at different stages of the family life cycle, tend to have different furniture needs. It’s likely that singles buy a little furniture, newlyweds buy more furniture (especially if they just bought a home and need to furnish it), and households with children buy less furniture (because they must cover numerous competing product needs and might want to avoid child-induced damage). On this life-cycle basis, the 16 households can be divided into four strata: Stratum A contains single adults, Stratum B contains married couples without children, Stratum C contains married couples with a home but no children, and Stratum D contains married couples with children living at home. Now each stratum can be sampled.
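Sampling each stratum, and then turning the stratum means into a population estimate, might look like the Python sketch below. It makes several assumptions: the household IDs are hypothetical placeholders, the stratum sizes (3, 4, 3, and 6 of the 16 households) and the stratum means are the figures quoted for Slide 9 of this example, and the function names are illustrative only.

```python
import random

# Hypothetical household IDs; only the stratum sizes (3, 4, 3, 6) come from the lecture.
strata = {
    "A": ["A1", "A2", "A3"],
    "B": ["B1", "B2", "B3", "B4"],
    "C": ["C1", "C2", "C3"],
    "D": ["D1", "D2", "D3", "D4", "D5", "D6"],
}

def draw_stratified(strata, per_stratum, rng):
    """Randomly select the same number of households from each stratum's frame."""
    return {name: rng.sample(members, per_stratum) for name, members in strata.items()}

def stratified_total(stratum_means, stratum_sizes):
    """Estimate the population total as the sum of (stratum size x stratum sample mean)."""
    return sum(stratum_sizes[name] * stratum_means[name] for name in stratum_means)

picked = draw_stratified(strata, 2, random.Random(0))
sizes = {name: len(members) for name, members in strata.items()}
means = {"A": 150, "B": 775, "C": 3600, "D": 1300}  # stratum means as reported on Slide 9
estimate = stratified_total(means, sizes)
```

Multiplying each stratum mean by its stratum size is equivalent to weighting the means by relative incidence and then multiplying by the 16 households.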

Slide 9
This slide shows the two households selected from each stratum. The mean expenditure is $150 for Stratum A, $775 for Stratum B, $3600 for Stratum C, and $1300 for Stratum D. The calculation at the bottom of the slide shows the relative incidence of these strata: 3/16th are single adults, 4/16th are married couples without children, 3/16th are married couples with a home and no children, and 6/16th are married couples with a home and children. Weighting the mean expenditure for each stratum by the relative incidence of that stratum gives (3/16)($150) + (4/16)($775) + (3/16)($3600) + (6/16)($1300) = $1384.38 per household, which creates an estimated total furniture expenditure last year of $22,150 for the 16 households. This weighted estimate is closer than the unweighted estimate ($25,664) to actual total expenditures ($23,510). As per this example, stratified samples can yield more accurate population estimates.

Slide 10
Now that I’ve shown how stratified samples can produce more accurate population estimates, I’ll talk briefly about proportionate and disproportionate stratified sampling. Collecting data is expensive, so researchers should try to maximize the efficacy of each dollar spent on data collection. In some situations, using a disproportionate sampling procedure will achieve this goal. This slide illustrates the difficulty of forecasting next year’s sales from a sample of customers who buy different quantities. Randomly including or excluding the largest customer would change the forecast markedly. If a small subset of customers is responsible for most sales, then a relatively larger percent of that subset and a relatively smaller percent of remaining customers should be surveyed to produce the most accurate forecast.

Slide 11
Perhaps this retailing example from A.C. Nielsen will reinforce that point. Here, the different types of retailers are chain stores, large independent stores, medium independent stores, and small independent stores.
In the universe of retailers, 25% are chain stores, 13% are large independent stores, 32.6% are medium independent stores, and roughly 30% are small independent stores. However, each type’s share of sales volume differs vastly from its share of stores: the chains represent roughly 25% of stores but 50% of dollar sales; in contrast, the small independents represent almost 30% of stores but only 10% of dollar sales. For more accurate estimates of sales, a researcher would want a sample composed of more than 25% chain stores but less than 30% small independent stores. In other words, the researcher would oversample chain stores and undersample small independent stores.

Slide 12
The single purpose of a cluster sample is to sample economically while retaining the characteristics of a probability sample. Consider conducting personal interviews with a geographically dispersed sample. The cost of interviewers reaching respondents would be prohibitive. If researchers can identify homogeneous clusters of people and then select people within each cluster randomly, then respondents would be more geographically proximate and interviewing costs would drop markedly. Unlike market segmentation, in which heterogeneous groupings are desired, similar groupings are desired for cluster sampling. If groups differ from one another, then study results will depend on the clusters selected. Fortunately, a probability sample of a probability sample is a probability sample. Cluster sampling is a multi-stage sampling approach; after clusters are randomly selected, elements within each cluster are randomly selected.

Slide 13
Here’s an example within a single metropolitan statistical area. First, the researcher randomly selects an area, then a city, then a census tract, then a block area, and then a sample of people who reside on that block. The researcher might randomly select several blocks, then select people from each block, creating a scientific sample of people in this city.
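The last two stages of that process (randomly select blocks, then randomly select people within each selected block) can be sketched in Python as follows. Everything concrete here is an assumption made for illustration: the block and resident names, the counts of 10 blocks and 20 residents per block, and the function name `two_stage_cluster_sample`.

```python
import random
from fractions import Fraction

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: randomly pick clusters. Stage 2: randomly pick people within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

# Hypothetical frame: 10 blocks with 20 residents each (names are placeholders).
blocks = {
    f"block_{b}": [f"block_{b}_resident_{r}" for r in range(1, 21)]
    for b in range(1, 11)
}

cluster_sample = two_stage_cluster_sample(blocks, 3, 5, random.Random(1))

# With equal-sized blocks, each resident's selection probability is known:
# P(block chosen) x P(resident chosen | block chosen).
inclusion_probability = Fraction(3, 10) * Fraction(5, 20)  # 3/40
```

Because the selection probability at each stage is known, the overall probability is known too, which is one way to read the remark that a probability sample of a probability sample is a probability sample.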
Again, the key issue in generating a representative cluster sample is that each cluster must be as similar to the others as possible; hence, results will be independent of the cluster(s) chosen.

Slide 14
Here’s another illustration of a multi-stage approach. In this case, it’s a national sample.

Slide 15
The next two slides show examples of populations and clusters. If the target population is the U.S. adult population, then clusters could be created at the state level, the county level, the SMSA level, the census tract block level, or the household level. If the target population is college seniors, then clusters could be colleges. If the target population is manufacturers, then clusters could be created at the county level, the SMSA level, or even the plant level. Airline travelers could be clustered by airports or even by flights. Sports fans could be clustered by the stadia that they attend.

Slide 16
Remember that the goal of cluster sampling is singular: to reduce the cost of collecting scientific or probability samples.

Slide 17
This slide provides an excellent summary of the relative strengths and weaknesses of the various sampling techniques. I urge you to examine this slide carefully!

Slide 18
The next stage in the selection of a sample is establishing a basis for choosing a given sample design. Some important considerations are as follows:
The needed degree of accuracy. Clearly, non-probability samples will tend to be less representative of the population, so estimates based on non-probability samples will tend to be less accurate.
Available resources. Non-probability samples are much cheaper, simple random samples are more expensive than cluster samples, and stratified samples may make better use of data collection dollars.
Timeliness. Non-probability samples can be collected faster. Simple random samples take a long time because of the time required to generate the sampling frame. Using a snowball approach may be preferable if time is an issue.
Advance knowledge of the population; in particular, the variability of different population subgroups. To use data collection dollars most efficiently, researchers should query relatively more respondents from more variable subgroups and relatively fewer respondents from less variable subgroups. If all elements in a subgroup are identical, then an accurate population estimate requires sampling only one element. Think about bags of cookies. How many cookies from a bag of chocolate chip cookies need to be sampled to determine if the bag contains fresh cookies? Only one; they come from the same bag, the same manufacturer, and they are all chocolate chip. Alternatively, how many cookies from a bag of mixed cookies need to be sampled to determine if the bag contains fresh cookies? If the bag contained many types of cookies, such as Nilla Wafers, Ginger Snaps, and Mallomars, then it’d be necessary to sample many cookies to determine if the bag contains fresh cookies. Hence, knowing the variability of subgroups in the population would help to optimize data collection expenditures because more variable subgroups would be oversampled and less variable subgroups would be undersampled.
Sample scope. Deciding whether to draw a local or national sample also influences the type of sample design. If it’s local, then cluster sampling won’t be cost effective. If it’s a national sample, then cluster sampling may be critical.
Statistical analysis requirements. Statistical analyses are only appropriate for probability samples because those are the only samples that can be projected comfortably onto the larger population. Non-probability samples are inappropriate in that regard.

Slide 19
After deciding on the sample frame, the sample size must be determined and the sample collected.

Slide 20 (No Audio)

Slide 21
Sampling error should be minimized. It may be necessary to incur some error if there’s a cost-to-benefit tradeoff.
Here are the three types of sampling errors: sample frame error, random sampling error, and non-response error.

Slide 22
This diagram shows how errors can compound in drawing a sample. There’s a total population of interest, and drawing a scientific sample requires creating a sample frame. Although all members of the population should be included, some members will be excluded from the frame. Frame error is the error introduced by systematically excluding certain members of the total population. Once a sample frame has been created, drawing a random or systematic sample is relatively straightforward. Non-response error is caused when some selected people or elements fail to participate in the study. So there’s frame error from selecting some but not all members of the population, uncontrollable random sampling error (the luck of the draw), and non-response error, which depending on the nature of non-respondents could be trivial or meaningful.

Slide 23
Here’s a formal definition of random sampling error: the difference between sample results and the result of a census conducted using the identical procedures. In other words, it’s statistical fluctuation due to chance variation. Given the nature of marketing research samples, random sampling error should be minimal provided a large enough sample is drawn. If an insufficiently large sample is drawn, then it’s possible that the particular sample could be idiosyncratic in some way. This is why sample size is critical.

Slide 24
This slide illustrates the problems associated with sample frame error. Circle D is ideal, with a complete sample frame and no sampling frame error. Circle C is acceptable; although incomplete, it’s unbiased, which means certain types of population elements aren’t excluded systematically. Because the frame is representative of the target population, any estimates made from sampling a sufficient number of elements should be reasonably accurate.
In contrast, Circles A and B represent problematic cases. For Circle A, the sampling frame is incomplete and biased, which means certain types of population elements are excluded systematically. As a result, the sample tends to overrepresent some population elements and underrepresent other population elements. Analyzing the data from this sample frame will produce biased estimates. Circle B is even worse; not only is the sample frame incomplete, but some listed entities aren’t members of the target population. Circle B is the worst-case scenario.

Slide 25
Perhaps I’ve been overly dismissive of random sampling error. Thinking about survey research in general and typical errors, I argue that the procedure for selecting elements from a sample frame, provided the sample is sufficiently large, introduces relatively little error. In fact, most survey error is introduced by selecting the wrong frame or by self-selection bias (in who chooses to participate in the survey). Systematic or non-sampling errors are unrepresentative sampling results due to study design or execution flaws. I recommend, once you’ve identified an appropriate sample size, that you worry more about systematic error than sampling error.

Slide 26
Although spamming and other junk electronic communications seem of more concern now, being on a mailing list and receiving junk mail is annoying. Many commercially available lists used in applied marketing research started as standard mailing lists.

Slide 27
The next two slides show example mailing lists: advertisers by direct mail, affluent households (a more updated slide would show these households making at least $75,000), college department heads, exterminators, junk dealers, morticians, rabbis, taxidermists, and yacht boat owners. Don’t underestimate the likelihood of finding a commercial list for a rare or unusual population.

Slide 28 (No Audio)

Slide 29
The problems with commercial lists are three-fold: representativeness, omissions and duplications, and recency.
Representativeness entails a predisposition to include or exclude certain types of people. Omissions are people who should be but aren’t listed, and duplications are people who are listed more than once, such as professionals and physicians with an office and home telephone number. The telephone book is a poor list choice because people who choose to have an unlisted number differ systematically from people who have a listed number. Multiple listings mean that some people are more likely than other people to be surveyed, and these more likely respondents will differ systematically from the general population. Recency relates to badly dated lists with members who no longer qualify (for example, physician lists will include recently retired physicians) or who are linked to old phone numbers and addresses.

Slide 30
To summarize, telephone directories are not current. Roughly 10% of the numbers will no longer be good because people have moved, changed their numbers, and so on. The demographics and socioeconomics of people who choose to be omitted will differ from those of people who choose to be listed. As a result, phonebooks are biased sample frames of people who own telephones. The solution to directory limitations is random-digit dialing. To generate random yet likely working and residential telephone numbers, a researcher who wanted to survey the residents of a city would first identify all residential telephone exchanges in that city. Then, the researcher would add randomly generated four-digit numbers to those exchanges. However, to reduce the percent of non-working telephone numbers generated, I suggest applying the “add 1” technique to an existing number because it’s more likely to create a working number.

Slide 31 (No Audio)

Slide 32
Despite our best effort to draw a representative sample, after the fact it may be necessary to balance the sample for the relative incidence of different respondent groups in the population.
For example, if African-Americans tended to be underrepresented and Whites tended to be overrepresented in a sample, relative to their actual proportions in the population, then the sample could be rebalanced by weighting the responses. In this example, a White response would be weighted 0.94, an African-American response would be weighted 1.19, and a Mexican-American response would be weighted 1.161. (Weights like these are typically a group’s share of the population divided by its share of the sample, so underrepresented groups receive weights above 1.) Hence, it’s possible after the fact to balance a sample so that it’s more representative of the target population.

Slide 33 (No Audio)

Slide 34
I’ll close here by talking about Internet and panel samples. Internet sampling is unique among the ways to collect samples. Internet surveys allow researchers to access a large sample rapidly and inexpensively. Respondents should have adequate opportunity to respond to an online survey, which requires that it be kept open long enough for all sample units to have a chance to participate.

Slide 35
Internet-based samples could be probability or non-probability samples, depending on the method for accessing respondents. I’ve done studies in which I’ve surveyed marketing professors in the U.S. I acquired a list of professors and their e-mail addresses from a publisher of this information. Next, I asked student volunteers to visit each university’s marketing department web page to find contact information for additional faculty. Ultimately, I created a list of 4,000 valid faculty e-mail addresses. Then, I sent an e-mail to each faculty member that asked them to participate in my study. Because I used a carefully developed sample frame, I could consider my sample a scientific one. Alternatively, I could randomly select people who visited my web site or identify a listserv whose members would be qualified to participate in my study and make an appeal on that listserv. That alternative process represents neither a census nor a scientific sample because there’s no list of population elements.
Although online surveys tend to rely on non-scientific opt-in samples, Internet samples may be representative of a target population and allow researchers to access hard-to-reach respondents. The disadvantages of online samples are that some consumers lack PC or Internet access (thus systematically excluding some potential respondents and possibly biasing the sample), PC-using skills vary across potential respondents, and different PC users have access to PCs of varying capacities (thus causing lowest-common-denominator software programming, which tends to eliminate content like streaming video).

Slide 36
I could randomly select visitors to my site, who would comprise a convenience sample, via some sort of pop-up technology that solicited a randomly selected person to opt into my survey. This type of sample is non-scientific because more frequent visitors would be overrepresented.

Slide 37 (No Audio)

Slide 38
Panel samples consist of people who agreed to participate on a continuing basis. These people are compensated for their time, either by having their names entered into a sweepstakes or with small cash incentives for continued participation. One advantage of panel samples is high response rates. In addition, because it’s possible to track the responses of a given panel member, it’s necessary to collect demographic and socioeconomic data only once. Afterwards, researchers can match that person’s responses to his or her demographic and socioeconomic data from earlier questionnaires. Panel samples generate huge databases of responses. This rich database makes it possible to assess subsamples based on demographics, lifestyles, product ownership, or other characteristics. Panel samples provide an interesting alternative if a non-probability sample is acceptable.

Slide 39
To recap this lecture on sample design, I introduced it by talking about basic sampling terminology (census and population) and then moved on to a discussion of the stages in selecting a sample, from defining the target population through drawing the sample itself. Then, I discussed the different types of non-probability and probability samples and explained when you might use one versus another. Next, I spoke about sampling error. I closed by discussing Internet and panel samples.