MODULE R7 - SAMPLING

Randomization

As Van Dalen (1979) suggested, many people take random sampling to mean chance or a haphazard method of assignment, but in reality it is a carefully controlled process. Randomization is used to eliminate bias, both conscious and unconscious, that researchers might introduce while selecting a sample. Kerlinger (1986) described randomization as the assignment of objects (subjects, treatments, groups, etc.) of a population to subsets (samples) of the population in such a way that, for any given assignment to a subset (sample), every member of the population has an equal probability of being chosen for that assignment. Randomization is essential for probability samples, which are the only samples whose results can be generalized back to the population. Kerlinger (1986) reported that random sampling is important because it is required by inferential statistics. If the researcher desires to make inferences about populations based on the behavior of samples, then random sampling must be used.

Stratification

Stratified sampling is a procedure for selecting a sample that includes identified subgroups from the population in the proportion that they exist in the population. This method can also be used to select equal numbers from each of the identified subgroups if comparisons between subgroups are important. A simple example of stratified sampling would be to divide the population into men and women. The strata to use for each study would be determined in part by the review of literature of previous research. The purpose of stratified sampling is to guarantee the desired distribution among the selected subgroups of the population.

Proportional

Proportional sampling (Van Dalen, 1979) provides the researcher a way to achieve even greater representativeness in the sample of the population. This is accomplished by selecting individuals at random from each subgroup in proportion to the actual size of that subgroup in the total population.
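As an illustration of the stratified and proportional procedures described above, the following Python sketch divides a population into strata and draws a simple random sample from each stratum in proportion to its size. The function name, the example population of 600 women and 400 men, and the sample size of 100 are assumptions made for illustration; they are not taken from Van Dalen or Kerlinger.

```python
import random
from collections import defaultdict


def proportional_stratified_sample(population, stratum_of, sample_size, seed=None):
    """Draw a random sample whose strata mirror their share of the population.

    population  -- list of population members
    stratum_of  -- function returning the stratum label of a member
    sample_size -- desired total sample size
    seed        -- optional seed so the draw can be repeated
    """
    rng = random.Random(seed)

    # Step 1: group the population into strata (subgroups).
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)

    # Step 2: give each stratum a quota proportional to its size,
    # then select that many members at random within the stratum.
    sample = []
    for members in strata.values():
        quota = round(sample_size * len(members) / len(population))
        quota = min(quota, len(members))  # cannot draw more than exist
        sample.extend(rng.sample(members, quota))

    # Note: because quotas are rounded, the total can differ slightly
    # from sample_size when the proportions do not divide evenly.
    return sample


if __name__ == "__main__":
    # Hypothetical population: 600 women ("F") and 400 men ("M").
    people = [("F", i) for i in range(600)] + [("M", i) for i in range(400)]
    chosen = proportional_stratified_sample(people, lambda p: p[0], 100, seed=1)
    women = sum(1 for p in chosen if p[0] == "F")
    print(len(chosen), "selected:", women, "women and", len(chosen) - women, "men")
```

To select equal numbers from each stratum instead, the quota line could be replaced by a fixed number per stratum; selection within each stratum stays random either way.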
Proportional sampling is used in combination with stratified and cluster sampling.

Clusters

According to Kerlinger (1986), cluster sampling is the most used method in educational research. Groups of elements (clusters), instead of individuals from the population, are used for the sample. Cluster sampling often is more convenient when the population is very large. It often isn’t possible to select randomly from the entire population, but selection becomes more manageable with clusters because of savings in time, expense, and convenience. An example of cluster sampling is using schools as clusters and randomly selecting from the list of schools instead of randomly selecting individuals from a list that covers all schools. This can help the researcher cut down on travel expense, time, etc. One problem with cluster sampling is that it usually produces a larger sampling error than a simple random sample of the same size because members of a cluster tend to be more similar to one another, reducing the representativeness of the sample (Van Dalen, 1979).

Systematic

In some cases, when the population of a study is available as a list, a sample is drawn at fixed intervals on the list. The starting point is randomly chosen, and then every nth individual after it is chosen from the list and added to the sample. This method can be equal to random selection only if the names were randomized at the beginning. Van Dalen (1979) cautions us to be wary of departures from randomness in the list because of structure, some trend, or cyclical fluctuation.

Purposive

Kerlinger (1986) explained purposive sampling as another type of nonprobability sampling, which is characterized by the use of judgment and a deliberate effort to obtain representative samples by including typical areas or groups in the sample. In other words, the researcher attempts to do by human judgment and logic what proportional clustering with randomization accomplishes. As a result, there are many opportunities for error.
In addition, nonprobability samples do not use random sampling, which makes them unacceptable for generalizing back to the population.

In random sampling, each object has an equal and independent opportunity of being chosen. Stratified sampling involves the identification of the variable and subgroups (strata) for which you want to guarantee appropriate representation (either proportional or equal). To use cluster sampling, you must list and identify all clusters that comprise the population and estimate the average number of population members per cluster to determine the number of clusters needed for the sample. Proportional sampling determines the ratio of individuals in subgroups for which you want proportional representation. Once the strata, clusters, or ratio have been determined, individual objects, clusters, or individuals in a subgroup are randomly selected. Systematic sampling takes every nth name (n = size of population divided by desired sample size) on a list of the population until the desired sample size is reached.

Determination of Sample Size

The first step is to identify or define the population. Jaccard (1983) defined population as the aggregate of all cases to which one wishes to generalize. At this point, it is necessary to determine whether your research requires the identification of subgroups and, if so, to define the subgroups within the population. To be useful, a sample needs to be as representative as possible of the complete population. Popham and Sirotnik (1973) contend that, in order to draw legitimate inferences about populations from samples, the sample has to be representative of the population and randomly selected. Van Dalen (1979) lists three factors that determine the size of an adequate sample: (1) the nature of the population, (2) the type of investigation, and (3) the degree of precision desired.
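The degree of precision desired can be turned into a concrete sample size. The sketch below assumes the sample-size formula of Krejcie and Morgan (1970) as it is commonly stated, S = X²NP(1 - P) / (d²(N - 1) + X²P(1 - P)), with a chi-square value of 3.841 (one degree of freedom, .95 confidence level), P = .50, and d = .05. The function name, the parameter names, and the rounding to the nearest whole number are assumptions made for illustration.

```python
def krejcie_morgan_sample_size(population_size, chi_square=3.841,
                               proportion=0.5, accuracy=0.05):
    """Estimate the required sample size S for a finite population of size N.

    chi_square -- table value of chi-square for 1 degree of freedom
                  at the desired confidence level (3.841 for .95)
    proportion -- assumed population proportion P (.50 gives the largest S)
    accuracy   -- tolerated error d in the sample proportion (.05)
    """
    if population_size < 1:
        raise ValueError("population_size must be at least 1")

    # X^2 * P * (1 - P) appears in both numerator and denominator.
    variance_term = chi_square * proportion * (1 - proportion)
    numerator = variance_term * population_size
    denominator = accuracy ** 2 * (population_size - 1) + variance_term

    # Assumption: round to the nearest whole case.
    return round(numerator / denominator)


if __name__ == "__main__":
    for n in (100, 500, 1000, 10000):
        print(n, krejcie_morgan_sample_size(n))
```

Under these assumptions, populations of 100, 500, 1,000, and 10,000 give sample sizes of 80, 217, 278, and 370. The required sample grows slowly once the population is large, which is why the gains in precision from ever larger samples diminish.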
The formula for estimating the sample size, and a table for determining the sample size from a given population at the needed confidence level, were provided by Krejcie and Morgan (1970).

$$S = \frac{X^2 N P (1 - P)}{d^2 (N - 1) + X^2 P (1 - P)}$$

where

S = required sample size

N = the given population size

P = population proportion that for table construction has been assumed to be .50, as this magnitude yields the maximum possible sample size required

d = the degree of accuracy as reflected by the amount of error that can be tolerated in the fluctuation of a sample proportion p about the population proportion P - the value for d being .05 in the calculations for entries in the table, a quantity equal to $\pm 1.96\,\sigma_p$

$X^2$ = table value of chi-square for one degree of freedom relative to the desired level of confidence, which was 3.841 for the .95 confidence level represented by entries in the table

TABLE FOR DETERMINING NEEDED SIZE S OF A RANDOMLY CHOSEN SAMPLE FROM A GIVEN FINITE POPULATION OF N CASES SUCH THAT THE SAMPLE PROPORTION p WILL BE WITHIN ± .05 OF THE POPULATION PROPORTION P WITH A 95 PERCENT LEVEL OF CONFIDENCE

Population Size   Sample Size   Population Size   Sample Size   Population Size   Sample Size
10                10            220               140           1200              291
15                14            230               144           1300              297
20                19            240               148           1400              302
25                24            250               152           1500              306
30                28            260               155           1600              310
35                32            270               159           1700              313
40                36            280               162           1800              317
45                40            290               165           1900              320
50                44            300               169           2000              322
55                48            320               175           2200              327
60                52            340               181           2400              331
65                56            360               186           2600              335
70                59            380               191           2800              338
75                63            400               196           3000              341
80                66            420               201           3500              346
85                70            440               205           4000              351
90                73            460               210           4500              354
95                76            480               214           5000              357
100               80            500               217           6000              361
110               86            550               226           7000              364
120               92            600               234           8000              367
130               97            650               242           9000              368
140               103           700               248           10000             370
150               108           750               254           15000             375
160               113           800               260           20000             377
170               118           850               265           30000             379
180               123           900               269           40000             380
190               127           950               274           50000             381
200               132           1000              278           75000             382
210               136           1100              285           100000            384

Survey Design Notes by Dr.
Don Dillman

A Survey Can:

"Provide the distribution of a characteristic in a population by collecting information from only a few of its members."

Rules of Thumb:

- Sample randomly.
- Quadrupling the sample size reduces sampling error by about half.
- Sampling can be far more complex than described.

Measurement Error

Occurs when respondents' answers to questions are inaccurate. It is a result of question wording, the questionnaire, the interviewer, the survey method, and/or the respondent.

Sampling Error

Occurs because only a subset of the population is surveyed.

n        Precision
97       +/- 10%
385      +/- 5%
1068     +/- 3%
2175     +/- 2%

Coverage Error

Occurs because the sample list does not include all elements of the population that one wishes to survey. Each member of the entire population needs to have a known (non-zero) chance of being included in the sample.

Non-response Error

Occurs when some of the sampled individuals do not respond and they differ from those who do in a way that is relevant to the study. This is more important than response rate!

For a survey to be accurate, each of the four sources of data collection error must be attended to:

- Sampling error
- Coverage error
- Measurement error
- Non-response error

Perspective for Improving Response

- Increase rewards
- Decrease costs
- Promote trust

This is a social exchange, not an economic exchange.

Requirements for Maximizing Mail Survey Response

- Respondent-friendly questionnaire
- Personalized correspondence
- Prepaid financial incentive ($2 to $5)
- First Class mail
- Four contacts: pre-notice, questionnaire, reminder, replacement questionnaire
- Fifth contact: 2-day priority mail or telephone

Why Mail Surveys Usually Fail

- Inadequate sample frames and respondent selection
- Poor questions
- Selective non-response

How to Improve Response

A List

- Multiple contacts
- Stamped return envelope
- $ pre-incentive
- Respondent-friendly questionnaire

B List

- No labels
- Real signature
- Green paper
- Graphic cover design

SELF ASSESSMENT

1. Define: sample, population.
2. Describe the sampling approach of randomization.
3. Define stratification as related to sampling.
4. Explain clusters regarding a sample.
5. Describe proportional in relation to sampling.
6. Define and explain systematic sampling.
7. Explain the use of purposive sampling.
8. Explain how to determine the sample size needed to give the most representative sample.
9. Describe the type of sampling method that would be used for your research and explain why this would be the best choice.
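
The sampling-error figures in the survey design notes (for example, 385 respondents for +/- 5%) can be checked with the usual margin-of-error formula for a sample proportion. This is a sketch under assumptions not stated in the notes: a 95 percent confidence level (z = 1.96), a population proportion of .50, and a population large enough that no finite-population correction is needed. The function name is hypothetical.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumes simple random sampling from a large population:
    margin = z * sqrt(P * (1 - P) / n).
    """
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


if __name__ == "__main__":
    # Sample sizes taken from the survey design notes.
    for n in (97, 385, 1068, 2175):
        print(f"n = {n:5d}   margin = +/- {100 * margin_of_error(n):.1f}%")
```

Because the margin shrinks with the square root of the sample size, cutting it in half under these assumptions takes about four times as many respondents.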