EMR 6500: Survey Research
Dr. Chris L. S. Coryn
Lyssa N. Wilson
Spring 2013

Agenda
• Stratified random sampling for means and totals
• Review

Stratified Random Sampling
• A stratified random sample is one in which some form of random sampling is applied within each of a set of separate groups formed from all entries on the sampling frame from which the sample is to be drawn

Strata
• In stratified random sampling, strata are nonoverlapping groups that partition the population elements
• By strategically forming these groups, stratification becomes a feature of the sample design that can improve the statistical quality of survey estimates

Notation for Stratified Random Sampling
$L$ = number of strata
$N_i$ = number of sampling units in stratum $i$
$N$ = number of sampling units in the population $= N_1 + N_2 + \cdots + N_L$

Allocation to Strata
• Deciding how a stratified sample will be distributed among all strata is called stratum allocation
• The most appropriate allocation method depends on how the stratification will be used

Equal Allocation
• If the main purpose of stratification is to control sample sizes for important population subgroups, stratum sample sizes should be sufficient to meet precision requirements for subgroup analysis
• An important part of the analysis is to produce comparisons among all subgroup strata
• In this instance, equal allocation (i.e., equal sample sizes) would be appropriate

Proportional Allocation
• Proportional allocation is a prudent choice when the main focus of the analysis is characteristics of several subgroups or the population as a whole and where the appropriate allocations for these analyses are discrepant
• Proportional allocation involves applying the same sampling rate to all strata, thus implying that the percent distribution of the selected sample among strata is identical to the corresponding distribution for the population

Optimum Allocation
• Optimum allocation, in which the most cost-efficient stratum sample sizes
are sought, can lead to estimates of overall population characteristics that are statistically superior to those from proportional allocation
• When all stratum unit costs are the same, the stratum sampling rates that yield the most precise sample estimates are proportional to the stratum-specific standard deviations (Neyman allocation)

Estimation of a Population Mean and Total

Estimate of Population Mean
$$\bar{y}_{st} = \frac{1}{N}\left[N_1\bar{y}_1 + N_2\bar{y}_2 + \cdots + N_L\bar{y}_L\right] = \frac{1}{N}\sum_{i=1}^{L} N_i\bar{y}_i$$
$$\hat{V}(\bar{y}_{st}) = \frac{1}{N^2}\left[N_1^2\hat{V}(\bar{y}_1) + N_2^2\hat{V}(\bar{y}_2) + \cdots + N_L^2\hat{V}(\bar{y}_L)\right]$$
$$= \frac{1}{N^2}\left[N_1^2\left(1-\frac{n_1}{N_1}\right)\left(\frac{s_1^2}{n_1}\right) + \cdots + N_L^2\left(1-\frac{n_L}{N_L}\right)\left(\frac{s_L^2}{n_L}\right)\right]$$
$$= \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\left(1-\frac{n_i}{N_i}\right)\left(\frac{s_i^2}{n_i}\right)$$

Example for a Population Mean
Stratum   N     n    M       SD
Town A    155   20   33.90    5.95
Town B     62    8   25.12   15.25
Rural      93   12   19.00    9.36

$$\bar{y}_{st} = \frac{1}{N}\left[N_1\bar{y}_1 + N_2\bar{y}_2 + \cdots + N_L\bar{y}_L\right]$$
$$= \frac{1}{310}\left[(155)(33.900) + (62)(25.125) + (93)(19.000)\right] = 27.675$$
$$\hat{V}(\bar{y}_{st}) = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\left(1-\frac{n_i}{N_i}\right)\left(\frac{s_i^2}{n_i}\right)$$
$$= \frac{1}{(310)^2}\left[\frac{(155)^2(0.871)(5.95)^2}{20} + \frac{(62)^2(0.871)(15.25)^2}{8} + \frac{(93)^2(0.871)(9.36)^2}{12}\right] = 1.97$$
$$\bar{y}_{st} \pm 2\sqrt{\hat{V}(\bar{y}_{st})} = 27.675 \pm 2\sqrt{1.97} = 27.7 \pm 2.8$$

Estimate of Population Total
$$N\bar{y}_{st} = N_1\bar{y}_1 + N_2\bar{y}_2 + \cdots + N_L\bar{y}_L = \sum_{i=1}^{L} N_i\bar{y}_i$$
$$\hat{V}(N\bar{y}_{st}) = N^2\hat{V}(\bar{y}_{st}) = \sum_{i=1}^{L} N_i^2\left(1-\frac{n_i}{N_i}\right)\left(\frac{s_i^2}{n_i}\right)$$

Example for Population Total
$$N\bar{y}_{st} = 310(27.7) = 8{,}587$$
$$\hat{V}(N\bar{y}_{st}) = N^2\hat{V}(\bar{y}_{st}) = (310)^2(1.97) = 189{,}278.560$$
$$N\bar{y}_{st} \pm 2\sqrt{\hat{V}(N\bar{y}_{st})} = 8{,}587 \pm 2\sqrt{189{,}278.560} = 8{,}587 \pm 870$$

Selecting the Sample Size for Estimating Population Means and Totals

Sample Size for Estimating Population Means and Totals
$$n = \frac{\sum_{i=1}^{L} N_i^2\sigma_i^2/a_i}{N^2 D + \sum_{i=1}^{L} N_i\sigma_i^2}$$
where $a_i$ is the allocation fraction for stratum $i$ and
$$D = \frac{B^2}{4} \text{ when estimating } \mu$$
$$D = \frac{B^2}{4N^2} \text{ when estimating } \tau$$

Example for a Population Mean
• $\sigma_1^2 \approx 25$, $\sigma_2^2 \approx 225$, and $\sigma_3^2 \approx 100$
• Allocation fractions are $a_1 = \frac{1}{3}$, $a_2 = \frac{1}{3}$, and $a_3 = \frac{1}{3}$
• $B = 2$, so $D = \frac{B^2}{4} = \frac{2^2}{4} = 1$
• $N_1 = 155$, $N_2 = 62$, and $N_3 = 93$
$$\sum_{i=1}^{3}\frac{N_i^2\sigma_i^2}{a_i} = \frac{N_1^2\sigma_1^2}{a_1} + \frac{N_2^2\sigma_2^2}{a_2} + \frac{N_3^2\sigma_3^2}{a_3} = \frac{(155)^2(25)}{1/3} + \frac{(62)^2(225)}{1/3} + \frac{(93)^2(100)}{1/3}$$
$$= (24{,}025)(75) + (3{,}844)(675) + (8{,}649)(300) = 6{,}991{,}275$$
$$\sum_{i=1}^{3} N_i\sigma_i^2 = N_1\sigma_1^2 + N_2\sigma_2^2 + N_3\sigma_3^2 = (155)(25) + (62)(225) + (93)(100) = 27{,}125$$
$$N^2 D = (310)^2(1) = 96{,}100$$
$$n = \frac{\sum_{i=1}^{L} N_i^2\sigma_i^2/a_i}{N^2 D + \sum_{i=1}^{L} N_i\sigma_i^2} = \frac{6{,}991{,}275}{96{,}100 + 27{,}125} = \frac{6{,}991{,}275}{123{,}225} = 56.7$$
$$n_1 = n(a_1) = 57\left(\frac{1}{3}\right) = 19$$
$$n_2 = n(a_2) = 57\left(\frac{1}{3}\right) = 19$$
$$n_3 = n(a_3) = 57\left(\frac{1}{3}\right) = 19$$

Neyman Allocation
$$n_i = n\left(\frac{N_i\sigma_i}{\sum_{k=1}^{L} N_k\sigma_k}\right)$$
$$n = \frac{\left(\sum_{k=1}^{L} N_k\sigma_k\right)^2}{N^2 D + \sum_{i=1}^{L} N_i\sigma_i^2}$$
• $\sigma_1 \approx 5$, $\sigma_2 \approx 15$, and $\sigma_3 \approx 10$
• $N_1 = 155$, $N_2 = 62$, and $N_3 = 93$
• $B = 2$
$$\sum_{i=1}^{3} N_i\sigma_i = N_1\sigma_1 + N_2\sigma_2 + N_3\sigma_3 = (155)(5) + (62)(15) + (93)(10) = 2{,}635$$
$$n_1 = n\left(\frac{N_1\sigma_1}{\sum_{k=1}^{L} N_k\sigma_k}\right) = n\left[\frac{(155)(5)}{2{,}635}\right] = n(0.30)$$
$$n_2 = n\left[\frac{(62)(15)}{2{,}635}\right] = n(0.35)$$
$$n_3 = n\left[\frac{(93)(10)}{2{,}635}\right] = n(0.35)$$
$$D = \frac{B^2}{4} = \frac{2^2}{4} = 1$$
$$N^2 D = (310)^2(1) = 96{,}100$$
$$\sum_{i=1}^{3} N_i\sigma_i^2 = N_1\sigma_1^2 + N_2\sigma_2^2 + N_3\sigma_3^2 = (155)(25) + (62)(225) + (93)(100) = 27{,}125$$
$$n = \frac{\left(\sum_{k=1}^{3} N_k\sigma_k\right)^2}{N^2 D + \sum_{i=1}^{3} N_i\sigma_i^2} = \frac{(2{,}635)^2}{96{,}100 + 27{,}125} = 56.34$$
$$n_1 = n a_1 = (57)(0.30) = 17$$
$$n_2 = n a_2 = (57)(0.35) = 20$$
$$n_3 = n a_3 = (57)(0.35) = 20$$

Proportional Allocation
$$n = \frac{\sum_{i=1}^{L} N_i\sigma_i^2}{N D + \frac{1}{N}\sum_{i=1}^{L} N_i\sigma_i^2}$$
• $\sigma_1 \approx 10$, $\sigma_2 \approx 10$, and $\sigma_3 \approx 10$
• $N_1 = 155$, $N_2 = 62$, and $N_3 = 93$
• $B = 2$
$$\sum_{i=1}^{3} N_i\sigma_i^2 = N_1\sigma_1^2 + N_2\sigma_2^2 + N_3\sigma_3^2 = (155)(100) + (62)(100) + (93)(100) = 310(100) = 31{,}000$$
$$D = \frac{B^2}{4} = \frac{2^2}{4} = 1$$
$$n = \frac{31{,}000}{310(1) + \left(\frac{1}{310}\right)(31{,}000)} = 75.6$$
$$n_1 = n\left(\frac{N_1}{\sum_{k=1}^{3} N_k}\right) = n\left(\frac{N_1}{N}\right) = n\left(\frac{155}{310}\right) = n(0.5) = 38$$
$$n_2 = n\left(\frac{N_2}{\sum_{k=1}^{3} N_k}\right) = n\left(\frac{N_2}{N}\right) = n\left(\frac{62}{310}\right) = n(0.2) = 15$$
$$n_3 = n\left(\frac{N_3}{\sum_{k=1}^{3} N_k}\right) = n\left(\frac{N_3}{N}\right) = n\left(\frac{93}{310}\right) = n(0.3) = 23$$

Comparison of Allocation Methods
General framework:
$$n = \frac{\sum_{i=1}^{L} N_i^2\sigma_i^2/a_i}{N^2 D + \sum_{i=1}^{L} N_i\sigma_i^2}$$
Neyman:
$$n = \frac{\left(\sum_{k=1}^{L} N_k\sigma_k\right)^2}{N^2 D + \sum_{i=1}^{L} N_i\sigma_i^2}$$
Proportional:
$$n = \frac{\sum_{i=1}^{L} N_i\sigma_i^2}{N D + \frac{1}{N}\sum_{i=1}^{L} N_i\sigma_i^2}$$

Review

The Tailored Design Method
• Uses multiple motivational features in compatible and mutually supportive ways to encourage high quantity and quality of responses
• Premised on a social exchange perspective on human behavior
• Assumes that the likelihood of responding is greater when the expected rewards outweigh the anticipated costs
• Gives attention to all aspects of contacting and communicating with respondents
• Encourages response by considering survey sponsorship, the nature of the population and variations within it, and content of questions
• Emphasizes reducing errors of coverage, sampling, nonresponse, and measurement

Coverage Error
• Occurs when not all members of a population have a known, non-zero probability of selection
• Occurs when those who are excluded are different from those who are included

Sampling Error
• Results from surveying only some rather than all members of a population
• Represented by $B$, the bound on the error of estimation

Nonresponse Error
• Occurs when people selected do not respond and are different from those who do
• Nonresponse can occur at the level of items within a survey or at the level of the survey
  – MAR (missing at random)
  – MCAR (missing completely at random)

Measurement Error
• Occurs when responses are inaccurate or imprecise
• Primarily related to poor layout and poor design and wording of questions

Social Exchange and Surveys
• Addresses three central questions about design and implementation
1.
How can the perceived rewards be increased?
2. How can the perceived costs of responding be reduced?
3. How can trust be established so that people believe the rewards will outweigh the costs of responding?

Increasing Benefits
• Provide information about the survey
• Ask for help or advice
• Show positive regard
• Say thank you
• Support group values
• Give tangible rewards
• Make the questionnaire interesting
• Provide social validation
• Inform people that opportunities to respond are limited

Decreasing Costs
• Make it convenient to respond
• Avoid subordinating language
• Make the questionnaire short and easy to complete
• Minimize requests for personal or sensitive information
• Emphasize similarity to other requests or tasks to which a person has already responded

Establishing Trust
• Obtain sponsorship by legitimate authority
• Provide a token of appreciation in advance
• Make the task appear important
• Ensure confidentiality and security of information

Features that can be Tailored
• Survey mode
  – Singular or multiple
• Sample design
  – Type of sample
  – Number of units sampled
• Incentives
  – Type of incentive
  – Amount or cost of incentive
  – Before or after
• Contacts
  – Number of contacts
  – Timing of initial and subsequent contacts
  – Mode of each contact
  – Whether contacts will be personalized
  – Sponsorship information
  – Visual design of each contact
  – Text or words in each contact
• Additional materials
  – Whether to provide them at all
  – Type of materials (e.g., research report)
  – Visual design of materials
  – Text or wording of materials
• Questionnaire
  – Topics included
  – Length (duration, number of pages/screens, number of questions)
  – First page or screen
  – Visual design
  – Organization and order of questions
  – Navigation through questionnaire
• Individual questions
  – Topic (sensitive, of interest to the respondent)
  – Type
(open-ended versus closed-ended)
  – Organization of information
  – Text or wording
  – Visual design

Coverage and Sampling

Central Terminology
• An element is an object on which a measurement is taken
• A population is a collection of elements to which an inference is made from a sample
• A sample is a collection of sampling units drawn from a frame or frames
• Sampling units are nonoverlapping collections of elements from the population that cover the entire population
• A frame is a list of sampling units
• A completed sample consists of the units that respond
• Sampling error is the result of collecting data from only a subset, rather than all, units from a frame
  – Again, represented by $B$, the bound on the error of estimation

Coverage
• The degree to which the units in a sampling frame correspond to the population of interest
• Coverage is likely one of the most serious problems in most surveys

Coverage and Frame Problems
[Figure: coverage and frame problems, showing the target population, the frame population, the covered population, ineligible units, and undercoverage]

Reducing Coverage Error
• Central questions:
  – Does the list contain everyone in the survey population?
  – Does the list include people who are not in the study population?
  – How is the list maintained and updated?
  – Are the same sample units included on the list more than once?
  – Does the list contain other information that can be used to improve the survey?
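
Before the review turns to simple random sampling, the stratified worked example from the first part of the session (Town A, Town B, Rural) can be checked numerically. The following is a minimal Python sketch under stated assumptions, not part of the original slides: the function and variable names are illustrative, and the Town B mean is entered as 25.125, the value used in the slide's computation.

```python
from math import sqrt

# Stratum data from the worked example (Town A, Town B, Rural).
N_h = [155, 62, 93]                  # stratum population sizes N_i
n_h = [20, 8, 12]                    # stratum sample sizes n_i
ybar_h = [33.900, 25.125, 19.000]    # stratum sample means
s_h = [5.95, 15.25, 9.36]            # stratum sample standard deviations


def stratified_mean(N_h, ybar_h):
    """Weighted mean: (1/N) * sum of N_i * ybar_i."""
    N = sum(N_h)
    return sum(Ni * yi for Ni, yi in zip(N_h, ybar_h)) / N


def stratified_mean_variance(N_h, n_h, s_h):
    """(1/N^2) * sum of N_i^2 * (1 - n_i/N_i) * (s_i^2 / n_i)."""
    N = sum(N_h)
    total = sum(
        Ni**2 * (1 - ni / Ni) * (si**2 / ni)
        for Ni, ni, si in zip(N_h, n_h, s_h)
    )
    return total / N**2


N = sum(N_h)
y_st = stratified_mean(N_h, ybar_h)
v_st = stratified_mean_variance(N_h, n_h, s_h)
bound_mean = 2 * sqrt(v_st)          # bound on the error of estimation for the mean
total_hat = N * y_st                 # estimated population total
bound_total = 2 * sqrt(N**2 * v_st)  # bound for the total

print(f"mean  = {y_st:.3f} +/- {bound_mean:.2f}")
print(f"total = {total_hat:.2f} +/- {bound_total:.1f}")
```

Run as written, this gives a mean of 27.675 and bounds that round to the 2.8 and 870 shown on the slides. The unrounded total is 8,579.25; the 8,587 on the slide comes from multiplying by the rounded mean 27.7.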
Estimate of Population Mean
$$\hat{\mu} = \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$
$$\hat{V}(\bar{y}) = \left(1-\frac{n}{N}\right)\frac{s^2}{n}$$
$$2\sqrt{\hat{V}(\bar{y})} = 2\sqrt{\left(1-\frac{n}{N}\right)\frac{s^2}{n}}$$

Estimate of Population Total
$$\hat{\tau} = N\bar{y} = \frac{N\sum_{i=1}^{n} y_i}{n}$$
$$\hat{V}(\hat{\tau}) = \hat{V}(N\bar{y}) = N^2\left(1-\frac{n}{N}\right)\left(\frac{s^2}{n}\right)$$
$$2\sqrt{\hat{V}(N\bar{y})} = 2\sqrt{N^2\left(1-\frac{n}{N}\right)\left(\frac{s^2}{n}\right)}$$

Selecting the Sample Size for Estimating Population Means and Totals

Sample Size for Estimating Population Means
$$n = \frac{N\sigma^2}{(N-1)D + \sigma^2}$$
• where
$$D = \frac{B^2}{4}$$
• Often, the population variance, $\sigma^2$, is unknown
• An approximate value of $\sigma$ can be obtained by
$$\sigma \approx \frac{\text{Range}}{4}$$

Sample Size for Estimating Population Totals
$$n = \frac{N\sigma^2}{(N-1)D + \sigma^2}$$
• where
$$D = \frac{B^2}{4N^2}$$

Estimation of a Population Proportion

Estimate of Population Proportion
$$\hat{p} = \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$
$$\hat{V}(\hat{p}) = \left(1-\frac{n}{N}\right)\frac{\hat{p}\hat{q}}{n-1}$$
where $\hat{q} = 1 - \hat{p}$
$$2\sqrt{\hat{V}(\hat{p})} = 2\sqrt{\left(1-\frac{n}{N}\right)\frac{\hat{p}\hat{q}}{n-1}}$$

Selecting the Sample Size for Estimating a Population Proportion

Sample Size for Estimating Population Proportions
$$n = \frac{Npq}{(N-1)D + pq}$$
• where $q = 1 - p$
• and
$$D = \frac{B^2}{4}$$

An Overview of Crafting Good Questions

Issues to Consider
1. What survey mode(s) will be used to ask the questions?
2. Is the question being repeated from another survey, and/or will answers be compared to previously collected data?
3. Will respondents be willing and motivated to answer accurately?
4. What type of information is the question asking for?

Choosing Words and Forming Questions
1. Make sure the question applies to the respondent
2. Make sure the question is technically accurate
3. Ask one question at a time
4. Use simple and familiar words
5. Use specific and concrete words to specify the concepts clearly
6. Use as few words as possible to pose the question
7. Use complete sentences with simple sentence structures
8. Make sure “yes” means yes and “no” means no
9. Be sure the question specifies the response task

Visual Presentation of Survey Questions
1. Use darker and/or larger print for the question and lighter and/or smaller print for answer choices and answer spaces
2. Use spacing to create subgrouping within a question
3. Visually standardize all answer spaces or response options
4. Use visual design properties to emphasize elements that are important to the respondent and to deemphasize those that are not
5. Make sure words and visual elements that make up the question send consistent messages
6. Integrate special instructions into the question where they will be used rather than including them as freestanding entities
7. Separate optional or occasionally needed instructions from the question stem by font or symbol variation
8. Organize each question in a way that minimizes the need to reread portions in order to comprehend the response task
9. Choose line spacing, font, and text size to ensure the legibility of the text
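
Returning to the stratified sample size formulas from the first part of the session: the Neyman and proportional formulas are special cases of the general formula once the allocation fractions are chosen. The following is a minimal Python sketch under stated assumptions, not part of the original slides. The helper names (`sample_size`, `neyman_fractions`, `proportional_fractions`, `allocate`) are hypothetical, and the inputs are the variances, stratum sizes, and bound B = 2 from the slide examples.

```python
from math import ceil


def sample_size(N_h, var_h, a_h, D):
    """General formula: sum(N_i^2 var_i / a_i) / (N^2 D + sum(N_i var_i))."""
    N = sum(N_h)
    numerator = sum(Ni**2 * v / a for Ni, v, a in zip(N_h, var_h, a_h))
    denominator = N**2 * D + sum(Ni * v for Ni, v in zip(N_h, var_h))
    return numerator / denominator


def neyman_fractions(N_h, var_h):
    """Allocation fractions proportional to N_i * sigma_i."""
    weights = [Ni * v**0.5 for Ni, v in zip(N_h, var_h)]
    total = sum(weights)
    return [w / total for w in weights]


def proportional_fractions(N_h):
    """Allocation fractions proportional to stratum size N_i."""
    N = sum(N_h)
    return [Ni / N for Ni in N_h]


def allocate(n, a_h):
    """Round n * a_i per stratum; rounded values need not sum to n in general."""
    return [round(n * a) for a in a_h]


N_h = [155, 62, 93]
B = 2
D = B**2 / 4                      # D = B^2 / 4 when estimating a mean

# Equal and Neyman examples use variances 25, 225, 100.
var_a = [25, 225, 100]
equal = [1 / 3, 1 / 3, 1 / 3]
n_equal = sample_size(N_h, var_a, equal, D)
neyman = neyman_fractions(N_h, var_a)
n_neyman = sample_size(N_h, var_a, neyman, D)

# Proportional example uses a common variance of 100.
var_b = [100, 100, 100]
prop = proportional_fractions(N_h)
n_prop = sample_size(N_h, var_b, prop, D)

for label, n, a_h in [("equal", n_equal, equal),
                      ("Neyman", n_neyman, neyman),
                      ("proportional", n_prop, prop)]:
    print(f"{label}: n = {n:.2f}, allocation = {allocate(ceil(n), a_h)}")
```

Substituting the Neyman or proportional fractions into `sample_size` reproduces the dedicated formulas algebraically, which is why one function is enough here. The sample size is rounded up before allocation, matching the slides' use of n = 57 and n = 76.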