SECTION 2
SIMPLE RANDOM SAMPLING

2.1 What is Simple Random Sampling?

2.1.1 Definition

Simple Random Sampling: A method of probability sampling in which a sample of n elements is randomly chosen without replacement from a population of N elements (SRSWOR, simple random sampling without replacement, as opposed to SRSWR, simple random sampling with replacement).

2.1.2 One Selection Procedure for Simple Random Sampling

    A. Number the elements in the population (i.e., sampling frame) from 1 to N.
    B. Using a table of random numbers, select and record a random number between 1 and N.
    C. Select a second random number between 1 and N. If the second number is the same as the first selected number, discard it and go to the next step. If the second number is not the same as the first number, record it.
    D. Select a third random number between 1 and N. If this number is the same as either one of the previous numbers, discard it and go to the next step. If the number is not the same as the previous numbers, record it.
    E. Continue in this manner until n different numbers between 1 and N have been chosen.
    F. The population elements corresponding to the selected numbers are an SRS of size n.

2.1.3 Some Statistical Notes about Simple Random Samples

If we use SRS to select a sample of size n from a population of N elements:

    A. All possible SRS samples have the same chance of being selected.
    B. The probability that any one population element will be chosen is n/N.
    C. Observations taken from elements in an SRS are not statistically independent.

2.2 Estimating a Population Mean from a Simple Random Sample

2.2.1 Setting

    A. We have selected an SRS of size n from a population of N elements.
    B. We wish to use our sample to estimate the population mean per element (denoted by the symbol $\bar{Y}$) for some characteristic of the population.
    C. Examples:
        (1) Average annual dental care expenses for employees of a large corporation.
        (2) Average expenditures for prescription drugs paid by customers of a drug store chain.
        (3) Average height in inches of adult male students at a state university.

2.2.2 Estimator of $\bar{Y}$

    $$\hat{\bar{Y}} = \bar{y}_{srs} = \frac{y_1 + y_2 + \cdots + y_n}{n} = \frac{\sum_{i=1}^{n} y_i}{n} \qquad (2.1)$$

where $y_i$ refers to the value of the i-th element selected in the sample.

2.2.3 Some Statistical Notes about $\bar{y}_{srs}$

    A. Different SRS samples are likely to produce different values for $\bar{y}_{srs}$; hence $\bar{y}_{srs}$ is a random variable with a sampling distribution.
    B. $\bar{y}_{srs}$ is an unbiased estimator of $\bar{Y}$. (go to D&C; "parameter")
    C. When n is large (as a rule of thumb, greater than 30), the sampling distribution for $\bar{y}_{srs}$ closely resembles the normal distribution; this characteristic can be used when forming confidence intervals or testing hypotheses.

2.2.4 Estimated Variance of $\bar{y}_{srs}$

    $$v(\bar{y}_{srs}) = \frac{1 - f}{n}\, s^2 \qquad (2.2)$$

where f = n/N is the sampling rate and $s^2$ is the estimated element variance for the population, calculated as

    $$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y}_{srs})^2}{n - 1} = \frac{n \sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}{n(n - 1)} \qquad (2.3)$$

2.2.5 Some Statistical Notes about $v(\bar{y}_{srs})$

    A. The term (1 - f) in formula (2.2) is called the finite population correction (fpc). It is an adjustment that accounts for the fact that our sample was chosen without replacement from a finite population (i.e., an existing population of limited size). This correction factor is very nearly 1 and can be effectively ignored when the sampling rate is small (as a rule of thumb, less than 0.05).
    B. Different SRS samples of the same size which are chosen from the same population are likely to produce different values for $v(\bar{y}_{srs})$; hence $v(\bar{y}_{srs})$ is a random variable with a sampling distribution.
    C. $v(\bar{y}_{srs})$ is an unbiased estimator of the true variance of $\bar{y}_{srs}$.
    D. $s^2$ is an unbiased estimator of the population element variance.

2.2.6 Estimated Standard Error of $\bar{y}_{srs}$

    $$se(\bar{y}_{srs}) = \sqrt{v(\bar{y}_{srs})} = s \sqrt{\frac{1 - f}{n}} \qquad (2.4)$$

where s is the square root of $s^2$ computed by formula (2.3).
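Formulas (2.1) through (2.4) can be chained together in a few lines. The following is a minimal sketch, not a definitive implementation: the function name `srs_mean_estimates` is a hypothetical helper introduced here for illustration, and the data are the ten block counts and N = 270 from the illustrative example in Section 2.5.

```python
import math

def srs_mean_estimates(sample, N):
    """Return (mean, s2, variance, standard error) for an SRS drawn
    without replacement from a population of N elements.
    Hypothetical helper, named here only for illustration."""
    n = len(sample)
    f = n / N                                   # sampling rate
    y_bar = sum(sample) / n                     # formula (2.1)
    # formula (2.3), computational form
    s2 = (n * sum(y * y for y in sample) - sum(sample) ** 2) / (n * (n - 1))
    v = (1 - f) / n * s2                        # formula (2.2), includes the fpc
    se = math.sqrt(v)                           # formula (2.4)
    return y_bar, s2, v, se

# Number of rented dwellings in the 10 sampled blocks (Section 2.5)
rented = [0, 27, 12, 3, 30, 30, 1, 58, 44, 4]
y_bar, s2, v, se = srs_mean_estimates(rented, N=270)
print(f"mean = {y_bar:.1f}, s2 = {s2:.2f}, v = {v:.2f}, se = {se:.2f}")
# prints: mean = 20.9, s2 = 403.43, v = 38.85, se = 6.23
```

Dropping the factor (1 - f) in the line for `v` gives the variance estimate that would apply if the finite population correction were ignored.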
2.2.7 Confidence Interval for $\bar{Y}$ (n > 30)

    Lower Boundary: $\bar{y}_{srs} - t \cdot se(\bar{y}_{srs})$ \qquad (2.5)
    Upper Boundary: $\bar{y}_{srs} + t \cdot se(\bar{y}_{srs})$ \qquad (2.6)

The value for t depends on the confidence level that we choose. For example:

    Confidence Level (In Percent)      t
                 68                  1.00
                 95                  1.96
                 99                  2.58

Interpretation: We are 95 percent sure that $\bar{Y}$ is covered by the interval whose boundaries (with t = 1.96) are defined by formulas (2.5) and (2.6).

2.3 Estimating a Population Total from a Simple Random Sample

2.3.1 Setting

    A. We have selected an SRS of size n from a population of N elements.
    B. We wish to use our sample to estimate the population aggregate total (denoted by the symbol $Y^{o}$) for some characteristic of the population.
    C. Examples:
        (1) Total combined income for all United States citizens if individual income is the characteristic of interest.
        (2) Total number of dental visits experienced by persons living in some small city.
        (3) Total dollar value of private health insurance premiums paid by workers in a large industrial plant.
    D. We know that $Y^{o} = N\bar{Y}$.

2.3.2 Estimator of $Y^{o}$

    $$\hat{Y}^{o} = y^{o}_{srs} = N\bar{y}_{srs} = \frac{N}{n}\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \frac{y_i}{n/N} \qquad (2.7)$$

2.3.3 Some Statistical Notes about $y^{o}_{srs}$

    A. Different SRS samples of the same size which are chosen from the same population are likely to produce different values for $y^{o}_{srs}$; hence $y^{o}_{srs}$ is a random variable with a sampling distribution.
    B. $y^{o}_{srs}$ is an unbiased estimator of $Y^{o}$.
    C. The sampling distribution for $y^{o}_{srs}$ is very similar to the normal distribution when n is greater than 30.

2.3.4 Estimated Variance of $y^{o}_{srs}$

    $$v(y^{o}_{srs}) = N^2\, v(\bar{y}_{srs}) = \frac{N^2 (1 - f)}{n}\, s^2 \qquad (2.8)$$

2.3.5 Some Statistical Notes about $v(y^{o}_{srs})$

    A. The term (1 - f) is the finite population correction (see Section 2.2.5).
    B. Different SRS samples of the same size which are chosen from the same population are likely to produce different values for $v(y^{o}_{srs})$; hence $v(y^{o}_{srs})$ is a random variable with a sampling distribution.
    C. $v(y^{o}_{srs})$ is an unbiased estimator of the true variance of $y^{o}_{srs}$.

2.3.6 Estimated Standard Error of $y^{o}_{srs}$

    $$se(y^{o}_{srs}) = \sqrt{v(y^{o}_{srs})} = N s \sqrt{\frac{1 - f}{n}} \qquad (2.9)$$

where s is the square root of $s^2$ computed by formula (2.3).

2.3.7 Confidence Interval for $Y^{o}$ (n > 30)

    Lower Boundary: $y^{o}_{srs} - t \cdot se(y^{o}_{srs})$ \qquad (2.10)
    Upper Boundary: $y^{o}_{srs} + t \cdot se(y^{o}_{srs})$ \qquad (2.11)

where the value of t is determined by the confidence level (see Section 2.2.7).

Interpretation: We are 95 percent sure that $Y^{o}$ is covered by the interval whose boundaries (with t = 1.96) are defined by formulas (2.10) and (2.11).

2.4 Estimating a Population Proportion from a Simple Random Sample

2.4.1 Setting

    A. We have selected an SRS of size n from a population of N elements.
    B. We wish to estimate the proportion of all elements in the population which possess some attribute; we denote the population proportion to be estimated by the symbol P.
    C. Examples:
        (1) Proportion of residents of a large nursing home who favor comprehensive national health insurance.
        (2) Proportion of patients in a large hospital who are discharged in two or fewer days.
        (3) Proportion of emergency medical workers in North Carolina who have experienced one or more episodes of violence in the line of work during the last six months.
    D. A population proportion is a special type of population mean in which the characteristic associated with each element is equal to 1 if the element has the attribute (e.g., favoring national health insurance) and 0 if the element does not have the attribute (e.g., not favoring national health insurance). The mean of this type of dichotomous 0-or-1 characteristic is also the proportion of all population elements possessing the attribute.
    E. Quite clearly $0 \le P \le 1$.

2.4.2 Estimator of P

    $$\hat{P} = p_{srs} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{\text{number of sample elements possessing the attribute}}{\text{number of sample elements}} \qquad (2.12)$$

where $y_i = 0$ if the i-th sample element does not possess the attribute and $y_i = 1$ if it does.
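The point made in Section 2.4.1 D, that a proportion is simply the mean of a 0-or-1 characteristic, can be seen by coding the attribute and averaging. This is a minimal sketch under stated assumptions: `srs_proportion` is a hypothetical helper name, and the data and the attribute "ten or more rented dwellings" are borrowed from the illustrative example in Section 2.5.

```python
def srs_proportion(sample, has_attribute):
    """Estimate P by formula (2.12): code each sample element as 1 or 0,
    then take the ordinary sample mean of the codes.
    Hypothetical helper, named here only for illustration."""
    indicators = [1 if has_attribute(y) else 0 for y in sample]
    return sum(indicators) / len(indicators), indicators

# Number of rented dwellings in the 10 sampled blocks (Section 2.5)
rented = [0, 27, 12, 3, 30, 30, 1, 58, 44, 4]

# Attribute of interest: the block has ten or more rented dwellings
p_srs, indicators = srs_proportion(rented, lambda y: y >= 10)
print(indicators)   # prints: [0, 1, 1, 0, 1, 1, 0, 1, 1, 0]
print(p_srs)        # prints: 0.6
```

Because the indicator values are ordinary sample observations, every result stated for $\bar{y}_{srs}$ in Section 2.2 carries over to $p_{srs}$.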
2.4.3 Some Statistical Notes about $p_{srs}$

    A. Different SRS samples of the same size which are chosen from the same population are likely to produce different values for $p_{srs}$; hence $p_{srs}$ is a random variable with a sampling distribution.
    B. $p_{srs}$ is an unbiased estimator of P.
    C. The sampling distribution for $p_{srs}$ is very similar to the normal distribution when $n p_{srs}$ and $n(1 - p_{srs})$ are greater than 10.

2.4.4 Estimated Variance of $p_{srs}$

    $$v(p_{srs}) = (1 - f)\, \frac{p_{srs}(1 - p_{srs})}{n - 1} \qquad (2.13)$$

since

    $$s^2 = \frac{n}{n - 1}\, p_{srs}(1 - p_{srs})$$

2.4.5 Some Statistical Notes about $v(p_{srs})$

    A. The term (1 - f) is the finite population correction (see Section 2.2.5).
    B. Different SRS samples of the same size which are chosen from the same population are likely to produce different values for $v(p_{srs})$; hence $v(p_{srs})$ is a random variable with a sampling distribution.
    C. $v(p_{srs})$ is an unbiased estimator of the true variance of $p_{srs}$.

2.4.6 Standard Error of $p_{srs}$

    $$se(p_{srs}) = \sqrt{v(p_{srs})} = \sqrt{\frac{1 - f}{n - 1}\, p_{srs}(1 - p_{srs})} \qquad (2.14)$$

2.4.7 Confidence Interval for P ($n p_{srs} > 10$)

    Lower Boundary: $p_{srs} - t \cdot se(p_{srs})$ \qquad (2.15)
    Upper Boundary: $p_{srs} + t \cdot se(p_{srs})$ \qquad (2.16)

The value of t is determined by the confidence level (see Section 2.2.7).

Interpretation: We are 95 percent sure that P is covered by the interval whose boundaries (with t = 1.96) are defined by formulas (2.15) and (2.16).

2.5 Illustrative Example of Simple Random Sampling

2.5.1 Setting

    A. We have a population of N = 270 blocks from a small town.
    B. We wish to estimate the following from a sample of n = 10 blocks:
        (1) $\bar{Y}$: The average number of rented dwellings per block in the town.
        (2) $Y^{o}$: The total number of rented dwellings in the town.
        (3) P: The proportion of blocks in the town with ten or more rented dwellings.

2.5.2 Sampling Frame

List of blocks presented in Table 2.1.

2.5.3 Selection Procedure

    A. We must randomly choose 10 different numbers between 1 and 270.
    B. Using the random numbers in Table 2.2, we start in the upper left-hand corner and move across the page as if we are reading a book, choosing three-digit numbers one at a time. Numbers between 271 and 999 (also 000) do not correspond to any block and are skipped.
    C. Following this procedure, the following 10 numbers are chosen:

         i    Random    Number of Rented     Attribute: >= 10 Rented
              Number    Dwellings (y_i)      Dwellings (y_i)
         1     256             0                    0
         2     106            27                    1
         3      54            12                    1
         4     267             3                    0
         5      51            30                    1
         6       8            30                    1
         7     154             1                    0
         8      48            58                    1
         9     112            44                    1
        10     160             4                    0

(Skip to Section 2.6)

2.5.4 Estimates

    A. Mean number of rented dwellings per block:

       $$\bar{y}_{srs} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{209}{10} = 20.9$$

    B. Total number of rented dwellings:

       $$y^{o}_{srs} = N\bar{y}_{srs} = (270)(20.9) = 5643$$

    C. Proportion of blocks with ten or more rented dwellings:

       $$p_{srs} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{6}{10} = 0.6$$

2.5.5 Variances and Standard Errors

    A. Mean number of rented dwellings per block:

       $$v(\bar{y}_{srs}) = \frac{1 - f}{n} \left[ \frac{n \sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}{n(n - 1)} \right] = \frac{1 - 0.037}{10} \left[ \frac{(10)(7999) - (209)^2}{(10)(9)} \right] = 38.85$$

       $$se(\bar{y}_{srs}) = \sqrt{v(\bar{y}_{srs})} = \sqrt{38.85} = 6.23$$

    B. Total number of rented dwellings:

       $$v(y^{o}_{srs}) = N^2\, v(\bar{y}_{srs}) = (270)^2 (38.85) = 2.832 \times 10^6$$

       $$se(y^{o}_{srs}) = \sqrt{v(y^{o}_{srs})} = \sqrt{2.832 \times 10^6} = 1682.9$$

2.6 Some Final Notes on Simple Random Sampling

    A. The variance of an estimate derived from SRS designs (and other designs as well):
        (1) Variance is a measure of the statistical quality of the estimate.
        (2) Precision is inversely related to the size of the variance (i.e., high variance implies low precision; low variance implies high precision).
    B. Implications of changes in sample size on variance:
        (1) Variance is reduced when the sample size (n) is increased.
        (2) Larger sample sizes also contribute to smaller variances by increasing f = n/N (and thereby reducing 1 - f).
        (3) A change in sample size has a more pronounced effect on the variance than does the corresponding change in f.
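The three points under B can be checked numerically by splitting the multiplier of $s^2$ in formula (2.2), (1 - f)/n, into its two parts. The sketch below is illustrative only: the population size N = 10,000, the starting sample size of 100, and the helper name `variance_factor` are assumptions made for this example, and the element variance is taken to be the same for every sample size.

```python
def variance_factor(n, N):
    """Multiplier of s^2 in formula (2.2): (1 - f) / n, where f = n / N."""
    return (1 - n / N) / n

N = 10_000      # hypothetical population size
base_n = 100    # hypothetical starting sample size

rows = []
for n in (100, 200, 400, 800):
    from_n = base_n / n                           # change due to 1/n alone
    from_fpc = (1 - n / N) / (1 - base_n / N)     # change due to (1 - f) alone
    total = variance_factor(n, N) / variance_factor(base_n, N)
    rows.append((n, from_n, from_fpc, total))
    print(f"n = {n:4d}  1/n effect = {from_n:.3f}  "
          f"(1 - f) effect = {from_fpc:.3f}  variance ratio = {total:.3f}")
```

In this setup the variance ratio is the product of the two effects, and the 1/n term accounts for nearly all of the reduction, in line with point B(3).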
       Example: N = 500,000

           First SRS Sampling Design:     n_1 = 1,000    f_1 = 0.002
           Second SRS Sampling Design:    n_2 = 5,000    f_2 = 0.010

       Thus, if we assume that $s_1^2 = s_2^2$,

       $$\frac{v_2(\bar{y}_{srs})}{v_1(\bar{y}_{srs})} = \frac{(1 - f_2)\, s_2^2 / n_2}{(1 - f_1)\, s_1^2 / n_1} = \frac{(1 - f_2)\, n_1}{(1 - f_1)\, n_2} = \frac{0.990}{0.998} \times \frac{1{,}000}{5{,}000} = 0.992 \times 0.200 = 0.198$$

       Nearly all of the reduction comes from the fivefold increase in n (the factor 0.200); the change in 1 - f contributes only the factor 0.992.

    C. Simple random sampling is the simplest probability sampling method.
    D. Simple random sampling is only rarely used in practice. One might conceivably use it, however, when both of the following are true:
        (1) The sample size is small and either:
            a. The population is relatively large with a sequence of numbers uniquely identifying each element (e.g., employee ID numbers assigned sequentially to employees); or
            b. The population is relatively small.
        (2) Stratification is either not feasible or not possible (see Section 4).

Supplementary Reading

[1] Mendenhall, W., Ott, L., and Scheaffer, R.L., Elementary Survey Sampling, Duxbury Press, Belmont, California, 1971, Chapter 4.
[2] Kish, L., Survey Sampling, Wiley and Sons, 1965, Sections 2.0-2.6.
[3] Cochran, W.G., Sampling Techniques, 3rd Edition, Wiley and Sons, 1977, Sections 2.1-2.9; 3.1-3.3.