AN ABSTRACT OF THE DISSERTATION OF Ruben A. Smith-Cayama for the degree of Doctor of Philosophy in Statistics presented on September 7, 1999. Title: Statistical Estimation for Initiative Petitions and Performance of the Decision Rule for Oregon State Petitions. Redacted for Privacy Abstract approved: ____ David R. Thomas

Two topics concerning statistical sampling of initiative petitions are considered in this dissertation. The first concerns statistical estimation of the number of distinct valid signatures in a petition, and the second evaluates the statistical decision rule used by Oregon for determining certification of state initiative and referendum petitions. In several states that permit initiative petitions to modify or add legislation, statistical sampling of signatures is used to obtain an estimate of the number of distinct valid signatures in a petition. This estimate depends on the number of signatures submitted, the number of invalid signatures and, in some states, the number of duplicates of valid signatures. We consider several linear estimators and one non-linear estimator for the number of distinct valid signatures. Their performance is compared with respect to bias and root mean squared error (RMSE) for several sample sizes, using four fully verified petitions from Washington State. Exact expressions for the bias and RMSE are used for the linear estimators, and estimates from simulated random sampling are used for the non-linear estimator. For the small sampling fractions typically used for state initiative petitions (3-10%), none of the estimators performs much better than the estimator that is constructed to be unbiased when valid signatures are assumed to be duplicated at most once. Oregon allows a petition to be filed in either one or two submissions.
The Oregon decision rule for certification of petitions is complicated in that multiple stages of sampling are used: two stages for a single submission and three stages for two submissions. A petition can be accepted (certified) after any sampling stage, but it can be rejected only after all samples from each submission have been verified. The decision rule is based on estimates of the number of distinct valid signatures obtained at each sampling stage. We evaluate the performance of the Oregon decision rule by calculating an approximation for the probability of making a correct decision on the certification of several hypothetical petitions. The petitions are chosen to represent different sizes and different quality with respect to invalid and duplicated valid signatures.

© Copyright by Ruben A. Smith-Cayama
September 7, 1999
All Rights Reserved

Statistical Estimation for Initiative Petitions and Performance of the Decision Rule for Oregon State Petitions
by
Ruben A. Smith-Cayama

A DISSERTATION
submitted to
Oregon State University
in partial fulfillment of
the requirements for the
degree of
Doctor of Philosophy

Presented September 7, 1999
Commencement June 2000

Doctor of Philosophy dissertation of Ruben A. Smith-Cayama presented on September 7, 1999

APPROVED:
Redacted for Privacy
Major Professor, representing Statistics
Redacted for Privacy
Chair of Department of Statistics
Redacted for Privacy

I understand that my dissertation will become part of the permanent collection of Oregon State University libraries. My signature below authorizes release of my dissertation to any reader upon request.
Redacted for Privacy

ACKNOWLEDGMENTS

I would like to express my deep appreciation and sincere thanks to my dissertation advisor, Dr. David R. Thomas, for his patience, support, and guidance during the course of this dissertation research. I would also like to thank Dr. Virginia Lesser, Dr. David Birkes, Dr. David Butler, and Dr.
Philippe Rossignol for their recommendations and comments for improving this dissertation. I am grateful to the University of Los Andes, Venezuela, to the Department of Statistics of Oregon State University, and to the Office of Budgets and Planning, especially Duane Faulhaber, Interim Director, for providing funding during my doctoral studies. I wish to thank Colleen Sealock, Director, and Dr. Scott Tighe, Operations Manager, of the Oregon State Elections Division, for providing information about the Oregon certification rule. The author is also grateful to the Washington Secretary of State for providing the data from fully verified petitions, especially to Donald F. Whiting, Assistant Secretary of State, and Pamela Floyd, Initiative Manager of the Elections Division. I wish to express my sincere gratitude to my mother, Olga, for her love and support during the pursuit of this degree, and to my brothers and sisters for their emotional support. I would like to thank my friend Breda Muñoz for her companionship and support all these years.

CONTRIBUTION OF AUTHORS

Dr. David Thomas proposed the topic that originated this dissertation. Dr. Thomas was also involved in the solution, interpretation of results and editing of each chapter of this dissertation.

TABLE OF CONTENTS

1. INTRODUCTION
2. ESTIMATING THE NUMBER OF DISTINCT VALID SIGNATURES IN INITIATIVE PETITIONS
   2.1 Abstract
   2.2 Introduction
   2.3 Terminology and Notation
   2.4 Theoretical Background
       2.4.1 Estimators for D
       2.4.2 Estimators for M
       2.4.3 Expectation and Variance of $\hat{M}$
   2.5 Performance of the Estimators
   2.6 Summary
   2.7 References
   2.8 Appendix
3. SAMPLING FOR CERTIFICATION OF OREGON STATE INITIATIVE PETITIONS
   3.1 Abstract
   3.2 Introduction
   3.3 The Statistical Decision Problem
   3.4 One Submission of Signatures
       3.4.1 The Decision Rule
       3.4.2 Probability of Correct Decision
       3.4.3 Numerical Results
             3.4.3.1 Single Duplication of Valid Signatures
             3.4.3.2 Multiple Duplication of Valid Signatures
   3.5 Two Submissions of Signatures
       3.5.1 The Decision Rule
       3.5.2 Probability of Correct Decision
       3.5.3 Numerical Results
   3.6 Summary and Conclusions
   3.7 References
   3.8 Appendix
       3.8.1 Calculation of $\mathrm{Cov}(\hat{U}_1, \hat{M})$
       3.8.2 Calculation of $\mathrm{Cov}(\hat{U}_1, \hat{U}_{FS})$, $\mathrm{Var}(\hat{U}_{FS})$, and $\mathrm{Cov}(\hat{M}_h, \hat{U}_{FS})$ for $h = F, S$
4. SUMMARY
BIBLIOGRAPHY

LIST OF TABLES

2.1 Description of the petitions A, B, C, and D
2.2 Expected frequency for replications of valid signatures, $E(f_i)$
2.3 True values of $k$ and $r_3, \dots, r_k$, and specified values $\tilde{k}$ and $\tilde{r}_3, \dots, \tilde{r}_{\tilde{k}}$
2.4 Specified values of the bias adjustment factor, $B^{\hat{D}}_{q,\tilde{k},\tilde{r}}$, for each petition, adjusted estimator and sampling fraction ($q$): 3%, 5%, 10%, and 20%
2.5 BIAS (%) of estimators for $M$
2.6 RMSE (%) of estimators for $M$
3.1 Probabilities of acceptance from the first sample ($P_{1A}$) and of correct decision ($P_{CD}$) for single duplication ($D = F_2$)
3.2 Probabilities of acceptance from the first sample ($P_{1A}$) and of correct decision ($P_{CD}$), with the correlation coefficient for $\hat{U}_1$ and $\hat{M}$, bias and standard deviation of the estimator $\hat{M}$, and the bivariate normal integration limits $a$ and $b$ in Equation (9), for single duplication ($\theta = 0^{+}$) and multiple ($\theta > 0$) duplication
3.3 Probabilities of acceptance from the first sample ($P_{1A}$), acceptance from the combined first and second samples ($P_{2A}$), and correct decision ($P_{CD}$) for two submissions ($h = F, S$) with single duplication ($D = F_2$)

To my mother, Olga, to my sisters and brothers, and to the memory of my father, Marcelino.

Statistical Estimation for Initiative Petitions and Performance of the Decision Rule for Oregon State Petitions

Chapter 1
Introduction

Twenty-five states give citizens the power to use initiative petitions to propose legislation for consideration either in a ballot measure or in the legislature (Houser, 1985). The sponsor of the petition must circulate the complete text of the proposal and obtain a minimum number of signatures from registered voters. The collected signatures are then filed as a petition with the state office in charge, usually the Secretary of State. This office determines, by a procedure specified by administrative rule or law, whether the petition is certified. To be certified, a petition must contain a minimum number of distinct valid signatures established by law.
To be valid, a signature must be that of a registered voter. Typical petitions contain both invalid signatures and duplicates of valid signatures. If a registered voter signs the petition more than once, all but one of those signatures are duplicates. The process for verifying signatures and the decision rule for certification vary among the states. Twenty-two states, including Oregon, have a formal procedure for verifying whether a signature on the petition is that of a registered voter (Houser, 1985). In general, signatures are validated by comparing the name, address, and signature on the petition with the appropriate voter registration records. Results of the certification process must be given within a specified time limit, which varies greatly among the states. Oregon has only fifteen days from the due date for submission to verify all state initiative petitions, whereas Washington has approximately ninety days to verify initiative petitions for the ballot. Fourteen states verify all signatures submitted. Five states, including California and Washington, use sampling together with a procedure for deciding whether all signatures are to be verified. Only Michigan, North Dakota, and Oregon base the certification of state petitions entirely on sampling. In Michigan and North Dakota the decision process is based on an estimate of the number of valid signatures (ignoring duplication). In Oregon the current decision rule is based on estimates of the number of distinct valid signatures in the petition; Oregon only recently (1999 Oregon Administrative Rules, Chapter 165) changed its decision procedure to include estimation of the number of duplicates of valid signatures. Previously, a 2% duplication rate was assumed for all petitions. Oregon allows a petition to be filed in two submissions.
The Oregon decision rule for certification of petitions is complicated in that multiple stages of sampling are used: two stages for a single submission and three stages for two submissions. After the first submission, the signatures in the first sample are verified and the estimate of the number of distinct valid signatures is calculated. If the petition is not certified from the first sample, a second random sample is selected for verification from the remaining unverified signatures. Certification of the petition is then based on the estimate obtained from the combined first and second samples. If the petition is not certified before the due date for submitting signatures, the petitioners are permitted to submit additional signatures for verification. The certification of the petition is then based on the estimate of the total number of distinct valid signatures in both submissions. A recent Oregon Supreme Court ruling (Susan Leo and Jonah P. Hymes vs. Phil Keisling, Secretary of State, and the state of Oregon, 1998) implies that the decision rule should not be biased either for or against the petitioner. This means that the probability of making a correct decision should be at least 0.50 for any petition.

In Chapter 2, we consider estimators for the number of distinct valid signatures in an initiative petition. These estimators include a linear unbiased estimator, several biased linear estimators, and one non-linear estimator based on the jackknife technique. The performance of the estimators is compared with respect to bias and root mean squared error (RMSE) for four fully verified Washington State petitions. Exact expressions for the bias and RMSE are provided for the linear estimators, and simulated random sampling is used for the non-linear estimator. In Chapter 3 we evaluate the Oregon decision rule by studying the probability of a correct decision for certifying a petition.
The decision rule depends on the estimator of the number of distinct valid signatures submitted. This estimator can be expressed as a linear function of the estimators of the number of invalid signatures and the number of duplicates of valid signatures. We use multivariate normal approximations for the joint distribution of the estimators of the number of invalid and distinct valid signatures from the multiple sampling stages to calculate the probability of a correct decision. We also obtain approximations for the probabilities of acceptance from the first sample and, when there are two submissions, from the combined first and second samples. Several duplication rates and distributions of multiple duplicates are considered for the case of one submission. In the case of two submissions, only single duplication is evaluated.

Chapter 2
Estimating the Number of Distinct Valid Signatures in Initiative Petitions
Ruben A. Smith-Cayama and David R. Thomas

2.1 Abstract

In some states, if citizens are dissatisfied with certain laws or feel that new laws are needed, they can petition to place proposed legislation on the ballot. For a petition to be certified for the ballot, its sponsor must circulate the complete text of the proposal among voters and obtain the signatures of those in favor. Petitions contain both invalid and valid signatures, and valid signatures from registered voters can appear more than once. To qualify a petition as a ballot measure, the total number of distinct valid signatures collected must equal or exceed a required number. We consider the case in which a simple random sample of signatures is drawn from the entire petition and all signatures in the sample are verified. The problem is to estimate the total number of distinct valid signatures from the sample information and the known total number of signatures in the petition. We consider several linear estimators and one non-linear estimator.
Expressions for the variance of the linear estimators are provided. The performance of the estimators is evaluated using data from several Washington State petitions that have been completely verified.

2.2 Introduction

Some state constitutions give initiative and referendum power to the people. If citizens of these states are dissatisfied with certain laws or feel that new laws are needed, they can petition to propose legislation, either to the legislature or on the ballot. The sponsor of the petition must circulate the complete text of the proposed legislation among voters and collect the signatures of those in favor. After the signatures are collected, they are filed as a petition with the state office in charge, usually the Secretary of State. That office determines, by a procedure established by state law, whether the petition is certified. By state law, a petition is certified if the number of distinct valid signatures in the petition equals or exceeds the minimum required.

In this paper, we consider the case in which a petition of known size contains both invalid and valid signatures, and valid signatures from registered voters can appear more than once. It is assumed that a simple random sample of signatures is drawn from the entire petition and that all signatures in the sample are verified. Our interest is to estimate the number of distinct valid signatures in the petition from the sample information and the known petition size. Many states use this approach, including California, Illinois, Oregon and Washington (Hauser, 1985). When no invalid signatures are present, the estimation problem reduces to one known as estimation of the number of classes in a finite population; a class here corresponds to a distinct valid signature. Bunge and Fitzpatrick (1993) provided a review of applications and techniques proposed to estimate the number of classes in finite and infinite populations.
Goodman (1949) showed that the linear unbiased estimator of the total number of classes in a finite population is unique under the assumption that the sample size is no smaller than the maximum number of elements in any class. Recently, Haas and Stokes (1998) proposed non-linear estimators based on the generalized jackknife technique. Following Goodman's approach, we consider a linear unbiased estimator for the number of distinct valid signatures in the petition. Several other linear estimators and one non-linear estimator are also considered. In Section 2.3 we introduce terminology and notation pertinent to our problem. The estimators are described in Section 2.4, where expressions for the variance of the linear estimators are also provided. In Section 2.5 we compare the performance of all the estimators, and in Section 2.6 we give a summary.

2.3 Terminology and Notation

After petition signatures are collected, the state elections office reviews each sheet and removes all signature pages that do not satisfy state regulations. This procedure leads to a subset of the signature pages originally collected, which is subject to a verification procedure. This subset of signatures is called the petition here. Signatures in the petition can be classified as valid (from registered voters) or invalid (for example, illegible signatures and signatures that differ from those in the registration records). Let $N$ denote the size of the petition, and let $U$ and $M$ denote the unknown numbers of invalid and distinct valid signatures in the petition, respectively. Let $N_j$ be the number of times the $j$th distinct valid signature appears in the petition, $j = 1, \dots, M$. Therefore, the $j$th distinct valid signature has $N_j - 1$ duplicated signatures in the petition. We denote by $D$ the total number of duplicates (replicates) of valid signatures in the petition, which can be expressed as

$$D = \sum_{j=1}^{M} (N_j - 1).$$

Note that 'duplicate' is used here to describe all signatures by an elector after his or her first signature. Also, let $F_i$ be the number of electors with $i$ valid signatures in the petition, $i = 1, \dots, N$. Observe that $0 \le F_i \le M$ and

$$F_i = \sum_{j=1}^{M} I(N_j = i), \qquad (1)$$

where $I(\cdot)$ denotes the indicator function. Based on Equation (1), we obtain expressions for $N$ and $M$:

$$N = U + \sum_{i=1}^{N} i F_i, \qquad (2)$$

$$M = \sum_{i=1}^{N} F_i. \qquad (3)$$

From Equations (2) and (3) we can rewrite $D$ as

$$D = \sum_{i=2}^{N} (i - 1) F_i. \qquad (4)$$

Assume a sample of $n$ signatures is drawn at random without replacement from the petition. Let $u$ be the observed number of invalid signatures in the sample and $f_i$ the number of electors in the sample with $i$ valid signatures. Then $n$ can be written as

$$n = u + \sum_{i=1}^{n} i f_i.$$

2.4 Theoretical Background

From Equations (2), (3), and (4) we have

$$M = N - U - D. \qquad (5)$$

Since $N$ is known, an estimator for $M$ can be obtained by determining estimators for $U$ and $D$. Because an unbiased estimator of $U$ under simple random sampling is $\hat{U} = (N/n)\,u$, our problem reduces to the estimation of $D$.

2.4.1 Estimators for D

First, the form of the unbiased estimator, $\hat{D}_{unbias}$, for $D$ is determined. Let $k = \max(N_1, \dots, N_M)$. Suppose a sample of $n$ ($n \ge k$) signatures is drawn without replacement from a petition of size $N$. Define

$$P_{ij} = \frac{\binom{j}{i}\binom{N-j}{n-i}}{\binom{N}{n}},$$

the probability that exactly $i$ of the $j$ signatures of an elector who signed $j$ times appear in the sample, and

$$c_2 = 1, \qquad c_j = (j - 1) - \sum_{i=2}^{j-1} c_i \frac{P_{ij}}{P_{ii}} \quad \text{for } j = 3, 4, \dots, n.$$

Then an unbiased estimator of $D$ is given by

$$\hat{D}_{unbias} = \sum_{i=2}^{n} \frac{c_i}{P_{ii}} f_i. \qquad (6)$$

The proof of this result is given in Lemma 1 of the Appendix. Observe that the expansion factors $c_i/P_{ii}$ for $f_i$ can take positive or negative values. These expansion factors can be very large in absolute value, depending on the petition and sample sizes. As a result, the estimator $\hat{M}_{unbias}$, obtained by using $\hat{D}_{unbias}$ in Equation (5), can be unreasonable.
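The recursion for $c_j$ can be checked numerically. The sketch below is an illustration, not the authors' code; the function names and the small petition ($N = 50$, $n = 10$) are hypothetical. It computes the expansion factors $c_i/P_{ii}$ with exact rational arithmetic and verifies that $E(\hat{D}_{unbias}) = \sum_i (c_i/P_{ii}) \sum_{j \ge i} P_{ij} F_j$ reproduces $D = \sum (i-1) F_i$ exactly. It also shows that the factor for $f_3$ is negative.

```python
from fractions import Fraction
from math import comb


def p_ij(i, j, N, n):
    """P_ij: probability that exactly i of an elector's j signatures fall in a
    simple random sample of size n drawn without replacement from N."""
    return Fraction(comb(j, i) * comb(N - j, n - i), comb(N, n))


def unbiased_factors(N, n):
    """Expansion factors c_i / P_ii for i = 2, ..., n, where c_2 = 1 and
    c_j = (j - 1) - sum_{i=2}^{j-1} c_i P_ij / P_ii (the sum is empty for j = 2)."""
    c = {}
    for j in range(2, n + 1):
        c[j] = (j - 1) - sum(
            (c[i] * p_ij(i, j, N, n) / p_ij(i, i, N, n) for i in range(2, j)),
            Fraction(0),
        )
    return {i: c[i] / p_ij(i, i, N, n) for i in c}


def d_unbias(f, N, n):
    """Equation (6); f maps i to f_i, the number of sampled electors with i valid signatures."""
    factors = unbiased_factors(N, n)
    return sum((factors[i] * f.get(i, 0) for i in factors), Fraction(0))


def expected_d_unbias(F, N, n):
    """Exact E(D_unbias) from E(f_i) = sum_{j >= i} P_ij F_j, where F maps j to F_j."""
    factors = unbiased_factors(N, n)
    return sum(
        (factors[i] * sum((p_ij(i, j, N, n) * F.get(j, 0) for j in range(i, n + 1)), Fraction(0))
         for i in factors),
        Fraction(0),
    )


if __name__ == "__main__":
    # Hypothetical small petition: three pairs, two triplicates and one
    # quadruple, so D = 3*1 + 2*2 + 1*3 = 10.
    N, n = 50, 10
    F = {2: 3, 3: 2, 4: 1}
    print(expected_d_unbias(F, N, n))        # exactly D = 10
    print(float(unbiased_factors(N, n)[3]))  # negative factor for f_3
```

The exact `Fraction` arithmetic avoids any rounding in the unbiasedness check. However, the quadratic recursion over `comb` values is meant only for small illustrative sizes, not for petitions with hundreds of thousands of signatures.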
To avoid this difficulty, we consider alternative linear estimators, which ignore the valid signatures appearing more than two or three times in the sample:

$$\hat{D}_2 = \frac{N(N-1)}{n(n-1)} f_2, \qquad (7)$$

$$\hat{D}_3 = \frac{N(N-1)}{n(n-1)} f_2 - \frac{N(N-1)(N-3n+4)}{n(n-1)(n-2)} f_3. \qquad (8)$$

Goodman (1949) proposed $\hat{D}_2$ for estimating the number of duplicates of classes in a finite population. The next estimator considered is used by the Washington Elections Division Office¹:

$$\hat{D}_{2+} = \frac{N(N-1)}{n(n-1)} f_{2+}, \qquad (9)$$

where $f_{2+} = \sum_{i=2}^{n} f_i$ is the number of electors in the sample with valid signatures appearing two or more times. A more intuitive estimator replaces $f_{2+}$ by the total number of duplicates in the sample:

$$\hat{D}_d = \frac{N(N-1)}{n(n-1)} d, \qquad \text{where } d = \sum_{i=2}^{n} (i-1) f_i. \qquad (10)$$

¹ Pamela Floyd, Elections Division, Voter Registration Services, Office of Secretary of State, telephone interview, February 9, 1999.

Note that if at most pairs of valid signatures occur in the petition ($F_j = 0$ for $j \ge 3$), then the estimators (7)-(10) are equal to the unbiased estimator $\hat{D}_{unbias}$. Similarly, $\hat{D}_3 = \hat{D}_{unbias}$ when at most triplicate valid signatures occur in the petition ($F_j = 0$ for $j \ge 4$).

When prior information is available, it may be possible to reduce the bias of the estimators by incorporating a bias adjustment factor (BAF), denoted $B^{\hat{D}}_{q,k,r}$, which is a function of $q$, $k$, and $r$. Here $q = n/N$ is the sampling fraction, $r = (r_3, r_4, \dots, r_k)$ with $r_i = F_i/F_2$ for $i = 3, \dots, k$, and $k = \max\{j : F_j > 0\}$ is the maximum number of times any valid signature appears in the petition. The BAF for $\hat{D}$ is defined as

$$B^{\hat{D}}_{q,k,r} = \frac{D_k}{E(\hat{D} \mid q, k)},$$

where $D_k = F_2 + 2F_3 + \dots + (k-1)F_k$ and $E(\hat{D} \mid q, k)$ denotes the expectation of $\hat{D}$ given $q$ and $k$.
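The four linear estimators in Equations (7)-(10) differ only in which sample counts they multiply by the common factor $N(N-1)/(n(n-1))$. Below is a minimal sketch of them, together with $\hat{M} = N - \hat{U} - \hat{D}$ from Equation (5). The function names and the sample counts in the usage block are hypothetical, chosen only so that $n = u + \sum i f_i$ holds.

```python
def expansion(N, n):
    """Common expansion factor N(N - 1) / (n(n - 1)) used in Equations (7)-(10)."""
    return N * (N - 1) / (n * (n - 1))


def d_2(f, N, n):
    """Equation (7): uses only f_2."""
    return expansion(N, n) * f.get(2, 0)


def d_3(f, N, n):
    """Equation (8): uses f_2 and f_3."""
    c3 = N * (N - 1) * (N - 3 * n + 4) / (n * (n - 1) * (n - 2))
    return expansion(N, n) * f.get(2, 0) - c3 * f.get(3, 0)


def d_2plus(f, N, n):
    """Equation (9): f_{2+} counts sampled electors with two or more valid signatures."""
    return expansion(N, n) * sum(c for i, c in f.items() if i >= 2)


def d_d(f, N, n):
    """Equation (10): d = sum_{i >= 2} (i - 1) f_i, the duplicates seen in the sample."""
    return expansion(N, n) * sum((i - 1) * c for i, c in f.items() if i >= 2)


def m_hat(N, n, u, d_hat):
    """Equation (5) with U_hat = (N / n) u."""
    return N - N * u / n - d_hat


if __name__ == "__main__":
    # Hypothetical 3% sample from a petition of N = 162,324 signatures:
    # u = 600 invalid, 4,259 electors seen once, 4 seen twice, 1 seen three
    # times, so that n = 600 + 4,259 + 2*4 + 3*1 = 4,870.
    N, n, u = 162_324, 4_870, 600
    f = {1: 4_259, 2: 4, 3: 1}
    for name, est in [("D2", d_2), ("D3", d_3), ("D2+", d_2plus), ("Dd", d_d)]:
        d_hat = est(f, N, n)
        print(name, round(d_hat), round(m_hat(N, n, u, d_hat)))
```

With this single observed triplicate, $\hat{D}_3$ comes out negative. The coefficient of $f_3$ in Equation (8) is roughly thirty times the coefficient of $f_2$ at this sample size, which illustrates why estimators with large expansion factors can be unreasonable.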
The BAF is approximated using binomial sampling, with $P_{ij} = \binom{j}{i} q^i (1-q)^{j-i}$ in Equation (A.2) of the Appendix:

$$B^{\hat{D}}_{q,k,r} = \begin{cases} \dfrac{1 + \sum_{i=3}^{k} (i-1) r_i}{1 + \frac{1}{2(1-q)^2} \sum_{i=3}^{k} i(i-1)(1-q)^i r_i} & \text{for } \hat{D} = \hat{D}_2, \\[2ex] \dfrac{1 + \sum_{i=3}^{k} (i-1) r_i}{1 + \frac{1}{q^2} \sum_{i=3}^{k} \left[1 - (1 + (i-1)q)(1-q)^{i-1}\right] r_i} & \text{for } \hat{D} = \hat{D}_{2+}, \\[2ex] \dfrac{1 + \sum_{i=3}^{k} (i-1) r_i}{1 + \frac{1}{q^2} \sum_{i=3}^{k} \left[iq - 1 + (1-q)^i\right] r_i} & \text{for } \hat{D} = \hat{D}_d. \end{cases}$$

Then the adjusted estimator of $D$ is

$$\hat{D}_{adj} = B^{\hat{D}}_{q,k,r}\, \hat{D}, \qquad (11)$$

where $\hat{D}$ is any of the biased estimators defined in Equations (7), (9), and (10). When $N$ and $n$ are large, the binomial approximation gives values of $E(\hat{D} \mid q, k)$ that are very similar to those obtained using the exact distribution. The binomial approximation was also used by Goodman (1949) and by Haas and Stokes (1998). Observe that the population values $k$ and $r$ are unknown and need to be specified using prior information. In some states, including Washington, duplication data from previous fully verified petitions might be used.

2.4.2 Estimators for M

Estimators for $M$ can be obtained by substituting any of the estimators for $D$ presented in Equations (6)-(11) into Equation (5):

$$\hat{M} = N - \hat{U} - \hat{D}, \qquad \text{with } \hat{D} = \varepsilon \sum_{i=2}^{t} \lambda_i f_i, \qquad (12)$$

for constants $\varepsilon$, $t$, and $\lambda_i$, with

$$\varepsilon = \begin{cases} 1 & \text{for } \hat{D} = \hat{D}_{unbias}, \hat{D}_3, \hat{D}_2, \hat{D}_{2+}, \hat{D}_d, \\ B^{\hat{D}}_{q,k,r} & \text{for the adjusted estimators.} \end{cases}$$

In petitions, the coefficient of variation of the number of times valid signatures appear in the petition is expected to be small. The square of this coefficient of variation is

$$\gamma^2 = \frac{(1/M)\sum_{j=1}^{M} (N_j - \bar{N})^2}{\bar{N}^2}, \qquad \text{where } \bar{N} = \frac{1}{M}\sum_{j=1}^{M} N_j = \frac{N - U}{M}.$$

A second-order jackknife estimator, $\hat{M}_{uj2}$, was recommended by Haas and Stokes (1998) for applications where $\gamma^2$ is relatively small.
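The three binomial approximations to the bias adjustment factor in Section 2.4.1 are simple closed forms in $q$ and $r$. The sketch below evaluates them; the function and argument names are hypothetical. With $\tilde{k} = 3$ and $\tilde{r}_3 = 0.0371$ (the specified value for petition A), it reproduces the petition A, $k = 3$ entries of Table 2.4 to three decimals in the cases checked.

```python
def baf(q, r, estimator):
    """Binomial approximation to B^D_{q,k,r} = D_k / E(D_hat | q, k).

    q: sampling fraction n / N.
    r: dict mapping i = 3, ..., k to r_i = F_i / F_2 (empty if only pairs occur).
    estimator: "D2", "D2+", or "Dd" (Equations (7), (9), (10)).
    """
    # Numerator: D_k / F_2 = 1 + sum_{i>=3} (i - 1) r_i.
    num = 1 + sum((i - 1) * ri for i, ri in r.items())
    if estimator == "D2":
        extra = sum(i * (i - 1) * (1 - q) ** i * ri for i, ri in r.items()) / (2 * (1 - q) ** 2)
    elif estimator == "D2+":
        extra = sum((1 - (1 + (i - 1) * q) * (1 - q) ** (i - 1)) * ri for i, ri in r.items()) / q ** 2
    elif estimator == "Dd":
        extra = sum((i * q - 1 + (1 - q) ** i) * ri for i, ri in r.items()) / q ** 2
    else:
        raise ValueError(f"unknown estimator: {estimator}")
    # Denominator: E(D_hat | q, k) / F_2 = 1 + extra.
    return num / (1 + extra)


def adjusted(d_hat, q, r, estimator):
    """Equation (11): D_adj = B^D_{q,k,r} * D_hat."""
    return baf(q, r, estimator) * d_hat


if __name__ == "__main__":
    r_spec_A = {3: 0.0371}  # specified r_3 for petition A, k = 3
    for q in (0.03, 0.05, 0.10, 0.20):
        print(q, [round(baf(q, r_spec_A, e), 3) for e in ("D2", "D2+", "Dd")])
```

Both the numerator and the denominator are divided by $F_2$, so only the ratios $r_i$ are needed. When no triplicates or higher are specified (`r` empty), the factor is exactly 1 and the adjusted and unadjusted estimators coincide.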
The following estimator is a modification of the Haas and Stokes second-order jackknife estimator to accommodate the additional class of invalid signatures:

$$\hat{M}_{uj2m} = \left(1 - \frac{f_1 (1 - q^*)}{n^*}\right)^{-1} \left(\sum_{i=1}^{n} f_i - \frac{f_1 (1 - q^*) \ln(1 - q^*)\, \hat{\gamma}^2(\hat{M}_{uj1})}{q^*}\right),$$

where $n^* = n - u$ is the reduced sample size obtained after removing all invalid signatures, $N^* = N - \hat{U}$ is an unbiased estimator of the number of valid signatures in the petition, $q^* = n^*/N^*$,

$$\hat{M}_{uj1} = \left(1 - \frac{(1 - q^*) f_1}{n^*}\right)^{-1} \sum_{i=1}^{n} f_i,$$

and

$$\hat{\gamma}^2(M) = \max\left(0,\; \frac{M}{n^{*2}} \sum_{i=1}^{n} i(i-1) f_i + \frac{M}{N^*} - 1\right).$$

2.4.3 Expectation and Variance of $\hat{M}$

The expected value and variance of any estimator of the general form given in Equation (12) are

$$E(\hat{M}) = N - U - \varepsilon \sum_{i=2}^{t} \lambda_i \sum_{j=i}^{n} P_{ij} F_j, \qquad (13)$$

$$\mathrm{Var}(\hat{M}) = \mathrm{Var}(\hat{U}) + \mathrm{Var}(\hat{D}) + 2\,\mathrm{Cov}(\hat{U}, \hat{D}), \qquad (14)$$

where

$$\mathrm{Cov}(\hat{U}, \hat{D}) = -\frac{\varepsilon U}{n} \sum_{i=2}^{t} \lambda_i \sum_{j=i}^{n} \frac{iN - jn}{N - j} P_{ij} F_j,$$

$$\mathrm{Var}(\hat{D}) = \varepsilon^2 \sum_{i=2}^{t} \sum_{k=2}^{t} \lambda_i \lambda_k \sum_{j=i}^{n} \sum_{l=k}^{n} V_{ijkl},$$

with

$$V_{ijkl} = \begin{cases} \left(1 + (P_{ij\cdot ij} - P_{ij}) F_j - P_{ij\cdot ij}\right) P_{ij} F_j & \text{for } i = k,\ j = l, \\ \left((P_{kj\cdot ij} - P_{kj}) F_j - P_{kj\cdot ij}\right) P_{ij} F_j & \text{for } i \ne k,\ j = l, \\ (P_{kl\cdot ij} - P_{kl}) P_{ij} F_j F_l & \text{for } j \ne l, \end{cases}$$

and

$$P_{kl\cdot ij} = \frac{\binom{l}{k}\binom{N-j-l}{n-i-k}}{\binom{N-j}{n-i}}.$$

The expression for the expected value of the linear estimator $\hat{M}$ follows from Equation (12), the unbiasedness of $\hat{U}$, and Equation (A.2) of the Appendix. The expression for $\mathrm{Var}(\hat{U})$ is well known, and the expressions for $\mathrm{Var}(\hat{D})$ and $\mathrm{Cov}(\hat{U}, \hat{D})$ are derived in the Appendix.

2.5 Performance of the Estimators

In this section, the estimators of the number of distinct valid signatures, $M$, are compared with regard to their bias and root mean squared error (RMSE) for four fully verified Washington State petitions, denoted A, B, C, and D. In Washington, if the estimate of $M$ from the random sample attains the required number, the measure is certified; otherwise, complete verification of the petition is required.
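For the non-linear estimator $\hat{M}_{uj2m}$ of Section 2.4.2, bias and RMSE are obtained by repeated simulated sampling rather than from closed forms. The sketch below implements $\hat{M}_{uj2m}$ as written in Section 2.4.2 and wraps it in a small Monte Carlo loop. The petition representation (a list whose entries are `None` for invalid signatures or an elector id), the helper names, and the synthetic petition are all hypothetical. The study itself used 10,000 samples from each Washington petition.

```python
import math
import random
from collections import Counter


def m_uj2m(f, u, n, N):
    """Modified second-order jackknife estimator of M (Section 2.4.2).

    f: mapping i -> f_i for the sample; u: invalid signatures in the sample;
    n: sample size; N: petition size. Assumes u < n and 0 < q* < 1.
    """
    n_star = n - u                      # sample size after removing invalid signatures
    N_star = N - N * u / n              # N* = N - U_hat
    q_star = n_star / N_star
    f1 = f.get(1, 0)
    distinct = sum(f.values())          # distinct valid signatures in the sample
    m_uj1 = distinct / (1 - (1 - q_star) * f1 / n_star)
    s = sum(i * (i - 1) * c for i, c in f.items())
    gamma2 = max(0.0, m_uj1 / n_star ** 2 * s + m_uj1 / N_star - 1)
    correction = f1 * (1 - q_star) * math.log(1 - q_star) * gamma2 / q_star
    return (distinct - correction) / (1 - f1 * (1 - q_star) / n_star)


def sample_frequencies(petition, n, rng):
    """Draw n signatures without replacement and return (f, u)."""
    draw = rng.sample(petition, n)
    u = sum(1 for s in draw if s is None)
    per_elector = Counter(s for s in draw if s is not None)
    return Counter(per_elector.values()), u


def simulate_bias_rmse(petition, m_true, n, reps=10_000, seed=1):
    """Monte Carlo estimates of Bias and RMSE of m_uj2m."""
    rng = random.Random(seed)
    N = len(petition)
    errors = []
    for _ in range(reps):
        f, u = sample_frequencies(petition, n, rng)
        errors.append(m_uj2m(f, u, n, N) - m_true)
    bias = sum(errors) / reps
    rmse = math.sqrt(sum(e * e for e in errors) / reps)
    return bias, rmse


if __name__ == "__main__":
    # Hypothetical petition: 200 invalid signatures, 700 electors who signed
    # once, 40 who signed twice and 2 who signed three times (M = 742, N = 986).
    petition = [None] * 200
    elector = 0
    for times, count in [(1, 700), (2, 40), (3, 2)]:
        for _ in range(count):
            petition.extend([elector] * times)
            elector += 1
    print(simulate_bias_rmse(petition, m_true=elector, n=100, reps=2_000))
```

When no duplicates appear in the sample ($f_i = 0$ for $i \ge 2$), the estimator reduces to $N^* = N - \hat{U}$, which gives a convenient sanity check.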
Table 2.1 describes the four petitions with regard to petition size ($N$), numbers of invalid signatures ($U$), duplicates of valid signatures ($D$), and distinct valid signatures ($M$), the number of electors with $i$ valid signatures in the petition ($F_i$), and the squared coefficient of variation, $\gamma^2$, of the number of times ($N_j$) distinct valid signatures appear in the petition. Also included is the year in which each petition was submitted for verification. The petition sizes range from 162,324 to 231,723, the proportion of invalid signatures from 12.0 to 20.4 percent, the duplication rates from 2.0 to 5.6 percent, and the proportion of distinct valid signatures from 76.4 to 85.4 percent. Petitions C and D, which have the largest percentages of pairs ($F_2$), also have the largest percentages of triplicates ($F_3$) and quadruples ($F_4$). Only two petitions have electors who signed more than four times: petition B has one elector who signed twelve times, and petition D has two electors who signed six times and three electors who signed five times. For all four petitions, the proportion of electors with triplicates or higher is small (< 0.24%). As expected, all four petitions have small values of $\gamma^2$.

Table 2.1 Description of the petitions A, B, C, and D.

                  A (1984)         B (1995)         C (1989)         D (1996)
$N$               162,324          231,723          173,858          228,148
$U$ (%)           19,437 (12.0)    47,383 (20.4)    31,325 (18.0)    34,542 (15.1)
$D$ (%)           4,256 (2.6)      4,546 (2.0)      9,738 (5.6)      11,584 (5.1)
$M$ (%)           138,631 (85.4)   179,794 (77.6)   132,795 (76.4)   182,022 (79.8)
$F_1$ (%)         134,489 (82.9)   175,363 (75.7)   123,205 (71.0)   170,988 (74.9)
$F_2$ (%)         4,031 (2.5)      4,331 (1.9)      8,878 (5.1)      10,518 (4.6)
$F_3$ (%)         108 (0.07)       93 (0.04)        385 (0.22)       489 (0.21)
$F_4$             3                6                30               22
$F_5$             0                0                0                3
$F_6$             0                0                0                2
$F_{12}$          0                1                0                0
$\gamma^2$        0.0296           0.0252           0.0652           0.0584

Table 2.2 displays the expected frequency for replications of distinct valid signatures in the sample for each sampling fraction and petition.
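The entries of Table 2.2 are $E(f_i) = \sum_{j \ge i} P_{ij} F_j$ with the exact hypergeometric $P_{ij}$. A minimal sketch follows, written with falling factorials (`math.perm`) so that petitions of realistic size can be handled directly. The rounding of $n = qN$ to the nearest integer is an assumption, since the text does not say how $n$ was chosen. With petition A's frequencies from Table 2.1 and $q = 3\%$, the results round to the reported 3.93 and 0.0032.

```python
from math import comb, perm


def p_ij(i, j, N, n):
    """Exact P_ij = C(j, i) C(N - j, n - i) / C(N, n), written as
    C(j, i) * n!/(n-i)! * (N-n)!/(N-n-j+i)! / (N!/(N-j)!)."""
    return comb(j, i) * perm(n, i) * perm(N - n, j - i) / perm(N, j)


def expected_f(i, F, N, n):
    """E(f_i) = sum_{j >= i} P_ij F_j, where F maps j to F_j."""
    return sum(p_ij(i, j, N, n) * Fj for j, Fj in F.items() if j >= i)


if __name__ == "__main__":
    # Petition A from Table 2.1 (F_1, ..., F_4) and a 3% sample.
    N = 162_324
    F = {1: 134_489, 2: 4_031, 3: 108, 4: 3}
    n = round(0.03 * N)  # assumption: nearest-integer sample size
    for i in (2, 3, 4):
        print(i, expected_f(i, F, N, n))
```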
For sampling fractions of 3%, 5%, and 10%, and all four petitions, the expected number of distinct valid signatures that appear more than twice in a random sample is less than one. When the sampling fraction is increased to 20%, the expected number of triplicate valid signatures exceeds one only for petitions B, C, and D, and the expected number of quadruples or higher is less than 0.22.

Table 2.2 Expected frequency for replications of valid signatures, E(f_i)¹.

Sampling
fraction   i      A          B          C          D
3%         2      3.93       4.22       9.15       10.91
           3      0.0032     0.0077     0.0135     0.0173
           ≥4     <0.0001    0.0003     <0.0001    <0.0001
5%         2      10.89      11.67      25.34      30.20
           3      0.0149     0.0318     0.0624     0.0792
           ≥4     <0.0001    0.0023     0.0002     <0.0001
10%        2      43.37      46.34      100.63     119.87
           3      0.1188     0.1998     0.4929     0.6216
           ≥4     0.0003     0.0262     0.0030     0.0061
20%        2      172.02     183.37     396.69     472.15
           3      0.9407     1.1338     3.8479     4.7925
           ≥4     0.0050     0.2149     0.0480     0.0893
¹ $E(f_i) = \sum_{j=i}^{n} P_{ij} F_j$.

To calculate the bias adjustment factors, $B^{q}_{k,r}$, where $q = n/N$, we need to specify $k$ and $r_i = F_i/F_2$ for $i = 3, \ldots, k$. When sampling is used, the values of $k$ and $r_i$, $i = 3, \ldots, k$, are unknown. Here, we apply a jackknife approach: for each petition (A, B, C, D), information from only the remaining three petitions is used to specify values for the unknown $k$ and $r_3, \ldots, r_k$. For each petition, the specified value, $\hat{k}$, was determined as the maximum of the observed $k$-values from the other three petitions. Similarly, the specified vector, $\hat{r}$, was calculated as the average of the known entries for the other three petitions. Table 2.3 gives the true and specified values of $k$ and $r$ for each petition.

Table 2.3 True values of k and r_3, ..., r_k, and specified values k̂ and r̂_3, ..., r̂_k.

Petition  Values      k     r3       r4       r5       r6       r12
A         True        4     0.0268   0.0007   0        0        0
          Specified   12    0.0371   0.0023   0.0001   0.0001   0.0001
B         True        12    0.0215   0.0014   0        0        0.0002
          Specified   6     0.0389   0.0021   0.0001   0.0001   0
C         True        4     0.0434   0.0034   0        0        0
          Specified   12    0.0316   0.0014   0.0001   0.0001   0.0001
D         True        6     0.0465   0.0021   0.0003   0.0002   0
          Specified   12    0.0306   0.0018   0        0        0.0001
Note: The entries of r = (r3, ..., rk) and r̂ = (r̂3, ..., r̂12) not displayed are equal to zero.

Table 2.4 gives values for the bias adjustment factors, $B^{q}_{k,\hat{r}}$, using $k = 3$ and $k = 12$ for each petition, estimator, and sampling fraction ($q$): 3%, 5%, 10%, and 20%. From Table 2.4, we can see that the values of the bias adjustment factor (BAF) corresponding to $r = \hat{r}_3$ and $r = (\hat{r}_3, \ldots, \hat{r}_{12})$ are similar in all cases. Therefore, we consider only bias adjustment based on triplicate valid signatures, $r = \hat{r}_3$, hereafter.

Table 2.4 Specified values of the bias adjustment factor, B^q_{k,r̂}, for each petition, adjusted estimator, and sampling fraction (q): 3%, 5%, 10%, and 20%.

                       A               B               C               D
q     Estimator    k=3    k=12     k=3    k=12     k=3    k=12     k=3    k=12
3%    D̂2adj        0.970  0.960    0.968  0.962    0.974  0.967    0.974  0.966
      D̂2+adj       0.969  0.958    0.967  0.961    0.974  0.966    0.973  0.965
      D̂dadj        0.968  0.957    0.966  0.960    0.973  0.964    0.972  0.963
5%    D̂2adj        0.971  0.963    0.970  0.965    0.976  0.970    0.975  0.969
      D̂2+adj       0.970  0.961    0.969  0.963    0.975  0.967    0.974  0.967
      D̂dadj        0.968  0.958    0.967  0.961    0.973  0.965    0.973  0.964
10%   D̂2adj        0.976  0.971    0.975  0.971    0.980  0.976    0.980  0.976
      D̂2+adj       0.973  0.966    0.972  0.967    0.977  0.972    0.977  0.971
      D̂dadj        0.970  0.961    0.969  0.963    0.975  0.967    0.974  0.966
20%   D̂2adj        0.986  0.985    0.986  0.984    0.989  0.988    0.988  0.987
      D̂2+adj       0.980  0.976    0.979  0.976    0.983  0.980    0.982  0.979
      D̂dadj        0.973  0.966    0.972  0.967    0.977  0.972    0.977  0.971

For each linear estimator, we use Equations (13) and (14) to compute the bias and root mean squared error (RMSE),
$$\mathrm{Bias}(\hat{M}) = E(\hat{M}) - M \quad \text{and} \quad \mathrm{RMSE} = \sqrt{\mathrm{Var}(\hat{M}) + \{\mathrm{Bias}(\hat{M})\}^2}.$$
For the nonlinear estimator, Muj2m, we estimate the bias and RMSE from 10,000 independent simulated random samples, drawn without replacement from each petition. In Tables 2.5 and 2.6, the bias and RMSE are given for the nine estimators of $M$ for Petitions A-D and sampling fractions 3%, 5%, 10%, and 20%.
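The Monte Carlo evaluation described above (independent samples drawn without replacement, with bias and RMSE computed from the simulated errors) can be sketched in Python with only the standard library. Because the Washington petitions are not available here, the sketch uses a small hypothetical petition with single duplication only, and it applies a Goodman-type estimator that we assume has the form $N - (N/n)u - \frac{N(N-1)}{n(n-1)}f_2$; it is not the Muj2m estimator, and the function name and counts are illustrative.

```python
import math
import random
from collections import Counter


def simulate_bias_rmse(n_invalid, n_singles, n_pairs, n, reps=10_000, seed=1):
    """Monte Carlo bias and RMSE of a Goodman-type estimator of M.

    Hypothetical synthetic petition: invalid signatures are stored as None,
    each valid signature is stored as its elector id, and paired electors
    appear twice. Samples of size n are drawn without replacement.
    """
    petition = [None] * n_invalid
    petition += list(range(n_singles))
    petition += [e for e in range(n_singles, n_singles + n_pairs) for _ in range(2)]
    N = len(petition)
    M = n_singles + n_pairs  # true number of distinct valid signatures
    scale = N * (N - 1) / (n * (n - 1))

    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        sample = rng.sample(petition, n)  # without replacement
        u = sum(1 for s in sample if s is None)
        counts = Counter(s for s in sample if s is not None)
        f2 = sum(1 for c in counts.values() if c == 2)
        m_hat = N - N * u / n - scale * f2
        errors.append(m_hat - M)

    bias = sum(errors) / reps
    rmse = math.sqrt(sum(e * e for e in errors) / reps)
    return bias, rmse


bias, rmse = simulate_bias_rmse(n_invalid=200, n_singles=1600, n_pairs=100, n=200)
print(f"bias = {bias:.2f}, RMSE = {rmse:.2f}")
```

Since this synthetic petition has no triplicates, the estimator is unbiased, so the simulated bias should lie within a few Monte Carlo standard errors (about RMSE divided by the square root of the number of replicates) of zero.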
For the adjusted estimators, Equation (11), $k = 3$ ($r = \hat{r}_3$) is used for the bias adjustment factor, $B^{q}_{k,\hat{r}}$. In Table 2.5, the estimator M3 tends to have a relatively small positive bias (at most 0.07%) in all cases. The biases of M2, M2+, and Md are negative in all cases, corresponding to positive biases in the estimators $\hat{D}_2$, $\hat{D}_{2+}$, and $\hat{D}_d$ of the number of duplicates of valid signatures. Note that the differences among the biases of these three estimators tend to increase as the sampling fraction increases. This is expected, since the number of triplicate and quadruple valid signatures in the sample increases with sample size (Table 2.2). The three adjusted estimators show a small reduction in absolute bias when compared with their non-adjusted counterparts. The non-linear estimator, Muj2m, tends to have a relatively large negative bias, ranging from -4.52% to -1.19%.

Table 2.5 BIAS (%) of estimators for M.

Sampling                                   Petitions
fraction  Estimator   A               B               C               D
3%        Munbias     0               0               0               0
          M3          3 (0.00)        120 (0.07)      27 (0.02)       46 (0.03)
          M2          -106 (-0.08)    -138 (-0.08)    -430 (-0.32)    -535 (-0.29)
          M2+         -110 (-0.08)    -147 (-0.08)    -445 (-0.34)    -554 (-0.30)
          Md          -113 (-0.08)    -156 (-0.09)    -460 (-0.35)    -574 (-0.32)
          M2adj       27 (0.02)       11 (0.01)       -162 (-0.12)    -225 (-0.12)
          M2+adj      27 (0.02)       7 (0.00)        -168 (-0.13)    -234 (-0.13)
          Mdadj       28 (0.02)       3 (0.00)        -174 (-0.13)    -243 (-0.13)
          Muj2m       -2,536 (-1.83)  -2,768 (-1.54)  -5,988 (-4.51)  -7,541 (-4.14)
5%        Munbias     0               0               0               0
          M3          2 (0.00)        94 (0.05)       24 (0.02)       42 (0.02)
          M2          -99 (-0.07)     -122 (-0.07)    -400 (-0.30)    -497 (-0.27)
          M2+         -105 (-0.08)    -136 (-0.08)    -425 (-0.32)    -529 (-0.29)
          Md          -111 (-0.08)    -150 (-0.08)    -450 (-0.34)    -561 (-0.31)
          M2adj       25 (0.02)       17 (0.01)       -150 (-0.11)    -208 (-0.11)
          M2+adj      26 (0.02)       12 (0.01)       -160 (-0.12)    -222 (-0.12)
          Mdadj       28 (0.02)       5 (0.00)        -170 (-0.13)    -237 (-0.13)
          Muj2m       -2,609 (-1.88)  -2,825 (-1.57)  -6,004 (-4.52)  -7,598 (-4.17)
10%       Munbias     0               0               0               0
          M3          2 (0.00)        52 (0.03)       20 (0.02)       32 (0.02)
          M2          -81 (-0.06)     -88 (-0.05)     -325 (-0.24)    -403 (-0.22)
          M2+         -93 (-0.07)     -111 (-0.06)    -375 (-0.28)    -466 (-0.26)
          Md          -105 (-0.08)    -137 (-0.08)    -425 (-0.32)    -529 (-0.29)
          M2adj       21 (0.02)       26 (0.01)       -120 (-0.09)    -166 (-0.09)
          M2+adj      24 (0.02)       20 (0.01)       -140 (-0.11)    -194 (-0.11)
          Mdadj       26 (0.02)       11 (0.01)       -160 (-0.12)    -223 (-0.12)
          Muj2m       -2,346 (-1.69)  -2,555 (-1.42)  -5,593 (-4.21)  -7,054 (-3.88)
20%       Munbias     0               0               0               0
          M3          1 (0.00)        18 (0.01)       13 (0.01)       20 (0.01)
          M2          -46 (-0.03)     -38 (-0.02)     -179 (-0.13)    -220 (-0.12)
          M2+         -69 (-0.05)     -72 (-0.04)     -277 (-0.21)    -342 (-0.19)
          Md          -93 (-0.07)     -114 (-0.06)    -375 (-0.28)    -466 (-0.26)
          M2adj       13 (0.01)       27 (0.02)       -63 (-0.05)     -85 (-0.05)
          M2+adj      18 (0.01)       26 (0.01)       -101 (-0.08)    -139 (-0.08)
          Mdadj       24 (0.01)       17 (0.01)       -140 (-0.11)    -194 (-0.11)
          Muj2m       -1,979 (-1.43)  -2,139 (-1.19)  -4,820 (-3.63)  -6,059 (-3.33)

From Table 2.6, it can be seen that the RMSE decreases at a faster rate than $1/\sqrt{n}$ for all estimators and petitions. This results from the corresponding property of the estimators for $D$ in Equation (12). The estimator M3 has a smaller RMSE than Munbias, except for the 20% sampling fraction for petition A, where the RMSEs are equal. The estimator M2 has a smaller RMSE than M3, except for the 20% sampling fraction for petitions C and D. The estimators M2, M2+, and Md tend to have similar RMSEs for the sampling fractions of 3%, 5%, and 10% over all four petitions. This is as expected from the form of the estimators and the very small expected number of triplicate or higher replications of distinct valid signatures (Table 2.2). For the 20% sampling fraction, the RMSE for Md is slightly larger than the RMSEs for M2 and M2+ for petitions C and D, and similar for petitions A and B. The adjusted estimators M2adj, M2+adj, and Mdadj show a slight reduction in RMSE compared to their non-adjusted counterparts. These three adjusted estimators have similar RMSEs in all cases. The RMSE for the non-linear estimator Muj2m is relatively large in all cases.
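The expected replication frequencies of Table 2.2, which underlie the explanations above, come from $E(f_i) = \sum_{j=i}^{n} P_{ij}F_j$ with hypergeometric $P_{ij}$. The following is a minimal Python sketch of that calculation for petition A at the 3% sampling fraction, using the $F_j$ counts of Table 2.1 and assuming the sample size is $qN$ rounded to the nearest integer; the function names are our own. Log-gamma functions keep the binomial ratios numerically stable for petitions of this size.

```python
import math


def log_comb(a, b):
    """Natural log of the binomial coefficient C(a, b) via log-gamma."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def p_ij(i, j, N, n):
    """Hypergeometric probability that an elector with j valid signatures
    in a petition of size N has exactly i of them in a sample of size n."""
    if i < 0 or i > j or n - i < 0 or n - i > N - j:
        return 0.0
    return math.exp(log_comb(j, i) + log_comb(N - j, n - i) - log_comb(N, n))


def expected_f(i, F, N, n):
    """E(f_i) = sum over j >= i of P_ij * F_j, where F maps j -> F_j."""
    return sum(p_ij(i, j, N, n) * Fj for j, Fj in F.items() if j >= i)


# Petition A from Table 2.1: F_1, ..., F_4.
F_A = {1: 134_489, 2: 4_031, 3: 108, 4: 3}
N_A = 162_324
n = round(0.03 * N_A)  # 3% sampling fraction

for i in (2, 3):
    print(i, round(expected_f(i, F_A, N_A, n), 4))
```

After rounding, this should agree with the tabulated values 3.93 and 0.0032 for petition A at 3%.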
Table 2.6 RMSE (%) of estimators for M.

Sampling                                   Petitions
fraction  Estimator   A              B                         C              D
3%        Munbias     3,449 (2.49)   1,147,904,315 (638,455)   7,502 (5.65)   49,503 (27.20)
          M3          2,998 (2.16)   3,873 (2.15)              5,217 (3.93)   5,808 (3.19)
          M2          2,307 (1.66)   2,510 (1.40)              3,462 (2.61)   3,791 (2.08)
          M2+         2,308 (1.66)   2,512 (1.40)              3,466 (2.61)   3,796 (2.09)
          Md          2,311 (1.67)   2,519 (1.40)              3,476 (2.62)   3,807 (2.09)
          M2adj       2,242 (1.62)   2,441 (1.36)              3,354 (2.53)   3,669 (2.02)
          M2+adj      2,241 (1.62)   2,441 (1.36)              3,354 (2.53)   3,670 (2.02)
          Mdadj       2,241 (1.62)   2,445 (1.36)              3,359 (2.53)   3,675 (2.02)
          Muj2m       3,943 (2.84)   4,290 (2.39)              7,266 (5.47)   8,859 (4.87)
5%        Munbias     1,740 (1.26)   47,190,341 (26,247)       3,231 (2.43)   10,607 (5.83)
          M3          1,645 (1.19)   1,995 (1.11)              2,693 (2.03)   2,979 (1.64)
          M2          1,423 (1.03)   1,585 (0.88)              2,123 (1.60)   2,331 (1.28)
          M2+         1,424 (1.03)   1,588 (0.88)              2,130 (1.60)   2,341 (1.29)
          Md          1,427 (1.03)   1,594 (0.89)              2,142 (1.61)   2,356 (1.29)
          M2adj       1,385 (1.00)   1,546 (0.86)              2,044 (1.54)   2,238 (1.23)
          M2+adj      1,384 (1.00)   1,546 (0.86)              2,044 (1.54)   2,239 (1.23)
          Mdadj       1,384 (1.00)   1,549 (0.86)              2,049 (1.54)   2,246 (1.23)
          Muj2m       3,353 (2.42)   3,653 (2.03)              6,724 (5.06)   8,326 (4.57)
10%       Munbias     795 (0.57)     532,313 (296)             1,232 (0.93)   1,726 (0.95)
          M3          787 (0.57)     928 (0.52)                1,179 (0.89)   1,294 (0.71)
          M2          753 (0.54)     876 (0.49)                1,116 (0.84)   1,234 (0.68)
          M2+         755 (0.54)     879 (0.49)                1,133 (0.85)   1,258 (0.69)
          Md          759 (0.55)     887 (0.49)                1,157 (0.87)   1,290 (0.71)
          M2adj       736 (0.53)     860 (0.48)                1,056 (0.80)   1,159 (0.64)
          M2+adj      735 (0.53)     859 (0.48)                1,058 (0.80)   1,163 (0.64)
          Mdadj       735 (0.53)     861 (0.48)                1,065 (0.80)   1,173 (0.64)
          Muj2m       2,805 (2.02)   3,077 (1.71)              6,053 (4.56)   7,535 (4.14)
20%       Munbias     405 (0.29)     4,129 (2.30)              563 (0.42)     623 (0.34)
          M3          405 (0.29)     498 (0.28)                561 (0.42)     613 (0.34)
          M2          404 (0.29)     496 (0.28)                578 (0.44)     640 (0.35)
          M2+         408 (0.29)     500 (0.28)                616 (0.46)     692 (0.38)
          Md          415 (0.30)     509 (0.28)                671 (0.51)     767 (0.42)
          M2adj       399 (0.29)     492 (0.27)                549 (0.41)     602 (0.33)
          M2+adj      398 (0.29)     491 (0.27)                553 (0.42)     611 (0.34)
          Mdadj       398 (0.29)     491 (0.27)                565 (0.43)     629 (0.35)
          Muj2m       2,359 (1.70)   2,571 (1.43)              5,211 (3.92)   6,475 (3.56)

2.6 Summary

In this paper we compared several estimators for the number of distinct valid signatures in a petition. Explicit forms for the bias and RMSE were provided for the linear estimators. Simulated random samples were used to estimate the bias and RMSE of the non-linear estimator, Muj2m, adapted from Haas and Stokes (1998). Sampling fractions of 10% or less are typically used for sampling state petitions. For these sample sizes it was difficult to improve much on the Goodman-type estimator M2, which is unbiased when valid signatures are duplicated at most once. This results from the very small probability of observing higher-order replication of valid signatures in samples from typical petitions. When replication data are available from similar fully-verified petitions, it is possible to reduce the bias of the (biased) linear estimators.

2.7 References

Bunge, J. and Fitzpatrick, M. (1993), Estimating the Number of Species: A Review, Journal of the American Statistical Association, 88, 364-373.
Goodman, L. A. (1949), On the Estimation of the Number of Classes in a Population, Annals of Mathematical Statistics, 20, 572-579.
Haas, P. J. and Stokes, L. (1998), Estimating the Number of Classes in a Finite Population, Journal of the American Statistical Association, 93, 1475-1487.
Houser, J. (1985), Validating Initiative and Referendum Petition Signatures, Research Monograph, Legislative Research, S420 State Capitol, Salem, Oregon.

2.8 Appendix

Calculation of $E(\hat{D})$, $\mathrm{Var}(\hat{D})$, and $\mathrm{Cov}(\hat{U}, \hat{D})$

Consider a random sample of $n$ signatures drawn without replacement from a petition of size $N$. Let $\delta_{ja}$ denote the number of valid signatures in the sample from the $a$th elector who signed $j$ valid signatures in the petition, for $a = 1, 2, \ldots, F_j$ and $j = 1, \ldots, N$.
Note that $\delta_{ja}$ has the hypergeometric$(N, n, j, i)$ distribution with $P_{ij} = P(\delta_{ja} = i)$ given by
$$P_{ij} = \frac{\binom{j}{i}\binom{N-j}{n-i}}{\binom{N}{n}}, \qquad i = 0, 1, \ldots, j.$$
Similarly, the conditional distribution of $\delta_{l\beta}$, given $\delta_{ja} = i$, is hypergeometric$(N-j, n-i, l, k)$ with $P_{kl.ij} = P(\delta_{l\beta} = k \mid \delta_{ja} = i)$ given by
$$P_{kl.ij} = \frac{\binom{l}{k}\binom{N-j-l}{n-i-k}}{\binom{N-j}{n-i}}, \qquad k = 0, 1, \ldots, l.$$
For the number of electors in the sample with $i$ valid signatures, $f_i$, write
$$f_i = \sum_{j=i}^{n} f_{ij}, \qquad \text{with} \qquad f_{ij} = \sum_{a=1}^{F_j} I(\delta_{ja} = i),$$
where $f_{ij}$ is the number of electors with $i$ signatures in the sample and $j$ signatures in the petition ($i \le j$), and $I(\cdot)$ is the indicator function. Note that $f_{ij}$ is not observable, but $f_i$ is. Then $E(f_{ij}) = P_{ij}F_j$ and
$$E(f_i) = \sum_{j=i}^{n} P_{ij}F_j. \qquad (A.1)$$
Thus, from the general form of the linear estimator $\hat{D} = B\sum_{i=2}^{t} A_i f_i$, we have
$$E(\hat{D}) = B\sum_{i=2}^{t} A_i \sum_{j=i}^{n} P_{ij}F_j. \qquad (A.2)$$

Lemma 1. Let $k = \max(N_1, \ldots, N_M)$. Suppose a sample of $n$ ($n \ge k$) signatures is drawn without replacement from a petition of size $N$. Define $C_2 = 1$ and
$$C_j = (j - 1) - \sum_{i=2}^{j-1} C_i \frac{P_{ij}}{P_{ii}}, \qquad j = 3, 4, \ldots, n.$$
Then an unbiased estimator of $D$ is given by
$$\hat{D}_{unbias} = \sum_{i=2}^{n} \frac{C_i}{P_{ii}} f_i. \qquad (A.3)$$

Proof. The unbiasedness of $\hat{D}_{unbias}$ follows from substituting the expectation (A.1) for $f_i$ in Equation (A.3):
$$E(\hat{D}_{unbias}) = \sum_{i=2}^{n}\frac{C_i}{P_{ii}}\sum_{j=i}^{n}P_{ij}F_j = \sum_{j=2}^{n}F_j\sum_{i=2}^{j}C_i\frac{P_{ij}}{P_{ii}} = \sum_{j=2}^{n}(j-1)F_j = \sum_{j=2}^{N}(j-1)F_j = D,$$
since, by the definition of $C_j$, $\sum_{i=2}^{j} C_i P_{ij}/P_{ii} = j - 1$ for $j = 2, 3, \ldots, n$, and $F_j = 0$ for $j = k+1, k+2, \ldots, N$ with $n \ge k$.

The next result is used for the calculation of $\mathrm{Var}(\hat{D})$ and $\mathrm{Cov}(\hat{U}, \hat{D})$.

Lemma 2. $\mathrm{Cov}(f_{ij}, f_{kl}) = V_{ijkl}$, $i \le j$, $k \le l$, where
$$V_{ijkl} = \begin{cases} \left(1 + (P_{ij.ij} - P_{ij})F_j - P_{ij.ij}\right)P_{ij}F_j & \text{for } i = k,\ j = l,\\ \left((P_{kj.ij} - P_{kj})F_j - P_{kj.ij}\right)P_{ij}F_j & \text{for } i \ne k,\ j = l,\\ (P_{kl.ij} - P_{kl})P_{ij}F_jF_l & \text{for } j \ne l. \end{cases}$$

Proof. Substitute (A.1) in
$$\mathrm{Cov}(f_{ij}, f_{kl}) = E(f_{ij}f_{kl}) - E(f_{ij})E(f_{kl}) = \sum_{a=1}^{F_j}\sum_{\beta=1}^{F_l}P(\delta_{ja} = i, \delta_{l\beta} = k) - (P_{ij}F_j)(P_{kl}F_l).$$
1. Case where $i = k$, $j = l$:
$$\mathrm{Cov}(f_{ij}, f_{ij}) = \sum_{a=1}^{F_j}P(\delta_{ja} = i) + \sum_{a=1}^{F_j}\sum_{\beta \ne a}P(\delta_{ja} = i, \delta_{j\beta} = i) - (P_{ij}F_j)^2 = P_{ij}F_j + P_{ij}P_{ij.ij}F_j(F_j - 1) - P_{ij}^2F_j^2 = \left(1 + (P_{ij.ij} - P_{ij})F_j - P_{ij.ij}\right)P_{ij}F_j.$$
2. Case where $i \ne k$, $j = l$:
$$\mathrm{Cov}(f_{ij}, f_{kj}) = 0 + \sum_{a=1}^{F_j}\sum_{\beta \ne a}P(\delta_{ja} = i, \delta_{j\beta} = k) - (P_{ij}F_j)(P_{kj}F_j) = P_{ij}P_{kj.ij}F_j(F_j - 1) - P_{ij}P_{kj}F_j^2 = \left((P_{kj.ij} - P_{kj})F_j - P_{kj.ij}\right)P_{ij}F_j.$$
3. Case where $j \ne l$:
$$\mathrm{Cov}(f_{ij}, f_{kl}) = \sum_{a=1}^{F_j}\sum_{\beta=1}^{F_l}P(\delta_{ja} = i, \delta_{l\beta} = k) - (P_{ij}F_j)(P_{kl}F_l) = (P_{kl.ij} - P_{kl})P_{ij}F_jF_l.$$

The variance for the general form of the linear estimator $\hat{D} = B\sum_{i=2}^{t}A_if_i$,
$$\mathrm{Var}(\hat{D}) = B^2\sum_{i=2}^{t}\sum_{k=2}^{t}A_iA_k\sum_{j=i}^{n}\sum_{l=k}^{n}V_{ijkl},$$
then follows from the covariance of the sums $f_i = \sum_{j=i}^{n}f_{ij}$ and $f_k = \sum_{l=k}^{n}f_{kl}$ and Lemma 2.

Lemma 3. Under the assumption $F_j = 0$ for $j > n$,
$$\mathrm{Cov}(u, f_{ij}) = -\frac{U}{N}\,\frac{iN - jn}{N - j}\,P_{ij}F_j.$$

Proof. For fixed $i$ and $j$ write
$$u = n - \sum_{k=1}^{n}kf_k = n - \sum_{l=1}^{n}\sum_{k=1}^{l}kf_{kl} = n - \sum_{l \ne j}\sum_{k=1}^{l}kf_{kl} - if_{ij} - \sum_{k=1,\,k \ne i}^{j}kf_{kj}.$$
Then
$$\mathrm{Cov}(u, f_{ij}) = -\sum_{l \ne j}\sum_{k=1}^{l}k\,\mathrm{Cov}(f_{kl}, f_{ij}) - i\,\mathrm{Var}(f_{ij}) - \sum_{k=1,\,k \ne i}^{j}k\,\mathrm{Cov}(f_{kj}, f_{ij}).$$
From Lemma 2,
$$\mathrm{Cov}(u, f_{ij}) = -\left(i + \sum_{k=0}^{j}k\left((P_{kj.ij} - P_{kj})F_j - P_{kj.ij}\right) + \sum_{l \ne j}\sum_{k=0}^{l}k(P_{kl.ij} - P_{kl})F_l\right)P_{ij}F_j.$$
Using the expectations of hypergeometric distributions, $\sum_k kP_{kl.ij} = l(n-i)/(N-j)$ and $\sum_k kP_{kl} = ln/N$, then gives the reduction
$$\mathrm{Cov}(u, f_{ij}) = -\left(i - \frac{j(n-i)}{N-j} + \sum_{l=1}^{n}\left(\frac{l(n-i)}{N-j} - \frac{ln}{N}\right)F_l\right)P_{ij}F_j = -\frac{iN - jn}{N - j}\left(1 - \frac{1}{N}\sum_{l=1}^{n}lF_l\right)P_{ij}F_j = -\frac{iN - jn}{N - j}\,\frac{U}{N}\,P_{ij}F_j,$$
since $\sum_{l=1}^{n}lF_l = N - U$ when $F_l = 0$ for $l > n$.

From Lemma 3, the covariance of $\hat{U} = (N/n)u$ and $\hat{D} = B\sum_{i=2}^{t}A_if_i = B\sum_{i=2}^{t}\sum_{j=i}^{n}A_if_{ij}$ is then
$$\mathrm{Cov}(\hat{U}, \hat{D}) = -\frac{BU}{n}\sum_{i=2}^{t}A_i\sum_{j=i}^{n}\frac{iN - jn}{N - j}P_{ij}F_j.$$

Chapter 3

Sampling for Certification of Oregon State Initiative Petitions

Ruben A. Smith-Cayama and David R. Thomas

3.1 Abstract

Statistical sampling for verification of signatures from initiative petitions is used in some states to determine whether the number of distinct valid signatures, $M$, in the petition attains a required number, $M \ge R$. This determination is based entirely on sampling in three states: Michigan, North Dakota, and Oregon (Houser, 1985).
In Oregon, the decision rule is complicated in that multiple stages of sampling are used. A maximum of two submissions of signatures is permitted, with two samples selected sequentially from the first submission and one sample from the second submission. The petition can be accepted after any sampling stage, but only rejected after verification of all samples. The decision rule depends on estimates for the number of distinct valid signatures submitted. These are obtained by subtracting estimates for the numbers of invalid signatures ($U$) and duplicates of valid signatures ($D$) from the number of signatures submitted: $M = N - U - D$. By a court ruling, the decision rule shall not be biased either for or against the petitioner. In terms of a statistical decision rule, this corresponds to the property that the probability of a correct decision, regarding $M \ge R$ or $M < R$, should be at least 0.5. The performance of the decision rule is investigated in a variety of situations. Several rates of invalid signatures and duplicates of valid signatures, and several sampling fractions, are considered for different petition sizes. Cases of both single and multiple duplication of valid signatures are used for single submissions. Only single duplication of valid signatures is considered for two submissions.

3.2 Introduction

The Oregon constitution gives citizens the power to propose legislation by initiative petitions. The sponsors of the petition must circulate the complete text of the proposed legislation among voters and collect signatures. Signatures in the petition are classified as either valid or invalid. A valid signature must match that on the voter registration in the county designated on the signature sheet. Although the petition specifies that an elector can sign the petition at most once, some electors sign two or more times.
The signature sheets for a petition are submitted to the Oregon Secretary of State's Office, which determines whether or not the petition is certified as a measure on the ballot. A petition should be certified if the total number of distinct valid signatures collected equals or exceeds a specified required number. For statutory and constitutional initiative petitions, this required number corresponds to six and eight percent, respectively, of the total votes for Governor in the previous election. Based on the 1998 election, the required numbers are currently 66,786 and 89,048. Certification of a referendum is similar to that for an initiative petition, but only signatures of four percent of the votes for Governor are required. The Secretary of State must give the results of the certification procedure within 15 days after the due date. If the petition or referendum is submitted early, then the certification must be completed as early as possible. These time constraints necessitate the use of statistical sampling for selection of the signatures to be verified.

The signatures are verified in stages. First, a sample of size $n_1 = 1{,}000$ is selected at random from the $N$ signatures submitted in the petition. If the petition is not certified from this first sample, then a second sample of size $n_2$ is randomly selected from the remaining $N - n_1$ signatures in the petition. The second sample is chosen so that the combined sample size, $n = n_1 + n_2$, is equal to or greater than five percent of the petition size ($n \ge 0.05N$). This choice of sample size satisfies the restriction, imposed by the Oregon administrative rules, that the second sample size, $n_2$, must be larger than the first sample size, $n_1$: since $n_2 \ge 0.05N - 1{,}000$, the restriction holds for any petition with more than 40,000 signatures. Note that the combined first and second samples satisfy the property of simple random sampling: every possible sample of size $n$, selected from $N$, has the same probability of being selected.
By Oregon administrative rules, a petition cannot be rejected unless both samples have been verified. Further, the statistical decision rule used for determining whether the petition contains the required number of distinct valid signatures should not be biased either for or against the petitioner. The decision rule depends on estimates for the number of distinct valid signatures submitted. To estimate the number of distinct valid signatures, estimators for the number of invalid signatures and the number of duplicates of valid signatures are needed. For the purpose of constructing a lower limit, from the first sample, for the number of distinct valid signatures contained in the petition, an upper bound of at least eight percent is assumed for the duplication rate of valid signatures. If verification of the second sample is required, an estimator for the number of duplicates of valid signatures is obtained from the combined first and second samples. If the petition is not certified from this combined sample and no more signatures are submitted before the due date, then the petition is rejected. If additional signatures are submitted as a second submission before the due date, then a simple random sample of signatures is drawn from the second submission for verification. The certification of the petition is then based on estimates made from the samples in the first and second submissions.

We study the performance of the decision rule in several situations. The probability of a correct decision is approximated by using a multivariate normal distribution for the joint sampling distribution of the estimators of the numbers of invalid and distinct valid signatures from the multiple stages. Several rates of invalid signatures and duplicates of valid signatures, and several sampling fractions, are considered for different petition sizes. Various duplication rates and distributions of multiple duplication of valid signatures are used for the case of a single submission.
Only single duplication is considered for two submissions. Also evaluated are the probabilities of acceptance from the first sample and from the combined first and second samples for two submissions.

3.3 The Statistical Decision Problem

To describe the statistical estimates used in the Oregon decision rules for certification of an initiative petition, it is convenient to introduce the following notation for counts in a petition. Let $U$ and $D$ denote the unknown numbers of invalid signatures and of duplicates (replicates) of valid signatures. Note that 'duplicate' is used here to describe all signatures by an elector after his or her first signature. Then the unknown number of distinct valid signatures is given by
$$M = N - U - D. \qquad (1)$$
The first subtraction, $N - U$, gives the number of valid signatures, including duplicates, and the second subtraction eliminates the duplicates of valid signatures. The objective is to determine, from statistical estimates, whether the total number of distinct valid signatures in the petition attains the required number, $M \ge R$, where $R$ denotes the known required number of signatures. A statistical estimator for $M$ can be obtained from Equation (1) by substituting estimators for the number of invalid signatures and for the number of duplicates of valid signatures.

If the petition consists of two different submissions (strata), it is convenient to use the subscripts $F$ and $S$ to designate the first and second submissions. For the counts in each submission, denote by $N_h$, $U_h$, and $D_h$ the total number of signatures and the unknown numbers of invalid signatures and of duplicates of valid signatures in the $h$th submission, respectively, for $h = F, S$. Let $D_{FS}$ be the unknown total number of duplicates of valid signatures between the first and second submissions.
For the petition comprised of both submissions, the total number of distinct valid signatures can be written as
$$M = M_F + M_S - D_{FS}, \qquad (2)$$
where $M_h = N_h - U_h - D_h$ is the number of distinct valid signatures in the $h$th submission, for $h = F, S$. A statistical estimator for $M$ can be obtained from Equation (2) by substituting estimators for the number of distinct valid signatures in each submission and for the number of duplicates of valid signatures between the first and second submissions.

3.4 One Submission of Signatures

In Oregon, most petitions are filed as a single submission of signatures. The decision rule for a single submission is described in Section 3.4.1, an approximation for the probability of a correct decision is developed in Section 3.4.2, and numerical results are given in Section 3.4.3.

3.4.1 The Decision Rule

The decision rule (1999 Oregon Administrative Rules, Chapter 165) has two components. After verification of the signatures in the first sample, the decision alternatives are to either accept (certify) the petition or verify the signatures in the second sample. After the second sample, the alternatives are to either accept or reject the petition.

The petition is accepted based on the first sample only if there is a high level of confidence that $M \ge R$, where $M$ is defined in Equation (1). This is accomplished by using a lower bound, denoted $\hat{M}_{1L}$, for the number of distinct valid signatures in the petition,
$$\hat{M}_{1L} = N - \hat{U}_{1U} - \hat{D}_{1U}, \qquad (3)$$
where
$$\hat{U}_{1U} = \hat{U}_1 + 1.645N\sqrt{\frac{N - n_1}{N}\,\frac{\bar{u}_1(1 - \bar{u}_1)}{n_1 - 1}}, \qquad (4)$$
$\hat{U}_1 = N\bar{u}_1$, $\bar{u}_1$ is the proportion of invalid signatures in the first sample, and $\hat{U}_{1U}$ is the 95% upper confidence limit for the number of invalid signatures. The upper limit for the number of duplicates of valid signatures in the petition, $\hat{D}_{1U} = N\bar{D}_{1U}$, is obtained by assuming an upper bound for the duplication rate of at least 0.08 ($\bar{D}_{1U} \ge 0.08$). If $\hat{M}_{1L} \ge R$, the petition is accepted as a measure on the ballot. Otherwise, the second sample must be verified.
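The first-stage quantities can be sketched in Python as below. This is a hedged illustration rather than the Secretary of State's procedure: `stage_sizes` assumes the second sample is the smallest one giving $n \ge 0.05N$, and `first_stage_lower_bound` evaluates $\hat{M}_{1L} = N - \hat{U}_{1U} - N\bar{D}_{1U}$ with the half-width $1.645N\sqrt{((N-n_1)/N)\,\bar{u}_1(1-\bar{u}_1)/(n_1-1)}$ and $\bar{D}_{1U} = 0.08$. Both function names and the example counts are hypothetical.

```python
import math


def stage_sizes(N, n1=1_000):
    """Return (n1, n2) with n = n1 + n2 the smallest integer >= 0.05 * N
    (assumption: the second sample is no larger than required)."""
    n = -(-N // 20)  # ceil(N / 20) in exact integer arithmetic
    return n1, n - n1


def first_stage_lower_bound(N, n1, invalid1, d1u=0.08, z=1.645):
    """Lower bound M_1L = N - U_1U - D_1U from the first sample.

    invalid1 is the number of invalid signatures among the n1 verified
    signatures; d1u is the assumed upper bound on the duplication rate.
    """
    u1 = invalid1 / n1
    U1 = N * u1
    half_width = z * N * math.sqrt((N - n1) / N * u1 * (1 - u1) / (n1 - 1))
    return N - (U1 + half_width) - N * d1u


R = 89_048
N = 110_000
print(stage_sizes(N))  # (1000, 4500)
for invalid1 in (50, 110):
    m1l = first_stage_lower_bound(N, 1_000, invalid1)
    print(invalid1, round(m1l), "accept" if m1l >= R else "verify second sample")
```

With 50 invalid signatures in the first 1,000 the bound clears $R = 89{,}048$ and the petition would be certified immediately; with 110 it does not, so the second sample would be drawn.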
For petitions that require verification of the second sample, data from the combined sample of size $n = n_1 + n_2$ are used for estimating the number of distinct valid signatures,
$$\hat{M} = N - \hat{U} - \hat{D}. \qquad (5)$$
The unbiased estimator for the number of invalid signatures is
$$\hat{U} = \frac{N}{n}u, \qquad (6)$$
where $u$ is the number of invalid signatures in the combined sample. Note that the number of duplicates of valid signatures can be written as
$$D = \sum_{i=2}^{N}(i - 1)F_i,$$
where $F_i$ is the number of electors with $i$ valid signatures in the petition, $i = 1, \ldots, N$. Then an estimate for $D$ is given by
$$\hat{D} = \frac{N(N-1)}{n(n-1)}d, \qquad (7)$$
where $d = \sum_{i=2}^{n}(i - 1)f_i$ and $f_i$ is the number of electors with $i$ valid signatures in the combined sample. Smith-Cayama and Thomas (1999) compared the performance of several estimators for $M$ with respect to bias and root mean squared error for random sampling from four fully-verified Washington State petitions. The estimator $\hat{M}$, referred to as $\hat{M}_d$ in their paper, was found to perform similarly to the best estimator in each case. The estimator $\hat{M}$, obtained from the combined first and second samples, is now used for determining the certification of the petition. If $\hat{M} \ge R$, the petition is certified as a measure on the ballot. If no more signatures are submitted before the due date and $\hat{M} < R$, then the petition is rejected.

3.4.2 Probability of Correct Decision

It is of primary interest to determine the probability of making the correct decision as to whether $M < R$ or $M \ge R$. The petition can only be rejected if $\hat{M}_{1L} < R$ from the first sample and $\hat{M} < R$ from the combined sample. Denote this rejection probability as
$$P_R = P(\hat{M}_{1L} < R,\ \hat{M} < R)$$
and the probability of a correct decision as
$$P_{CD} = \begin{cases} P_R & \text{for } M < R,\\ 1 - P_R & \text{for } M \ge R. \end{cases} \qquad (8)$$
To satisfy the constraint that the decision rule should not be biased either for or against the petitioner, the probability of a correct decision should be greater than or equal to 0.5, that is, $P_{CD} \ge 0.5$, for all $U = N\bar{U}$ and $D = N\bar{D}$.
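A minimal Python sketch of the combined-sample estimator $\hat{M} = N - (N/n)u - \frac{N(N-1)}{n(n-1)}\sum_{i \ge 2}(i-1)f_i$ follows, assuming the verified sample is summarized by the number of invalid signatures $u$ and the counts $f_i$ of electors appearing $i$ times among the valid signatures. The function name and the example counts are hypothetical.

```python
def combined_estimate(N, n, u, f):
    """M_hat = N - U_hat - D_hat from a combined sample of size n.

    u: number of invalid signatures in the sample.
    f: dict mapping i -> f_i, the number of electors with i valid
       signatures in the sample.
    """
    if u + sum(i * fi for i, fi in f.items()) != n:
        raise ValueError("u + sum(i * f_i) must equal the sample size n")
    U_hat = N * u / n                                    # invalid signatures
    d = sum((i - 1) * fi for i, fi in f.items() if i >= 2)
    D_hat = N * (N - 1) / (n * (n - 1)) * d              # duplicates
    return N - U_hat - D_hat                             # distinct valid


# Hypothetical combined sample: 5,500 signatures, 1,000 invalid,
# 4,400 electors seen once and 50 seen twice.
m_hat = combined_estimate(110_000, 5_500, 1_000, {1: 4_400, 2: 50})
print(round(m_hat))
```

Here $\hat{M} \approx 69{,}997 < R = 89{,}048$, so, absent a second submission, such a petition would be rejected. The consistency check guards against sample summaries that do not add up to $n$.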
It can be shown that, for the first sample, the events $\hat{M}_{1L} < R$ and $\hat{U}_1 > k_1$ are equivalent, where
$$A = (1.645)^2\,\frac{N - n_1}{N(n_1 - 1)} + 1, \qquad B = 1 - A - 2\sqrt{C}, \qquad C = (1 - \bar{D}_{1U} - R/N)^2,$$
so that
$$k_1 = N\,\frac{-B - \sqrt{B^2 - 4AC}}{2A}.$$
Using the bivariate normal approximation for the joint sampling distribution of $\hat{U}_1$ and $\hat{M}$, the rejection probability is
$$P_R = \int_{a}^{\infty}\int_{-\infty}^{b}\phi(z_1, z_2)\,dz_2\,dz_1, \qquad (9)$$
where $\phi(z_1, z_2)$ is the standardized bivariate normal density with correlation coefficient
$$\rho_{12} = \frac{\mathrm{Cov}(\hat{U}_1, \hat{M})}{SD(\hat{U}_1)\,SD(\hat{M})}, \qquad (10)$$
and the standardized lower and upper integration limits are
$$a = \frac{k_1 - U}{SD(\hat{U}_1)} \quad \text{and} \quad b = \frac{R - E(\hat{M})}{SD(\hat{M})}, \qquad (11)$$
where
$$SD(\hat{U}_1) = N\sqrt{\frac{N - n_1}{N - 1}\,\frac{\bar{U}(1 - \bar{U})}{n_1}}, \qquad (12)$$
$$\mathrm{Cov}(\hat{U}_1, \hat{M}) = -\frac{N^2}{n}\,\frac{N - n}{N - 1}\,\bar{U}(1 - \bar{U}) + \frac{N^2(N - 1)}{n^2(n - 1)}\,\bar{U}\sum_{i=2}^{n}(i - 1)\sum_{j=i}^{n}\frac{iN - jn}{N - j}P_{ij}F_j, \qquad (13)$$
and the hypergeometric probabilities are
$$P_{ij} = \frac{\binom{j}{i}\binom{N-j}{n-i}}{\binom{N}{n}} \quad \text{and} \quad P_{kl.ij} = \frac{\binom{l}{k}\binom{N-j-l}{n-i-k}}{\binom{N-j}{n-i}}.$$
It is known that the distribution of $\hat{U}_1$ is asymptotically normal. In a simulation study we found that the marginal distribution of $\hat{M}$ appeared to be approximately normal.

The derivation of $\mathrm{Cov}(\hat{U}_1, \hat{M})$ is given in Section 3.8.1 in the Appendix. Expressions for $E(\hat{M})$ and $\mathrm{Var}(\hat{M})$ are included in Smith-Cayama and Thomas (1999), who consider a more general class of linear estimators:
$$E(\hat{M}) = N - U - \frac{N(N-1)}{n(n-1)}\sum_{i=2}^{n}(i - 1)\sum_{j=i}^{n}P_{ij}F_j \qquad (14)$$
and
$$\mathrm{Var}(\hat{M}) = \mathrm{Var}(\hat{U}) + \mathrm{Var}(\hat{D}) + 2\,\mathrm{Cov}(\hat{U}, \hat{D}), \qquad (15)$$
where
$$\mathrm{Var}(\hat{U}) = \frac{N^2}{n}\,\frac{N - n}{N - 1}\,\bar{U}(1 - \bar{U}),$$
$$\mathrm{Var}(\hat{D}) = \left\{\frac{N(N-1)}{n(n-1)}\right\}^2\sum_{i=2}^{n}\sum_{k=2}^{n}(i - 1)(k - 1)\sum_{j=i}^{n}\sum_{l=k}^{n}V_{ijkl},$$
$$\mathrm{Cov}(\hat{U}, \hat{D}) = -\frac{N(N-1)}{n(n-1)}\,\frac{U}{n}\sum_{i=2}^{n}(i - 1)\sum_{j=i}^{n}\frac{iN - jn}{N - j}P_{ij}F_j,$$
and
$$V_{ijkl} = \begin{cases} \left(1 + (P_{ij.ij} - P_{ij})F_j - P_{ij.ij}\right)P_{ij}F_j & \text{for } i = k,\ j = l,\\ \left((P_{kj.ij} - P_{kj})F_j - P_{kj.ij}\right)P_{ij}F_j & \text{for } i \ne k,\ j = l,\\ (P_{kl.ij} - P_{kl})P_{ij}F_jF_l & \text{for } j \ne l. \end{cases}$$
It is also of interest to calculate the probability of acceptance from the first sample, $P_{1A} = P(\hat{M}_{1L} \ge R) = P(\hat{U}_1 \le k_1)$. For this calculation, we use a normal approximation for the sampling distribution of $\hat{U}_1$,
$$P_{1A} = \Phi(a), \qquad (16)$$
where $\Phi$ is the standard normal distribution function and the standardized value $a$ is given in Equation (11).

3.4.3 Numerical Results

We calculated the probabilities that the petition will be accepted from the first sample ($P_{1A}$) and that the correct decision for certification will be made ($P_{CD}$). The rejection probabilities ($P_R$) in Equation (9) were computed using the CDFBVN function of the GAUSS (1992) software. From Equations (9)-(15), $P_{CD}$ depends on the size of the first sample, $n_1$; the upper limit for the rate of duplicates of valid signatures for the first sample, $\bar{D}_{1U}$; the size of the combined sample, $n$; the petition size, $N$; the number of invalid signatures in the petition, $U$; the required number of distinct valid signatures, $R$; and the number of electors with $i$ valid signatures in the petition, $F_i$, for $i = 1, \ldots, N$. We fix $n_1 = 1{,}000$, $\bar{D}_{1U} = 0.08$, and $R = 89{,}048$ (corresponding to constitutional petitions for the years 2000 and 2002), consider three typical petition sizes, $N = 110{,}000$, $120{,}000$, and $130{,}000$, and vary the other parameters.

3.4.3.1 Single Duplication of Valid Signatures

We first consider the case where valid signatures are duplicated at most once, $F_i = 0$ for $i \ge 3$; that is, valid signatures are either unique or occur in pairs. Then the number of duplicates of valid signatures is $D = F_2$, and the estimator reduces to
$$\hat{D} = \frac{N(N-1)}{n(n-1)}d,$$
where $d = f_2$ is the number of duplicates of valid signatures in the combined sample. In the case of single duplication, the sampling distribution moments that depend on duplication reduce to
$$E(\hat{M}) = M,$$
$$\mathrm{Cov}(\hat{U}_1, \hat{M}) = -\frac{N^2(N - n)}{n}\,\bar{U}\left[\frac{1 - \bar{U}}{N - 1} - \frac{2\bar{D}}{N - 2}\right], \qquad (17)$$
$$\mathrm{Var}(\hat{D}) = \frac{N^2(N - 1)}{n(n - 1)}\,\bar{D}\left(1 - \frac{n(n - 1)}{N - 1}\bar{D} + \frac{(n - 2)(n - 3)}{(N - 2)(N - 3)}(N\bar{D} - 1)\right),$$
$$\mathrm{Cov}(\hat{U}, \hat{D}) = -\frac{2N^2(N - n)}{n(N - 2)}\,\bar{U}\bar{D}.$$
These simplified expressions can be used in Equations (10) for the correlation coefficient $\rho_{12}$ and (11) for the upper integration limit $b$. Table 3.1 displays values for the probability of accepting the petition from the first sample ($P_{1A}$) and the probability of making a correct decision ($P_{CD}$).
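The single-duplication calculation behind Table 3.1 can be sketched in Python as follows. It is an illustration under stated assumptions, not the authors' GAUSS program: the bivariate normal probability $P(Z_1 \le a, Z_2 \le b)$ is obtained by Simpson's rule on $\int_{-\infty}^{a}\phi(z)\,\Phi\{(b - \rho_{12}z)/\sqrt{1-\rho_{12}^2}\}\,dz$ instead of the CDFBVN function; the threshold uses $k_1 = N(-B - \sqrt{B^2 - 4AC})/(2A)$, which presumes $R < N(1 - \bar{D}_{1U})$; $\mathrm{Cov}(\hat{U}_1, \hat{M})$ is taken as $-\frac{N^2(N-n)}{n}\bar{U}\left[\frac{1-\bar{U}}{N-1} - \frac{2\bar{D}}{N-2}\right]$; and the combined sample size is $(n/N) \times N$ rounded to an integer. All function names are hypothetical. When $P_{1A}$ is negligible the result depends essentially only on $b$; for example, $N = 110{,}000$, $f = 0.99$, $\bar{D} = 1\%$, $n/N = 5\%$ gives $P_{CD} \approx 0.849$, as in Table 3.1. Entries with non-negligible $P_{1A}$ also depend on $\rho_{12}$ and are not verified here.

```python
import math


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def bvn_cdf(a, b, rho, steps=4000):
    """P(Z1 <= a, Z2 <= b) for a standard bivariate normal with correlation
    rho, via Simpson's rule on the conditional-probability integral."""
    lo = -10.0  # the integrand is negligible below this point
    if a <= lo:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lo) / steps  # steps must be even for Simpson's rule
    total = 0.0
    for k in range(steps + 1):
        z = lo + k * h
        w = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += w * phi(z) * Phi((b - rho * z) / s)
    return total * h / 3.0


def threshold_k1(N, n1, R, d1u=0.08, z=1.645):
    """k1 such that M_1L < R exactly when U_1 > k1 (assumes R < N(1 - d1u))."""
    A = z * z * (N - n1) / (N * (n1 - 1)) + 1.0
    C = (1.0 - d1u - R / N) ** 2
    B = 1.0 - A - 2.0 * math.sqrt(C)
    return N * (-B - math.sqrt(B * B - 4.0 * A * C)) / (2.0 * A)


def decision_probs(N, f, D_rate, n_frac, R=89_048, n1=1_000, d1u=0.08, z=1.645):
    """Return (P_1A, P_CD) for single duplication with M = f * R."""
    M = f * R
    U_rate = 1.0 - D_rate - M / N
    n = round(n_frac * N)

    sd_u1 = N * math.sqrt((N - n1) / (N - 1) * U_rate * (1 - U_rate) / n1)
    a = (threshold_k1(N, n1, R, d1u, z) - N * U_rate) / sd_u1

    var_u = N * N / n * (N - n) / (N - 1) * U_rate * (1 - U_rate)
    var_d = (N * N * (N - 1) / (n * (n - 1)) * D_rate
             * (1 - n * (n - 1) * D_rate / (N - 1)
                + (n - 2) * (n - 3) / ((N - 2) * (N - 3)) * (N * D_rate - 1)))
    cov_ud = -2.0 * N * N * (N - n) * U_rate * D_rate / (n * (N - 2))
    sd_m = math.sqrt(var_u + var_d + 2.0 * cov_ud)
    cov_u1m = -(N * N * (N - n) / n) * U_rate * (
        (1 - U_rate) / (N - 1) - 2 * D_rate / (N - 2))
    rho = cov_u1m / (sd_u1 * sd_m)
    b = (R - M) / sd_m  # E(M_hat) = M under single duplication

    p_reject = Phi(b) - bvn_cdf(a, b, rho)  # P(Z1 > a, Z2 < b)
    p_cd = p_reject if M < R else 1.0 - p_reject
    return Phi(a), p_cd


print(decision_probs(110_000, 0.99, 0.01, 0.05))
```

Writing the rejection event as $P(Z_2 < b) - P(Z_1 \le a, Z_2 \le b)$ keeps the numerical work to a single one-dimensional integral, which is adequate at the three-decimal precision reported in Table 3.1.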
For each petition size considered, several values are taken for the number of distinct valid signatures, defined as $M = fR$, where $f = 0.97, 0.98, 0.99, 1.00, 1.01, 1.02$, and $1.03$; for the proportion of duplicates of valid signatures, $\bar{D} = 0.01, 0.02, 0.04, 0.06, 0.08$, and $0.10$; and for the combined sample size, $n = 5, 6, 7, 8, 9$, and $10$ percent of $N$. The rate of invalid signatures in the petition, $\bar{U}$, was then taken to satisfy the equation $fR = N(1 - \bar{D} - \bar{U})$.

In Table 3.1, for the cases with $f = 1$ ($M = R$) the probability of a correct decision ($P_{CD}$) is approximately equal to 0.50, except for the cases where $\bar{D} = 8\%$ or $10\%$. For $\bar{D} = 8\%$ the maximum is $P_{CD} = 0.535$, for $N = 110{,}000$ and all sampling fractions $n/N = 5\%$ to $10\%$. In such cases with $f = 1$ and $P_{CD} > 0.5$, the decision rule is biased in favor of the petitioner. For $\bar{D} = 10\%$, where the assumed upper bound for the duplication rate at the first sampling stage ($\bar{D}_{1U} = 8\%$) is exceeded, the bias favoring the petitioner is relatively large and even extends to $f = 0.99$ (where $P_{CD} < 0.5$) when $n/N \le 9\%$ for $N = 110{,}000$, $n/N \le 7\%$ for $N = 120{,}000$, and $n/N \le 6\%$ for $N = 130{,}000$.

For fixed $f < 1$ ($M < R$), $P_{CD}$ decreases as $\bar{D}$ increases ($\bar{U}$ decreases). For fixed $f > 1$ ($M > R$), $P_{CD}$ first decreases as $\bar{D}$ increases to 6%, and then increases as $\bar{D}$ reaches and exceeds the assumed upper bound for the duplication rate ($\bar{D}_{1U}$) of 8%. The increase in $P_{CD}$ appears to be due to increasing $P_{1A}$; that is, the probability of accepting from the first sample, $P_{1A}$, increases as $\bar{U}$ decreases. Thus, for fixed $M < R$ petitions with

Table 3.1 Probabilities of acceptance from the first sample (P1A) and correct decision (PCD) for single duplication (D = F2).

a. N = 110,000

                       P1A         PCD
f^a    D̄(%)   Ū(%)    n1=1,000    n/N=5%  6%     7%     8%     9%     10%
0.97   1       20.5    0.000       0.999   1.000  1.000  1.000  1.000  1.000
0.97   2       19.5    0.000       0.993   0.998  1.000  1.000  1.000  1.000
0.97   4       17.5    0.000       0.971   0.988  0.995  0.998  0.999  1.000
0.97   6       15.5    0.000       0.945   0.972  0.987  0.994  0.998  0.999
0.97   8       13.5    0.000       0.920   0.954  0.975  0.987  0.994  0.997
0.97   10      11.5    0.026       0.874   0.911  0.937  0.953  0.963  0.969

0.98   1       19.7    0.000       0.980   0.991  0.996  0.998  0.999  1.000
0.98   2       18.7    0.000       0.951   0.974  0.987  0.994  0.997  0.999
0.98   4       16.7    0.000       0.897   0.934  0.959  0.976  0.986  0.992
0.98   6       14.7    0.000       0.857   0.898  0.930  0.954  0.970  0.982
0.98   8       12.7    0.001       0.825   0.868  0.903  0.931  0.952  0.967
0.98   10      10.7    0.120       0.701   0.741  0.774  0.801  0.822  0.839

0.99   1       18.9    0.000       0.849   0.884  0.911  0.933  0.949  0.962
0.99   2       17.9    0.000       0.797   0.835  0.867  0.894  0.916  0.935
0.99   4       15.9    0.000       0.737   0.775  0.808  0.838  0.865  0.888
0.99   6       13.9    0.000       0.703   0.738  0.770  0.800  0.827  0.852
0.99   8       11.9    0.011       0.672   0.704  0.735  0.763  0.790  0.815
0.99   10      9.9     0.363       0.414   0.434  0.453  0.471  0.489  0.505

1.00   1       18.0    0.000       0.500   0.500  0.500  0.500  0.500  0.500
1.00   2       17.0    0.000       0.500   0.500  0.500  0.500  0.500  0.500
1.00   4       15.0    0.000       0.500   0.500  0.500  0.500  0.500  0.500
1.00   6       13.0    0.000       0.500   0.500  0.500  0.500  0.500  0.500
1.00   8       11.0    0.062       0.535   0.535  0.535  0.535  0.535  0.535
1.00   10      9.0     0.702       0.860   0.860  0.860  0.860  0.860  0.860

1.01   1       17.2    0.000       0.853   0.888  0.915  0.936  0.952  0.965
1.01   2       16.2    0.000       0.799   0.837  0.870  0.897  0.919  0.937
1.01   4       14.2    0.000       0.738   0.776  0.810  0.840  0.867  0.890
1.01   6       12.2    0.004       0.705   0.740  0.773  0.803  0.830  0.854
1.01   8       10.2    0.228       0.762   0.787  0.810  0.832  0.852  0.871
1.01   10      8.2     0.932       0.980   0.982  0.984  0.985  0.987  0.989

1.02   1       16.4    0.000       0.983   0.993  0.997  0.999  1.000  1.000
1.02   2       15.4    0.000       0.954   0.976  0.988  0.995  0.998  0.999
1.02   4       13.4    0.000       0.899   0.936  0.961  0.977  0.987  0.993
1.02   6       11.4    0.029       0.864   0.904  0.935  0.957  0.973  0.983
1.02   8       9.4     0.542       0.929   0.947  0.962  0.973  0.982  0.988
1.02   10      7.4     0.994       0.999   0.999  1.000  1.000  1.000  1.000

1.03   1       15.6    0.000       0.999   1.000  1.000  1.000  1.000  1.000
1.03   2       14.6    0.000       0.994   0.999  1.000  1.000  1.000  1.000
1.03   4       12.6    0.002       0.973   0.989  0.996  0.999  1.000  1.000
1.03   6       10.6    0.130       0.956   0.978  0.990  0.996  0.998  0.999
1.03   8       8.6     0.848       0.990   0.995  0.997  0.999  0.999  1.000
1.03   10      6.6     1.000       1.000   1.000  1.000  1.000  1.000  1.000
a M = fR (R = 89,048); the correct decisions are: reject for f < 1 and accept for f ≥ 1.

Table 3.1 (continued)

b. N = 120,000

                       P1A         PCD
f^a    D̄(%)   Ū(%)    n1=1,000    n/N=5%  6%     7%     8%     9%     10%
0.97   1       27.0    0.000       0.998   0.999  1.000  1.000  1.000  1.000
0.97   2       26.0    0.000       0.989   0.996  0.999  1.000  1.000  1.000
0.97   4       24.0    0.000       0.963   0.983  0.993  0.997  0.999  1.000
0.97   6       22.0    0.000       0.935   0.964  0.982  0.991  0.996  0.998
0.97   8       20.0    0.001       0.909   0.945  0.968  0.982  0.991  0.995
0.97   10      18.0    0.040       0.850   0.889  0.916  0.934  0.946  0.952

0.98   1       26.3    0.000       0.970   0.985  0.993  0.996  0.998  0.999
0.98   2       25.3    0.000       0.938   0.964  0.980  0.989  0.994  0.997
0.98   4       23.3    0.000       0.884   0.922  0.949  0.968  0.980  0.988
0.98   6       21.3    0.000       0.844   0.886  0.919  0.944  0.962  0.975
0.98   8       19.3    0.003       0.811   0.854  0.890  0.918  0.941  0.958
0.98   10      17.3    0.123       0.688   0.727  0.760  0.788  0.810  0.828

0.99   1       25.5    0.000       0.828   0.862  0.889  0.912  0.930  0.944
0.99   2       24.5    0.000       0.780   0.817  0.848  0.875  0.898  0.917
0.99   4       22.5    0.000       0.725   0.761  0.793  0.822  0.848  0.871
0.99   6       20.5    0.000       0.693   0.727  0.758  0.786  0.813  0.837
0.99   8       18.5    0.016       0.660   0.691  0.720  0.748  0.773  0.797
0.99   10      16.5    0.293       0.455   0.476  0.496  0.515  0.534  0.551

1.00   1       24.8    0.000       0.500   0.500  0.500  0.500  0.500  0.500
1.00   2       23.8    0.000       0.500   0.500  0.500  0.500  0.500  0.500
1.00   4       21.8    0.000       0.500   0.500  0.500  0.500  0.500  0.500
1.00   6       19.8    0.001       0.501   0.501  0.501  0.501  0.501  0.501
1.00   8       17.8    0.058       0.533   0.533  0.533  0.533  0.533  0.533
1.00   10      15.8    0.536       0.780   0.780  0.780  0.780  0.780  0.780

1.01   1       24.1    0.000       0.830   0.864  0.892  0.914  0.932  0.946
1.01   2       23.1    0.000       0.781   0.818  0.850  0.877  0.899  0.918
1.01   4       21.1    0.000       0.726   0.762  0.794  0.823  0.849  0.872
1.01   6       19.1    0.005       0.696   0.729  0.760  0.789  0.815  0.839
1.01   8       17.1    0.165       0.735   0.760  0.785  0.807  0.829  0.848
1.01   10      15.1    0.774       0.931   0.937  0.943  0.948  0.954  0.959

1.02   1       23.3    0.000       0.972   0.986  0.994  0.997  0.999  0.999
1.02   2       22.3    0.000       0.940   0.966  0.981  0.990  0.995  0.997
1.02   4       20.3    0.000       0.885   0.923  0.950  0.969  0.981  0.989
1.02   6       18.3    0.024       0.850   0.891  0.923  0.947  0.965  0.977
1.02   8       16.3    0.362       0.891   0.917  0.938  0.955  0.968  0.978
1.02   10      14.3    0.925       0.987   0.990  0.992  0.994  0.996  0.997

1.03   1       22.6    0.000       0.998   1.000  1.000  1.000  1.000  1.000
1.03   2       21.6    0.000       0.990   0.997  0.999  1.000  1.000  1.000
1.03   4       19.6    0.002       0.965   0.984  0.993  0.997  0.999  1.000
1.03   6       17.6    0.082       0.944   0.970  0.985  0.993  0.997  0.999
1.03   8       15.6    0.614       0.971   0.983  0.990  0.995  0.998  0.999
1.03   10      13.6    0.985       0.999   0.999  1.000  1.000  1.000  1.000
a M = fR (R = 89,048); the correct decisions are: reject for f < 1 and accept for f ≥ 1.

Table 3.1 (continued)

c. N = 130,000

                       P1A         PCD
f^a    D̄(%)   Ū(%)    n1=1,000    n/N=5%  6%     7%     8%     9%     10%
0.97   1       32.6    0.000       0.996   0.999  1.000  1.000  1.000  1.000
0.97   2       31.6    0.000       0.985   0.994  0.998  0.999  1.000  1.000
0.97   4       29.6    0.000       0.956   0.978  0.990  0.996  0.998  0.999
0.97   6       27.6    0.000       0.926   0.958  0.977  0.988  0.994  0.997
0.97   8       25.6    0.001       0.899   0.936  0.961  0.977  0.987  0.993
0.97   10      23.6    0.051       0.830   0.869  0.898  0.917  0.930  0.938

0.98   1       31.9    0.000       0.961   0.978  0.988  0.994  0.997  0.998
0.98   2       30.9    0.000       0.927   0.955  0.973  0.984  0.991  0.995
0.98   4       28.9    0.000       0.872   0.911  0.939  0.960  0.974  0.984
0.98   6       26.9    0.000       0.833   0.875  0.909  0.935  0.954  0.969
0.98   8       24.9    0.005       0.799   0.842  0.878  0.907  0.930  0.948
0.98   10      22.9    0.129       0.675   0.713  0.745  0.773  0.796  0.815

0.99   1       31.2    0.000       0.811   0.845  0.872  0.895  0.914  0.930
0.99   2       30.2    0.000       0.767   0.802  0.833  0.860  0.883  0.902
0.99   4       28.2    0.000       0.715   0.750  0.781  0.809  0.835  0.858
0.99   6       26.2    0.000       0.685   0.717  0.747  0.775  0.801  0.825
0.99   8       24.2    0.019       0.651   0.681  0.709  0.735  0.760  0.783
0.99   10      22.2    0.268       0.466   0.487  0.508  0.527  0.545  0.563

1.00 1 1.00 2 1.00 4 1.00 6 1.00 8 1.00 10 30.5 29.5 27.5 25.5 23.5 21.
5 0.000 0.000 0.000 0.001 0.056 0.461 0.500 0.500 0.500 0.501 0.532 0.743 0.500 0.500 0.500 0.501 0.532 0.743 0.500 0.500 0.500 0.501 0.532 0.743 0.500 0.500 0.500 0.501 0.532 0.743 0.500 0.500 0.500 0.501 0.532 0.743 0.500 0.500 0.500 0.501 0.532 0.743 1. 01 1 1. 01 2 1. 01 4 1. 01 6 1. 01 8 1. 01 10 29.8 28.8 26.8 24.8 22.8 20.8 0.000 0.000 0.000 0.006 0.138 0.669 0.813 0.768 0.716 0.688 0.719 0.895 0.846 0.803 0.750 0.720 0.745 0.904 0.874 0.834 0.782 0.750 0.769 0.912 0.897 0.861 0.810 0.778 0.792 0.920 0.916 0.884 0.836 0.803 0.813 0.928 0.931 0.903 0.859 0.827 0.833 0.935 1. 02 1. 02 1 2 4 6 8 10 29.1 28.1 26.1 24.1 22.1 20.1 0.000 0.000 0.000 0.020 0.282 0.838 0.963 0.929 0.873 0.838 0.868 0.970 0.980 0.957 0.912 0.879 0.898 0.976 0.989 0.974 0.940 0.912 0.922 0.981 0.994 0.985 0.961 0.938 0.942 0.986 0.997 0.992 0.975 0.957 0.958 0.989 0.999 0.995 0.984 0.971 0.970 0.992 1. 03 1 1. 03 2 1. 03 4 1. 03 6 1. 03 8 1. 03 10 28.4 27.4 25.4 23.4 21.4 19.4 0.000 0.000 0.002 0.060 0.478 0.939 0.996 0.986 0.957 0.933 0.955 0.994 0.999 0.995 0.979 0.962 0.972 0.996 1. 000 0.998 0.990 0.980 0.984 0.998 1. 000 0.999 0.996 0.990 0.991 0.999 1. 000 1.000 0.998 0.995 0.995 0.999 1. 000 1. 000 0.999 0.998 0.998 1. 000 1. 02 1. 02 1. 02 1. 02 aM = f R (R = 89,048) and correct decision are: reject for f < 1 and accept for f 2: 1 43 higher duplication rates are favorable to the petitioner, and for fixed M > R higher duplication rates tend to be unfavorable as the duplication rate D approaches 8%. 3.4.3.2 Multiple Duplication of Valid Signatures To calculate the probability of correct decision we constructed petitions where the number of electors with i valid signatures in the petition, Fi" is given as an integer approximation to the product of i~1 and the logarithmic series probability Fi ~ ei - 1 - D----=---(i - 1)2In(1 - e) fori = 2,3,4,5,6 and e= 0+,0.05,0.10,0.20 6 so that number of duplicates D = ~ (i - 1) Fi . 
This model was found to be in rough i=2 agreement with four fully-verified Washington State petitions. For this model higher order replication, FdF2 for i = 3, ... ,6, increases with e. Similar to single duplication of valid signatures, the number of distinct valid signatures for each petition was obtained as M = f R, for f = 0.99, 1.00, and 1.01. To interpret the effect of multiple duplication on PCD, the bias and standard deviation of the estimator £1 are also calculated. Note that the upper limit b in Equation (11) can be written as b= (R - M) - bias(M) ........ SD(M) Table 3.2 displays the probabilities of acceptance from the first sample (PtA) and of correct decision (PCD ), the correlation coefficient between the estimator of the number of invalid signatures from the first sample (U 1) and the estimator of the number of distinct valid signatures (£1), the bias and standard deviation of the estimator £1, and the bivariate normal integration limits a and b in Equation (9) for single (e = 0+) and multiple (e > 0) duplication. Tables 3.2 corresponds to the petition size N = 110,000 44 and 130,000 and combined sample size n = 5 and 10 percent of N. The 48 petitions in each table correspond to f = 0.99, 1.00, and 1.01; D = 0.01,0.02,0.04 and 0.08, and () = 0+,0.05,0.10, and 0.20. Also included are the Fi , for i = 2, ... ,6 and the number of 6 electors who contribute one or more duplicates (F2+ = EFi ) corresponding to fixed i=2 values for D and (). Note that when () = 0+, valid signatures are duplicated at most once, For increasing () and fixed D and f, the number of electors contributing duplicates (F2+) decreases, the correlation (p) between f) I and M is approximately constant, the bias of M becomes more negative and the standard deviation of M increases slightly compared to the bias. As a result, the upper limit b increases corresponding to increasing PCD for f < 1 and to decreasing PCD for f 2: 1. 
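The Python sketch below shows how the hypothetical petitions behind Tables 3.1 and 3.2 can be constructed. It rests on stated assumptions and is not the author's program. All function names are hypothetical, and R = 89,048 is taken from the table footnotes. The three helpers are:

- `invalid_rate` solves fR = N(1 − D − U) for U.
- `log_series_counts` applies the logarithmic-series approximation for F_i with an assumed rounding rule: round F_3, ..., F_6, then choose F_2 so that Σ(i − 1)F_i = D holds exactly.
- `upper_limit_b` evaluates b = [(R − M) − bias(M̂)]/SD(M̂) from the relative bias and standard deviation reported in Table 3.2.

The assumed rounding rule reproduces some tabulated counts exactly, for example D = 1,100 with θ = 0.10 and D = 10,400 with θ = 0.20. It does not reproduce all of them: for D = 1,100 and θ = 0.05 it gives F_2 = 1,074 and F_3 = 13 instead of the tabulated 1,072 and 14.

```python
import math
from typing import List

R_REQUIRED = 89_048  # required number of valid signatures (R in the table footnotes)


def invalid_rate(f: float, D: float, N: int, R: int = R_REQUIRED) -> float:
    """Invalid-signature rate U solving f*R = N*(1 - D - U); D is a rate here."""
    return 1.0 - D - f * R / N


def log_series_counts(D: int, theta: float, max_i: int = 6) -> List[int]:
    """Counts [F_2, ..., F_max_i] of electors with i valid signatures.

    F_i is approximately -theta**(i-1) * D / ((i-1)**2 * ln(1 - theta)),
    where D is the number of duplicates.  Assumed rounding rule (not
    necessarily the author's): round F_3..F_max_i, then set F_2 so that
    sum((i - 1) * F_i) == D exactly.  theta == 0 stands for theta -> 0+.
    """
    if theta == 0.0:
        # Limit theta -> 0+: every duplicated elector signs exactly twice.
        return [D] + [0] * (max_i - 2)
    log_term = math.log(1.0 - theta)
    higher = [round(-theta ** (i - 1) * D / ((i - 1) ** 2 * log_term))
              for i in range(3, max_i + 1)]
    # Choose F_2 so that the duplicate count matches D exactly.
    f2 = D - sum((i - 1) * F_i for i, F_i in zip(range(3, max_i + 1), higher))
    return [f2] + higher


def upper_limit_b(f: float, bias_rel: float, sd_rel: float) -> float:
    """b = ((R - M) - bias(M_hat)) / SD(M_hat), with bias and SD as fractions of M = f*R."""
    # Dividing numerator and denominator by M gives (R - M)/M = 1/f - 1.
    return ((1.0 / f - 1.0) - bias_rel) / sd_rel


# Example: N = 110,000, f = 0.99, D = 1% (1,100 duplicates).
print(round(100 * invalid_rate(0.99, 0.01, 110_000), 1))  # 18.9
print(log_series_counts(1_100, 0.10))                     # [1045, 26, 1, 0, 0]
print(round(upper_limit_b(0.99, 0.0, 0.0098), 2))         # 1.03
```

The printed values agree with the corresponding entries for N = 110,000 and f = 0.99: U = 18.9% (Table 3.1), the θ = 0.10 counts for D = 1% (Table 3.2), and b = 1.03 for θ = 0+ at n/N = 5% (Table 3.2.a).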
For increasing D with f and θ fixed:

- the correlation ρ between Û_1 and M̂ increases;
- the bias of M̂ decreases, except for θ = 0+;
- the standard deviation of M̂ increases.

The corresponding upper limit b decreases for f < 1 and increases for f ≥ 1. Also note that the lower limit a decreases with U (increases with D), so that the probability of accepting from the first sample (P_1A) decreases with U. For f = 1 (M = R) and θ > 0+, P_CD is smaller than 0.5 with two exceptions, both at D = 8% and θ = 0.05: Table 3.2.a (N = 110,000 and n/N = 5%) and Table 3.2.c (N = 130,000 and n/N = 5%). For fixed θ > 0+ and f ≠ 1, P_CD decreases with D, except for f = 1.01 and D = 8% in Table 3.2.a (N = 110,000 and n/N = 5%).

Table 3.2 Probabilities of acceptance from the first sample (P_1A) and of correct decision (P_CD) for single (θ = 0+) and multiple (θ > 0) duplication. The table also gives the correlation coefficient ρ between Û_1 and M̂, the bias and standard deviation of the estimator M̂, and the bivariate normal integration limits a and b in Equation (9). U, D, F2+/M, Bias = Bias(M̂)/M, and SD/M = SD(M̂)/M are in percent.

a. N = 110,000 and n/N = 5%

f^a      U   D  θ        F2   F3  F4 F5 F6  F2+/M      a   P_1A      ρ   Bias  SD/M      b   P_CD
0.99  18.9   1  0+     1100    0   0  0  0   1.25  -7.58  0.000  -0.27   0.00  0.98   1.03  0.849
0.99  18.9   1  0.05   1072   14   0  0  0   1.23  -7.58  0.000  -0.27  -0.02  0.98   1.04  0.852
0.99  18.9   1  0.10   1045   26   1  0  0   1.22  -7.58  0.000  -0.27  -0.03  0.98   1.06  0.855
0.99  18.9   1  0.20    986   49   4  1  0   1.18  -7.58  0.000  -0.26  -0.07  0.99   1.09  0.862
0.99  17.9   2  0+     2200    0   0  0  0   2.50  -6.91  0.000  -0.20   0.00  1.22   0.83  0.797
0.99  17.9   2  0.05   2145   26   1  0  0   2.46  -6.91  0.000  -0.20  -0.03  1.22   0.85  0.803
0.99  17.9   2  0.10   2088   53   2  0  0   2.43  -6.91  0.000  -0.20  -0.06  1.23   0.87  0.809
0.99  17.9   2  0.20   1971   99   9  1  0   2.36  -6.91  0.000  -0.20  -0.14  1.24   0.92  0.822
0.99  15.9   4  0+     4400    0   0  0  0   4.99  -5.51  0.000  -0.14   0.00  1.59   0.63  0.737
0.99  15.9   4  0.05   4289   54   1  0  0   4.93  -5.51  0.000  -0.14  -0.06  1.60   0.67  0.748
0.99  15.9   4  0.10   4177  104   5  0  0   4.86  -5.51  0.000  -0.14  -0.13  1.61   0.71  0.760
0.99  15.9   4  0.20   3944  197  18  2  0   4.72  -5.51  0.000  -0.14  -0.28  1.63   0.79  0.785
0.99  11.9   8  0+     8800    0   0  0  0   9.98  -2.29  0.011  -0.08   0.00  2.15   0.47  0.672
0.99  11.9   8  0.05   8578  108   2  0  0   9.86  -2.29  0.011  -0.08  -0.12  2.16   0.52  0.691
0.99  11.9   8  0.10   8351  209   9  1  0   9.72  -2.29  0.011  -0.08  -0.26  2.18   0.58  0.711
0.99  11.9   8  0.20   7886  394  35  4  1   9.44  -2.29  0.011  -0.08  -0.57  2.21   0.71  0.753
1.00  18.0   1  0+     1100    0   0  0  0   1.24  -7.04  0.000  -0.26   0.00  0.96   0.00  0.500
1.00  18.0   1  0.05   1072   14   0  0  0   1.22  -7.04  0.000  -0.26  -0.01  0.96   0.02  0.494
1.00  18.0   1  0.10   1045   26   1  0  0   1.20  -7.04  0.000  -0.26  -0.03  0.97   0.03  0.487
1.00  18.0   1  0.20    986   49   4  1  0   1.17  -7.04  0.000  -0.26  -0.07  0.98   0.07  0.471
1.00  17.0   2  0+     2200    0   0  0  0   2.47  -6.35  0.000  -0.20   0.00  1.20   0.00  0.500
1.00  17.0   2  0.05   2145   26   1  0  0   2.44  -6.35  0.000  -0.20  -0.03  1.21   0.03  0.490
1.00  17.0   2  0.10   2088   53   2  0  0   2.41  -6.35  0.000  -0.20  -0.06  1.21   0.05  0.479
1.00  17.0   2  0.20   1971   99   9  1  0   2.34  -6.35  0.000  -0.20  -0.14  1.23   0.11  0.455
1.00  15.0   4  0+     4400    0   0  0  0   4.94  -4.91  0.000  -0.14   0.00  1.57   0.00  0.500
1.00  15.0   4  0.05   4289   54   1  0  0   4.88  -4.91  0.000  -0.14  -0.06  1.58   0.04  0.485
1.00  15.0   4  0.10   4177  104   5  0  0   4.81  -4.91  0.000  -0.14  -0.13  1.59   0.08  0.468
1.00  15.0   4  0.20   3944  197  18  2  0   4.67  -4.91  0.000  -0.14  -0.28  1.61   0.17  0.431
1.00  11.0   8  0+     8800    0   0  0  0   9.88  -1.54  0.062  -0.08   0.00  2.13   0.00  0.535
1.00  11.0   8  0.05   8578  108   2  0  0   9.76  -1.54  0.062  -0.08  -0.12  2.14   0.06  0.513
1.00  11.0   8  0.10   8351  209   9  1  0   9.62  -1.54  0.062  -0.08  -0.26  2.16   0.12  0.490
1.00  11.0   8  0.20   7886  394  35  4  1   9.34  -1.54  0.062  -0.08  -0.57  2.19   0.26  0.439
1.01  17.2   1  0+     1100    0   0  0  0   1.22  -6.49  0.000  -0.26   0.00  0.94  -1.05  0.853
1.01  17.2   1  0.05   1072   14   0  0  0   1.21  -6.49  0.000  -0.26  -0.01  0.95  -1.03  0.848
1.01  17.2   1  0.10   1045   26   1  0  0   1.19  -6.49  0.000  -0.26  -0.03  0.95  -1.01  0.843
1.01  17.2   1  0.20    986   49   4  1  0   1.16  -6.49  0.000  -0.26  -0.07  0.96  -0.96  0.830
1.01  16.2   2  0+     2200    0   0  0  0   2.45  -5.78  0.000  -0.20   0.00  1.18  -0.84  0.799
1.01  16.2   2  0.05   2145   26   1  0  0   2.41  -5.78  0.000  -0.20  -0.03  1.19  -0.81  0.790
1.01  16.2   2  0.10   2088   53   2  0  0   2.38  -5.78  0.000  -0.20  -0.06  1.20  -0.78  0.781
1.01  16.2   2  0.20   1971   99   9  1  0   2.31  -5.78  0.000  -0.19  -0.14  1.21  -0.70  0.759
1.01  14.2   4  0+     4400    0   0  0  0   4.89  -4.28  0.000  -0.14   0.00  1.55  -0.64  0.738
1.01  14.2   4  0.05   4289   54   1  0  0   4.83  -4.28  0.000  -0.14  -0.06  1.56  -0.60  0.724
1.01  14.2   4  0.10   4177  104   5  0  0   4.77  -4.28  0.000  -0.13  -0.13  1.57  -0.55  0.709
1.01  14.2   4  0.20   3944  197  18  2  0   4.63  -4.28  0.000  -0.13  -0.28  1.59  -0.45  0.673
1.01  10.2   8  0+     8800    0   0  0  0   9.78  -0.75  0.228  -0.08   0.00  2.10  -0.47  0.762
1.01  10.2   8  0.05   8578  108   2  0  0   9.66  -0.75  0.228  -0.08  -0.12  2.12  -0.41  0.746
1.01  10.2   8  0.10   8351  209   9  1  0   9.53  -0.75  0.228  -0.08  -0.25  2.13  -0.34  0.727
1.01  10.2   8  0.20   7886  394  35  4  1   9.25  -0.75  0.228  -0.08  -0.56  2.17  -0.20  0.684

b. N = 110,000 and n/N = 10%

f^a      U   D  θ        F2   F3  F4 F5 F6  F2+/M      a   P_1A      ρ   Bias  SD/M      b   P_CD
0.99  18.9   1  0+     1100    0   0  0  0   1.25  -7.58  0.000  -0.22   0.00  0.57   1.77  0.962
0.99  18.9   1  0.05   1072   14   0  0  0   1.23  -7.58  0.000  -0.22  -0.01  0.57   1.79  0.963
0.99  18.9   1  0.10   1045   26   1  0  0   1.22  -7.58  0.000  -0.22  -0.03  0.57   1.81  0.965
0.99  18.9   1  0.20    986   49   4  1  0   1.18  -7.58  0.000  -0.21  -0.07  0.58   1.87  0.969
0.99  17.9   2  0+     2200    0   0  0  0   2.50  -6.91  0.000  -0.18   0.00  0.67   1.51  0.935
0.99  17.9   2  0.05   2145   26   1  0  0   2.46  -6.91  0.000  -0.18  -0.03  0.67   1.55  0.939
0.99  17.9   2  0.10   2088   53   2  0  0   2.43  -6.91  0.000  -0.18  -0.06  0.67   1.59  0.944
0.99  17.9   2  0.20   1971   99   9  1  0   2.36  -6.91  0.000  -0.17  -0.13  0.68   1.68  0.953
0.99  15.9   4  0+     4400    0   0  0  0   4.99  -5.51  0.000  -0.13   0.00  0.83   1.22  0.888
0.99  15.9   4  0.05   4289   54   1  0  0   4.93  -5.51  0.000  -0.13  -0.06  0.83   1.28  0.900
0.99  15.9   4  0.10   4177  104   5  0  0   4.86  -5.51  0.000  -0.13  -0.12  0.84   1.35  0.911
0.99  15.9   4  0.20   3944  197  18  2  0   4.72  -5.51  0.000  -0.13  -0.27  0.85   1.50  0.933
0.99  11.9   8  0+     8800    0   0  0  0   9.98  -2.29  0.011  -0.08   0.00  1.08   0.93  0.815
0.99  11.9   8  0.05   8578  108   2  0  0   9.86  -2.29  0.011  -0.08  -0.12  1.09   1.03  0.839
0.99  11.9   8  0.10   8351  209   9  1  0   9.72  -2.29  0.011  -0.08  -0.25  1.10   1.14  0.864
0.99  11.9   8  0.20   7886  394  35  4  1   9.44  -2.29  0.011  -0.08  -0.54  1.11   1.39  0.907
1.00  18.0   1  0+     1100    0   0  0  0   1.24  -7.04  0.000  -0.22   0.00  0.56   0.00  0.500
1.00  18.0   1  0.05   1072   14   0  0  0   1.22  -7.04  0.000  -0.21  -0.01  0.56   0.03  0.490
1.00  18.0   1  0.10   1045   26   1  0  0   1.20  -7.04  0.000  -0.21  -0.03  0.56   0.05  0.479
1.00  18.0   1  0.20    986   49   4  1  0   1.17  -7.04  0.000  -0.21  -0.07  0.57   0.12  0.453
1.00  17.0   2  0+     2200    0   0  0  0   2.47  -6.35  0.000  -0.17   0.00  0.66   0.00  0.500
1.00  17.0   2  0.05   2145   26   1  0  0   2.44  -6.35  0.000  -0.17  -0.03  0.66   0.04  0.482
1.00  17.0   2  0.10   2088   53   2  0  0   2.41  -6.35  0.000  -0.17  -0.06  0.66   0.09  0.464
1.00  17.0   2  0.20   1971   99   9  1  0   2.34  -6.35  0.000  -0.17  -0.13  0.67   0.20  0.422
1.00  15.0   4  0+     4400    0   0  0  0   4.94  -4.91  0.000  -0.13   0.00  0.82   0.00  0.500
1.00  15.0   4  0.05   4289   54   1  0  0   4.88  -4.91  0.000  -0.13  -0.06  0.82   0.07  0.472
1.00  15.0   4  0.10   4177  104   5  0  0   4.81  -4.91  0.000  -0.13  -0.12  0.83   0.14  0.442
1.00  15.0   4  0.20   3944  197  18  2  0   4.67  -4.91  0.000  -0.12  -0.26  0.84   0.31  0.377
1.00  11.0   8  0+     8800    0   0  0  0   9.88  -1.54  0.062  -0.08   0.00  1.07   0.00  0.535
1.00  11.0   8  0.05   8578  108   2  0  0   9.76  -1.54  0.062  -0.08  -0.12  1.08   0.11  0.495
1.00  11.0   8  0.10   8351  209   9  1  0   9.62  -1.54  0.062  -0.08  -0.24  1.08   0.22  0.451
1.00  11.0   8  0.20   7886  394  35  4  1   9.34  -1.54  0.062  -0.07  -0.53  1.10   0.48  0.359
1.01  17.2   1  0+     1100    0   0  0  0   1.22  -6.49  0.000  -0.21   0.00  0.55  -1.81  0.965
1.01  17.2   1  0.05   1072   14   0  0  0   1.21  -6.49  0.000  -0.21  -0.01  0.55  -1.78  0.962
1.01  17.2   1  0.10   1045   26   1  0  0   1.19  -6.49  0.000  -0.21  -0.03  0.55  -1.74  0.959
1.01  17.2   1  0.20    986   49   4  1  0   1.16  -6.49  0.000  -0.21  -0.07  0.56  -1.66  0.952
1.01  16.2   2  0+     2200    0   0  0  0   2.45  -5.78  0.000  -0.17   0.00  0.65  -1.53  0.937
1.01  16.2   2  0.05   2145   26   1  0  0   2.41  -5.78  0.000  -0.17  -0.03  0.65  -1.48  0.931
1.01  16.2   2  0.10   2088   53   2  0  0   2.38  -5.78  0.000  -0.17  -0.06  0.65  -1.43  0.923
1.01  16.2   2  0.20   1971   99   9  1  0   2.31  -5.78  0.000  -0.17  -0.13  0.66  -1.30  0.904
1.01  14.2   4  0+     4400    0   0  0  0   4.89  -4.28  0.000  -0.12   0.00  0.81  -1.23  0.890
1.01  14.2   4  0.05   4289   54   1  0  0   4.83  -4.28  0.000  -0.12  -0.06  0.81  -1.15  0.875
1.01  14.2   4  0.10   4177  104   5  0  0   4.77  -4.28  0.000  -0.12  -0.12  0.82  -1.07  0.857
1.01  14.2   4  0.20   3944  197  18  2  0   4.63  -4.28  0.000  -0.12  -0.26  0.83  -0.88  0.811
1.01  10.2   8  0+     8800    0   0  0  0   9.78  -0.75  0.228  -0.07   0.00  1.06  -0.94  0.871
1.01  10.2   8  0.05   8578  108   2  0  0   9.66  -0.75  0.228  -0.07  -0.11  1.06  -0.82  0.848
1.01  10.2   8  0.10   8351  209   9  1  0   9.53  -0.75  0.228  -0.07  -0.24  1.07  -0.70  0.820
1.01  10.2   8  0.20   7886  394  35  4  1   9.25  -0.75  0.228  -0.07  -0.53  1.09  -0.43  0.749

c. N = 130,000 and n/N = 5%

f^a      U   D  θ        F2   F3  F4 F5 F6  F2+/M      a   P_1A      ρ   Bias  SD/M      b   P_CD
0.99  31.2   1  0+     1300    0   0  0  0   1.47  -6.72  0.000  -0.27   0.00  1.14   0.88  0.811
0.99  31.2   1  0.05   1268   16   0  0  0   1.46  -6.72  0.000  -0.27  -0.02  1.15   0.89  0.815
0.99  31.2   1  0.10   1235   31   1  0  0   1.44  -6.72  0.000  -0.27  -0.04  1.15   0.91  0.818
0.99  31.2   1  0.20   1165   58   5  1  0   1.39  -6.72  0.000  -0.26  -0.08  1.16   0.94  0.827
0.99  30.2   2  0+     2600    0   0  0  0   2.95  -6.09  0.000  -0.21   0.00  1.39   0.73  0.767
0.99  30.2   2  0.05   2533   32   1  0  0   2.91  -6.09  0.000  -0.21  -0.04  1.39   0.75  0.774
0.99  30.2   2  0.10   2467   62   3  0  0   2.87  -6.09  0.000  -0.21  -0.08  1.40   0.78  0.781
0.99  30.2   2  0.20   2330  118  10  1  0   2.79  -6.09  0.000  -0.21  -0.17  1.41   0.83  0.797
0.99  28.2   4  0+     5200    0   0  0  0   5.90  -4.81  0.000  -0.15   0.00  1.77   0.57  0.715
0.99  28.2   4  0.05   5069   64   1  0  0   5.82  -4.81  0.000  -0.15  -0.07  1.78   0.61  0.728
0.99  28.2   4  0.10   4935  123   5  1  0   5.74  -4.81  0.000  -0.15  -0.15  1.79   0.65  0.742
0.99  28.2   4  0.20   4661  234  21  2  0   5.58  -4.81  0.000  -0.15  -0.33  1.82   0.74  0.770
0.99  24.2   8  0+    10400    0   0  0  0  11.80  -2.08  0.019  -0.10   0.00  2.37   0.43  0.651
0.99  24.2   8  0.05  10137  127   3  0  0  11.65  -2.08  0.019  -0.10  -0.15  2.38   0.49  0.672
0.99  24.2   8  0.10   9871  246  11  1  0  11.49  -2.08  0.019  -0.10  -0.31  2.40   0.55  0.694
0.99  24.2   8  0.20   9320  466  41  5  1  11.15  -2.08  0.019  -0.09  -0.67  2.43   0.69  0.740
1.00  30.5   1  0+     1300    0   0  0  0   1.46  -6.29  0.000  -0.27   0.00  1.13   0.00  0.500
1.00  30.5   1  0.05   1268   16   0  0  0   1.44  -6.29  0.000  -0.27  -0.02  1.13   0.02  0.494
1.00  30.5   1  0.10   1235   31   1  0  0   1.42  -6.29  0.000  -0.27  -0.04  1.14   0.03  0.487
1.00  30.5   1  0.20   1165   58   5  1  0   1.38  -6.29  0.000  -0.26  -0.08  1.15   0.07  0.471
1.00  29.5   2  0+     2600    0   0  0  0   2.92  -5.66  0.000  -0.21   0.00  1.37   0.00  0.500
1.00  29.5   2  0.05   2533   32   1  0  0   2.88  -5.66  0.000  -0.21  -0.04  1.38   0.03  0.489
1.00  29.5   2  0.10   2467   62   3  0  0   2.84  -5.66  0.000  -0.21  -0.08  1.38   0.05  0.478
1.00  29.5   2  0.20   2330  118  10  1  0   2.76  -5.66  0.000  -0.21  -0.16  1.40   0.12  0.453
1.00  27.5   4  0+     5200    0   0  0  0   5.84  -4.35  0.000  -0.15   0.00  1.76   0.00  0.500
1.00  27.5   4  0.05   5069   64   1  0  0   5.77  -4.35  0.000  -0.15  -0.07  1.76   0.04  0.484
1.00  27.5   4  0.10   4935  123   5  1  0   5.69  -4.35  0.000  -0.15  -0.15  1.78   0.09  0.466
1.00  27.5   4  0.20   4661  234  21  2  0   5.52  -4.35  0.000  -0.15  -0.33  1.80   0.18  0.428
1.00  23.5   8  0+    10400    0   0  0  0  11.68  -1.59  0.056  -0.10   0.00  2.34   0.00  0.532
1.00  23.5   8  0.05  10137  127   3  0  0  11.53  -1.59  0.056  -0.10  -0.14  2.36   0.06  0.509
1.00  23.5   8  0.10   9871  246  11  1  0  11.37  -1.59  0.056  -0.10  -0.30  2.37   0.13  0.484
1.00  23.5   8  0.20   9320  466  41  5  1  11.04  -1.59  0.056  -0.09  -0.67  2.41   0.28  0.429
1.01  29.8   1  0+     1300    0   0  0  0   1.45  -5.86  0.000  -0.27   0.00  1.11  -0.89  0.813
1.01  29.8   1  0.05   1268   16   0  0  0   1.43  -5.86  0.000  -0.27  -0.02  1.12  -0.87  0.808
1.01  29.8   1  0.10   1235   31   1  0  0   1.41  -5.86  0.000  -0.27  -0.04  1.12  -0.85  0.802
1.01  29.8   1  0.20   1165   58   5  1  0   1.37  -5.86  0.000  -0.26  -0.08  1.13  -0.80  0.788
1.01  28.8   2  0+     2600    0   0  0  0   2.89  -5.22  0.000  -0.21   0.00  1.35  -0.73  0.768
1.01  28.8   2  0.05   2533   32   1  0  0   2.85  -5.22  0.000  -0.21  -0.04  1.36  -0.70  0.758
1.01  28.8   2  0.10   2467   62   3  0  0   2.82  -5.22  0.000  -0.21  -0.07  1.37  -0.67  0.748
1.01  28.8   2  0.20   2330  118  10  1  0   2.73  -5.22  0.000  -0.21  -0.16  1.38  -0.60  0.725
1.01  26.8   4  0+     5200    0   0  0  0   5.78  -3.90  0.000  -0.15   0.00  1.74  -0.57  0.716
1.01  26.8   4  0.05   5069   64   1  0  0   5.71  -3.90  0.000  -0.15  -0.07  1.75  -0.53  0.701
1.01  26.8   4  0.10   4935  123   5  1  0   5.63  -3.90  0.000  -0.15  -0.15  1.76  -0.48  0.683
1.01  26.8   4  0.20   4661  234  21  2  0   5.47  -3.90  0.000  -0.15  -0.32  1.78  -0.37  0.646
1.01  22.8   8  0+    10400    0   0  0  0  11.56  -1.09  0.138  -0.10   0.00  2.32  -0.43  0.719
1.01  22.8   8  0.05  10137  127   3  0  0  11.42  -1.09  0.138  -0.10  -0.14  2.33  -0.36  0.699
1.01  22.8   8  0.10   9871  246  11  1  0  11.26  -1.09  0.138  -0.09  -0.30  2.35  -0.29  0.677
1.01  22.8   8  0.20   9320  466  41  5  1  10.93  -1.09  0.138  -0.09  -0.66  2.38  -0.14  0.624

d. N = 130,000 and n/N = 10%

f^a      U   D  θ        F2   F3  F4 F5 F6  F2+/M      a   P_1A      ρ   Bias  SD/M      b   P_CD
0.99  31.2   1  0+     1300    0   0  0  0   1.47  -6.72  0.000  -0.21   0.00  0.69   1.47  0.930
0.99  31.2   1  0.05   1268   16   0  0  0   1.46  -6.72  0.000  -0.21  -0.02  0.69   1.49  0.932
0.99  31.2   1  0.10   1235   31   1  0  0   1.44  -6.72  0.000  -0.21  -0.03  0.69   1.52  0.935
0.99  31.2   1  0.20   1165   58   5  1  0   1.39  -6.72  0.000  -0.21  -0.08  0.69   1.57  0.942
0.99  30.2   2  0+     2600    0   0  0  0   2.95  -6.09  0.000  -0.18   0.00  0.78   1.29  0.902
0.99  30.2   2  0.05   2533   32   1  0  0   2.91  -6.09  0.000  -0.18  -0.04  0.78   1.33  0.909
0.99  30.2   2  0.10   2467   62   3  0  0   2.87  -6.09  0.000  -0.18  -0.07  0.79   1.38  0.916
0.99  30.2   2  0.20   2330  118  10  1  0   2.79  -6.09  0.000  -0.18  -0.16  0.79   1.47  0.929
0.99  28.2   4  0+     5200    0   0  0  0   5.90  -4.81  0.000  -0.14   0.00  0.94   1.07  0.858
0.99  28.2   4  0.05   5069   64   1  0  0   5.82  -4.81  0.000  -0.14  -0.07  0.95   1.14  0.872
0.99  28.2   4  0.10   4935  123   5  1  0   5.74  -4.81  0.000  -0.14  -0.15  0.95   1.21  0.888
0.99  28.2   4  0.20   4661  234  21  2  0   5.58  -4.81  0.000  -0.13  -0.31  0.96   1.37  0.915
0.99  24.2   8  0+    10400    0   0  0  0  11.80  -2.08  0.019  -0.09   0.00  1.20   0.84  0.783
0.99  24.2   8  0.05  10137  127   3  0  0  11.65  -2.08  0.019  -0.09  -0.14  1.21   0.95  0.812
0.99  24.2   8  0.10   9871  246  11  1  0  11.49  -2.08  0.019  -0.09  -0.29  1.22   1.07  0.840
0.99  24.2   8  0.20   9320  466  41  5  1  11.15  -2.08  0.019  -0.09  -0.64  1.23   1.33  0.891
1.00  30.5   1  0+     1300    0   0  0  0   1.46  -6.29  0.000  -0.21   0.00  0.68   0.00  0.500
1.00  30.5   1  0.05   1268   16   0  0  0   1.44  -6.29  0.000  -0.21  -0.02  0.68   0.02  0.490
1.00  30.5   1  0.10   1235   31   1  0  0   1.42  -6.29  0.000  -0.21  -0.03  0.68   0.05  0.480
1.00  30.5   1  0.20   1165   58   5  1  0   1.38  -6.29  0.000  -0.21  -0.08  0.68   0.12  0.454
1.00  29.5   2  0+     2600    0   0  0  0   2.92  -5.66  0.000  -0.18   0.00  0.77   0.00  0.500
1.00  29.5   2  0.05   2533   32   1  0  0   2.88  -5.66  0.000  -0.18  -0.04  0.77   0.05  0.482
1.00  29.5   2  0.10   2467   62   3  0  0   2.84  -5.66  0.000  -0.18  -0.07  0.78   0.09  0.463
1.00  29.5   2  0.20   2330  118  10  1  0   2.76  -5.66  0.000  -0.18  -0.15  0.78   0.20  0.422
1.00  27.5   4  0+     5200    0   0  0  0   5.84  -4.35  0.000  -0.14   0.00  0.93   0.00  0.500
1.00  27.5   4  0.05   5069   64   1  0  0   5.77  -4.35  0.000  -0.14  -0.07  0.94   0.07  0.471
1.00  27.5   4  0.10   4935  123   5  1  0   5.69  -4.35  0.000  -0.14  -0.14  0.94   0.15  0.439
1.00  27.5   4  0.20   4661  234  21  2  0   5.52  -4.35  0.000  -0.13  -0.31  0.95   0.33  0.373
1.00  23.5   8  0+    10400    0   0  0  0  11.68  -1.59  0.056  -0.09   0.00  1.19   0.00  0.532
1.00  23.5   8  0.05  10137  127   3  0  0  11.53  -1.59  0.056  -0.09  -0.14  1.20   0.11  0.489
1.00  23.5   8  0.10   9871  246  11  1  0  11.37  -1.59  0.056  -0.09  -0.29  1.20   0.24  0.443
1.00  23.5   8  0.20   9320  466  41  5  1  11.04  -1.59  0.056  -0.09  -0.63  1.22   0.52  0.345
1.01  29.8   1  0+     1300    0   0  0  0   1.45  -5.86  0.000  -0.21   0.00  0.67  -1.49  0.931
1.01  29.8   1  0.05   1268   16   0  0  0   1.43  -5.86  0.000  -0.21  -0.02  0.67  -1.46  0.928
1.01  29.8   1  0.10   1235   31   1  0  0   1.41  -5.86  0.000  -0.21  -0.03  0.67  -1.43  0.923
1.01  29.8   1  0.20   1165   58   5  1  0   1.37  -5.86  0.000  -0.21  -0.08  0.67  -1.35  0.912
1.01  28.8   2  0+     2600    0   0  0  0   2.89  -5.22  0.000  -0.18   0.00  0.76  -1.30  0.903
1.01  28.8   2  0.05   2533   32   1  0  0   2.85  -5.22  0.000  -0.18  -0.03  0.76  -1.25  0.894
1.01  28.8   2  0.10   2467   62   3  0  0   2.82  -5.22  0.000  -0.18  -0.07  0.77  -1.20  0.885
1.01  28.8   2  0.20   2330  118  10  1  0   2.73  -5.22  0.000  -0.18  -0.15  0.77  -1.08  0.861
1.01  26.8   4  0+     5200    0   0  0  0   5.78  -3.90  0.000  -0.14   0.00  0.92  -1.07  0.859
1.01  26.8   4  0.05   5069   64   1  0  0   5.71  -3.90  0.000  -0.14  -0.07  0.93  -1.00  0.841
1.01  26.8   4  0.10   4935  123   5  1  0   5.63  -3.90  0.000  -0.13  -0.14  0.93  -0.91  0.819
1.01  26.8   4  0.20   4661  234  21  2  0   5.47  -3.90  0.000  -0.13  -0.31  0.94  -0.73  0.766
1.01  22.8   8  0+    10400    0   0  0  0  11.56  -1.09  0.138  -0.09   0.00  1.18  -0.84  0.833
1.01  22.8   8  0.05  10137  127   3  0  0  11.42  -1.09  0.138  -0.09  -0.14  1.18  -0.72  0.803
1.01  22.8   8  0.10   9871  246  11  1  0  11.26  -1.09  0.138  -0.09  -0.28  1.19  -0.59  0.768
1.01  22.8   8  0.20   9320  466  41  5  1  10.93  -1.09  0.138  -0.09  -0.62  1.21  -0.30  0.679

a M = fR (R = 89,048), and the correct decision is: reject for f < 1 and accept for f ≥ 1.
b F_2+ = F_2 + F_3 + F_4 + F_5 + F_6 is the number of the M electors who contribute one or more duplicates.

3.5 Two Submissions of Signatures

We now consider the case in which a petition is not accepted after sampling from the first submission, and a second submission of additional signatures is filed for verification before the due date. Stratified random sampling with proportional allocation is used for the second submission. Only single duplication of valid signatures is considered.

3.5.1 The Decision Rule

The decision rule (Oregon Administrative Rules Chapter 165, 1999) for a petition consisting of two submissions can be described as follows. We follow the notation for counts in each submission introduced in Section 3.3. For h = F, S, let n_h, u_h, and d_h denote the numbers of signatures, invalid signatures, and duplicates of valid signatures, respectively, in the sample from the hth submission. Let d_FS be the number of duplicates of valid signatures between the samples from the first and second submissions. The sample size for the second submission, n_S, is set by proportional allocation. It is approximately the combined sample size for the first submission, n_F, multiplied by the ratio of the submission sizes, n_S ≈ n_F N_S / N_F.

The estimator of the number of distinct valid signatures in the petition,

$$\hat M = \hat M_F + \hat M_S - \hat D_{FS}, \qquad (18)$$

is obtained from separate estimates for each submission (h = F, S),

$$\hat M_h = N_h - \hat U_h - \hat D_h, \qquad (19)$$

where Û_h and D̂_h are the estimated numbers of invalid signatures and of duplicates of valid signatures in submission h. It also uses D̂_FS, the estimated number of duplicated valid signatures between the first and second submissions (20). The decision is based on M̂: accept the petition if M̂ ≥ R, and reject it if M̂ < R.

3.5.2 Probability of Correct Decision

Extending the development for a single submission, the petition can be rejected only if M̂_1L < R (equivalently, Û_1 > k_1), M̂_F < R, and M̂ < R.
Thus, the probability that the petition will be rejected is

$$P_R = P(\hat U_1 > k_1,\ \hat M_F < R,\ \hat M < R).$$

The probability of correct decision P_CD follows Equation (8) with M = M_F + M_S − D_FS. A trivariate normal approximation for the joint sampling distribution of Û_1, M̂_F, and M̂ gives

$$P_R \approx \int_{a}^{\infty} \int_{-\infty}^{b} \int_{-\infty}^{c} \phi(z_1, z_2, z_3)\, dz_3\, dz_2\, dz_1, \qquad (21)$$

where φ(z_1, z_2, z_3) is the standardized trivariate normal density with correlations ρ_12, ρ_13, and ρ_23. The correlation ρ_12 and the standardized integration limits a (lower) and b (upper) were introduced for a single submission in Equations (10) and (11), where the subscript F for the first submission was not needed. What remains to be determined are the correlations

$$\rho_{13} = \frac{\operatorname{Cov}(\hat U_1, \hat M_F) - \operatorname{Cov}(\hat U_1, \hat D_{FS})}{\operatorname{SD}(\hat U_1)\, \operatorname{SD}(\hat M)}, \qquad \rho_{23} = \frac{\operatorname{Var}(\hat M_F) - \operatorname{Cov}(\hat M_F, \hat D_{FS})}{\operatorname{SD}(\hat M_F)\, \operatorname{SD}(\hat M)} \qquad (22)$$

and the standardized upper integration limit

$$c = \frac{R - E(\hat M)}{\operatorname{SD}(\hat M)}. \qquad (23)$$

As noted in Section 3.4.2, M̂ from a single submission has an approximately normal distribution, and hence so do M̂_F and M̂_S. Duplicate signatures are rare events, so the random variable d_FS can be regarded as approximately Poisson distributed, and a Poisson distribution with a large enough mean is approximately normal. We therefore expect D̂_FS, and hence also M̂ for two submissions, to have an approximately normal distribution.

It is also of interest to calculate the probability of accepting the petition from the combined first and second samples drawn from the first submission,

$$P_{2A} = 1 - P(\hat U_1 > k_1,\ \hat M_F < R).$$

3.5.3 Numerical Results

We calculate the probabilities of accepting the petition from the first sample (P_1A), of accepting from the combined first and second samples (P_2A), and of making a correct decision (P_CD) after the second submission is filed. The rejection probabilities, P_R, in Equation (21) were computed using the CDFTVN function of the GAUSS (1992) software.
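Section 3.5.3 evaluates the rejection probability with the CDFTVN function of GAUSS. The sketch below is a stand-in that uses only Python's standard library. It assumes the rejection probability has the form P(Z_1 > a, Z_2 < b, Z_3 < c), where Z_1, Z_2, Z_3 are the standardized Û_1, M̂_F and M̂ with correlations ρ_12, ρ_13 and ρ_23.

- The three-variable probability is evaluated by conditioning on Z_1 and applying nested composite Simpson integration.
- The two-variable case gives P(Z_1 > a, Z_2 < b), so P_2A ≈ 1 − `reject_prob_first(a, b, ρ_12)` when Z_2 is the standardized M̂_F.
- The last two helpers restate the two-submission rule of Section 3.5.1: proportional allocation of n_S, and the sequential accept/reject logic. The caller supplies the estimates M̂_1L, M̂_F and M̂.

All names, the ±8 integration range and the panel count are illustrative choices. The signs of a, b, c and of the correlations must follow the definitions of the limits and correlations in Equations (10), (11), (22) and (23).

```python
import math

LIM = 8.0  # standard normal mass beyond +/-8 is negligible


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simpson(func, lo, hi, n=200):
    """Composite Simpson's rule on [lo, hi]; n (number of panels) must be even."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3.0


def bvn_lower(x, y, r):
    """P(X < x, Y < y) for a standard bivariate normal with correlation r, |r| < 1."""
    s = math.sqrt(1.0 - r * r)
    # Integrate over X = t: density of t times P(Y < y | X = t).
    return simpson(lambda t: norm_pdf(t) * norm_cdf((y - r * t) / s), -LIM, min(x, LIM))


def reject_prob_first(a, b, r12):
    """P(Z1 > a, Z2 < b) = Phi(b) - P(Z1 <= a, Z2 < b)."""
    return norm_cdf(b) - bvn_lower(a, b, r12)


def reject_prob_two(a, b, c, r12, r13, r23):
    """P(Z1 > a, Z2 < b, Z3 < c) for a standard trivariate normal.

    Given Z1 = z, (Z2, Z3) is bivariate normal with means r12*z and r13*z,
    variances 1 - r12**2 and 1 - r13**2, and partial correlation r23_1.
    """
    s2 = math.sqrt(1.0 - r12 * r12)
    s3 = math.sqrt(1.0 - r13 * r13)
    r23_1 = (r23 - r12 * r13) / (s2 * s3)

    def integrand(z):
        return norm_pdf(z) * bvn_lower((b - r12 * z) / s2, (c - r13 * z) / s3, r23_1)

    return simpson(integrand, max(a, -LIM), LIM)


def second_sample_size(n_first, N_first, N_second):
    """Proportional allocation: n_S is approximately n_F * N_S / N_F."""
    return round(n_first * N_second / N_first)


def two_submission_decision(m1L_hat, mF_hat, m_hat, R):
    """Accept after any stage; reject only if all three estimates fall below R."""
    if m1L_hat >= R or mF_hat >= R:  # accepted while sampling submission F
        return "accept"
    return "accept" if m_hat >= R else "reject"
```

The quadrature can be checked against closed forms. For zero correlations, `reject_prob_two` factors into (1 − Φ(a))Φ(b)Φ(c). At a = b = c = 0 it equals the orthant probability 1/8 + (arcsin(−ρ_12) + arcsin(−ρ_13) + arcsin ρ_23)/(4π).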
We consider only the case in which valid signatures are duplicated at most once, D = F_2. To allocate U = NŪ invalid signatures and D = ND̄ duplicates of valid signatures between the two submissions, we use the expected numbers under random allocation of signatures to the two submissions. For h = F, S, this gives

$$U_h = N_h \bar U, \qquad \bar D_h = \frac{D_h}{N} \ \text{with}\ D_h = \frac{N_h (N_h - 1)}{N (N - 1)}\, D, \qquad \bar D_{FS} = \frac{D_{FS}}{N} \ \text{with}\ D_{FS} = \frac{2 N_F N_S}{N (N - 1)}\, D,$$

where D = D_F + D_S + D_FS. It can be shown that the estimator M̂ is unbiased for M. Expressions for SD(Û_1) and Cov(Û_1, M̂_F) were introduced in Equations (12) and (17), without the subscript F. The other sampling moments are

$$\operatorname{Cov}(\hat U_1, \hat D_{FS}) = -\frac{N_F (N_F - n_F)}{n_F (N_F - 1)}\, \bar U_F D_{FS},$$

$$\operatorname{SD}(\hat M) = \left( [\operatorname{SD}(\hat M_F)]^2 + [\operatorname{SD}(\hat M_S)]^2 + \operatorname{Var}(\hat D_{FS}) - 2\operatorname{Cov}(\hat M_F, \hat D_{FS}) - 2\operatorname{Cov}(\hat M_S, \hat D_{FS}) \right)^{1/2},$$

where

$$\operatorname{SD}(\hat M_h) = \sqrt{\operatorname{Var}(\hat U_h) + \operatorname{Var}(\hat D_h) + 2\operatorname{Cov}(\hat U_h, \hat D_h)},$$

$$\operatorname{Var}(\hat U_h) = \frac{N_h^2 (N_h - n_h)}{(N_h - 1)\, n_h}\, \bar U_h (1 - \bar U_h),$$

$$\operatorname{Cov}(\hat U_h, \hat D_h) = -\frac{2 N_h^2 (N_h - n_h)}{n_h (N_h - 2)}\, \bar U_h \bar D_h,$$

$$\operatorname{Cov}(\hat M_h, \hat D_{FS}) = \frac{N_h (N_h - n_h)}{n_h} \left( \frac{\bar U_h}{N_h - 1} + \frac{2 \bar D_h}{N_h - 2} \right) D_{FS},$$

for h = F, S, and

These expressions are used in Equations (10) and (22) for the correlations ρ_12, ρ_13, and ρ_23, and in Equations (11) and (23) for the standardized lower and upper integration limits a, b, and c. Derivations of Cov(Û_1, D̂_FS), Var(D̂_FS), and Cov(M̂_h, D̂_FS), for h = F, S, can be found in Section 3.8.2 of the Appendix.

Table 3.3 displays the probability of correct decision, P_CD, for petition sizes N = N_F + N_S = 110,000; 120,000; and 130,000; second submission sizes N_S = 1,000; 5,000; 10,000; and 20,000; number of distinct valid signatures M = fR, where

Table 3.3 Probabilities of acceptance from the first sample (P_1A), acceptance from the combined first and second samples (P_2A), and correct decision (P_CD) for two submissions (h = F, S) with single duplication (D = F_2). D, U, D_F, D_S, and D_FS are percentages of N. P_1A is for a first sample of n1 = 1,000. P_2A and P_CD are given for n/N = 5% and 10%.

a. N = 110,000^a

                                                     P_2A          P_CD
f^b    D     U    N_S    D_F    D_S   D_FS   P_1A     5%    10%     5%    10%
0.99   1  18.9   1000  0.982  0.000  0.018  0.000  0.025  0.000  0.826  0.961
0.99   1  18.9   5000  0.911  0.002  0.087  0.000  0.000  0.000  0.849  0.962
0.99   1  18.9  10000  0.826  0.008  0.165  0.000  0.000  0.000  0.850  0.962
0.99   1  18.9  20000  0.669  0.033  0.298  0.000  0.000  0.000  0.850  0.962
0.99   2  17.9   1000  1.964  0.000  0.036  0.000  0.058  0.002  0.744  0.932
0.99   2  17.9   5000  1.823  0.005  0.173  0.000  0.000  0.000  0.797  0.935
0.99   2  17.9  10000  1.653  0.016  0.331  0.000  0.000  0.000  0.797  0.935
0.99   2  17.9  20000  1.339  0.066  0.595  0.000  0.000  0.000  0.797  0.936
0.99   4  15.9   1000  3.927  0.000  0.073  0.000  0.118  0.012  0.642  0.877
0.99   4  15.9   5000  3.645  0.008  0.347  0.000  0.000  0.000  0.737  0.888
0.99   4  15.9  10000  3.305  0.033  0.662  0.000  0.000  0.000  0.738  0.889
0.99   4  15.9  20000  2.677  0.132  1.191  0.000  0.000  0.000  0.738  0.890
0.99   8  11.9   1000  7.855  0.001  0.144  0.001  0.197  0.046  0.539  0.786
0.99   8  11.9   5000  7.289  0.016  0.695  0.000  0.006  0.000  0.676  0.825
0.99   8  11.9  10000  6.612  0.066  1.322  0.000  0.000  0.000  0.681  0.825
0.99   8  11.9  20000  5.355  0.265  2.380  0.000  0.000  0.000  0.681  0.826
1.00   1  18.0   1000  0.982  0.000  0.018  0.000  0.173  0.053  0.614  0.536
1.00   1  18.0   5000  0.911  0.002  0.087  0.000  0.000  0.000  0.500  0.500
1.00   1  18.0  10000  0.826  0.008  0.165  0.000  0.000  0.000  0.500  0.500
1.00   1  18.0  20000  0.669  0.033  0.298  0.000  0.000  0.000  0.500  0.500
1.00   2  17.0   1000  1.964  0.000  0.036  0.000  0.228  0.087  0.638  0.555
1.00   2  17.0   5000  1.823  0.005  0.173  0.000  0.000  0.000  0.500  0.500
1.00   2  17.0  10000  1.653  0.016  0.331  0.000  0.000  0.000  0.500  0.500
1.00   2  17.0  20000  1.339  0.066  0.595  0.000  0.000  0.000  0.500  0.500
1.00   4  15.0   1000  3.927  0.000  0.073  0.000  0.290  0.144  0.664  0.583
1.00   4  15.0   5000  3.645  0.008  0.347  0.000  0.002  0.000  0.501  0.500
1.00   4  15.0  10000  3.305  0.033  0.662  0.000  0.000  0.000  0.500  0.500
1.00   4  15.0  20000  2.677  0.132  1.191  0.000  0.000  0.000  0.500  0.500
1.00   8  11.0   1000  7.855  0.001  0.144  0.012  0.358  0.230  0.686  0.619
1.00   8  11.0   5000  7.289  0.016  0.695  0.000  0.022  0.000  0.512  0.500
1.00   8  11.0  10000  6.612  0.066  1.322  0.000  0.000  0.000  0.500  0.500
1.00   8  11.0  20000  5.355  0.265  2.380  0.000  0.000  0.000  0.500  0.500
1.01   1  17.2   1000  0.982  0.000  0.018  0.000  0.539  0.566  0.956  0.991
1.01   1  17.2   5000  0.911  0.002  0.087  0.000  0.000  0.000  0.853  0.965
1.01   1  17.2  10000  0.826  0.008  0.165  0.000  0.000  0.000  0.853  0.965
1.01   1  17.2  20000  0.669  0.033  0.298  0.000  0.000  0.000  0.853  0.965
1.01   2  16.2   1000  1.964  0.000  0.036  0.000  0.535  0.563  0.928  0.981
1.01   2  16.2   5000  1.823  0.005  0.173  0.000  0.001  0.000  0.799  0.937
1.01   2  16.2  10000  1.653  0.016  0.331  0.000  0.000  0.000  0.799  0.938
1.01   2  16.2  20000  1.339  0.066  0.595  0.000  0.000  0.000  0.799  0.938
1.01   4  14.2   1000  3.927  0.000  0.073  0.000  0.532  0.562  0.895  0.961
1.01   4  14.2   5000  3.645  0.008  0.347  0.000  0.012  0.000  0.743  0.890
1.01   4  14.2  10000  3.305  0.033  0.662  0.000  0.000  0.000  0.738  0.891
1.01   4  14.2  20000  2.677  0.132  1.191  0.000  0.000  0.000  0.739  0.891
1.01   8  10.2   1000  7.855  0.001  0.144  0.071  0.569  0.598  0.862  0.931
1.01   8  10.2   5000  7.289  0.016  0.695  0.000  0.060  0.001  0.704  0.826
1.01   8  10.2  10000  6.612  0.066  1.322  0.000  0.000  0.000  0.681  0.826
1.01   8  10.2  20000  5.355  0.265  2.380  0.000  0.000  0.000  0.681  0.827

a N = N_F + N_S.
b M = fR (R = 89,048), and the correct decision is: reject for f < 1 and accept for f ≥ 1.

b. N = 120,000^a

                                                     P_2A          P_CD
f^b    D     U    N_S    D_F    D_S   D_FS   P_1A     5%    10%     5%    10%
0.99   1  25.5   1000  0.983  0.000  0.017  0.000  0.042  0.002  0.788  0.942
0.99   1  25.5   5000  0.918  0.002  0.080  0.000  0.000  0.000  0.828  0.945
0.99   1  25.5  10000  0.840  0.007  0.153  0.000  0.000  0.000  0.828  0.945
0.99   1  25.5  20000  0.694  0.028  0.278  0.000  0.000  0.000  0.828  0.945
0.99   2  24.5   1000  1.967  0.000  0.033  0.000  0.081  0.006  0.709  0.911
0.99   2  24.5   5000  1.837  0.003  0.160  0.000  0.000  0.000  0.780  0.917
0.99   2  24.5  10000  1.681  0.014  0.305  0.000  0.000  0.000  0.780  0.917
0.99   2  24.5  20000  1.389  0.056  0.555  0.000  0.000  0.000  0.781  0.918
0.99   4  22.5   1000  3.933  0.000  0.067  0.000  0.142  0.021  0.612  0.852
0.99   4  22.5   5000  3.673  0.007  0.320  0.000  0.001  0.000  0.724  0.872
0.99   4  22.5  10000  3.361  0.028  0.612  0.000  0.000  0.000  0.726  0.872
0.99   4  22.5  20000  2.778  0.111  1.112  0.000  0.000  0.000  0.726  0.873
0.99   8  18.5   1000  7.867  0.001  0.132  0.004  0.221  0.066  0.516  0.758
0.99   8  18.5   5000  7.348  0.014  0.638  0.000  0.015  0.000  0.661  0.811
0.99   8  18.5  10000  6.722  0.056  1.222  0.000  0.000  0.000  0.673  0.812
0.99   8  18.5  20000  5.556  0.223  2.222  0.000  0.000  0.000  0.673  0.813
1.00   1  24.8   1000  0.983  0.000  0.017  0.000  0.216  0.093  0.640  0.561
1.00   1  24.8   5000  0.918  0.002  0.080  0.000  0.000  0.000  0.500  0.500
1.00   1  24.8  10000  0.840  0.007  0.153  0.000  0.000  0.000  0.500  0.500
1.00   1  24.8  20000  0.694  0.028  0.278  0.000  0.000  0.000  0.500  0.500
1.00   2  23.8   1000  1.967  0.000  0.033  0.000  0.264  0.129  0.660  0.579
1.00   2  23.8   5000  1.837  0.003  0.160  0.000  0.001  0.000  0.500  0.500
1.00   2  23.8  10000  1.681  0.014  0.305  0.000  0.000  0.000  0.500  0.500
1.00   2  23.8  20000  1.389  0.056  0.555  0.000  0.000  0.000  0.500  0.500
1.00   4  21.8   1000  3.933  0.000  0.067  0.000  0.317  0.184  0.680  0.606
1.00   4  21.8   5000  3.673  0.007  0.320  0.000  0.007  0.000  0.505  0.500
1.00   4  21.8  10000  3.361  0.028  0.612  0.000  0.000  0.000  0.500  0.500
1.00   4  21.8  20000  2.778  0.111  1.112  0.000  0.000  0.000  0.500  0.500
1.00   8  17.8   1000  7.867  0.001  0.132  0.019  0.383  0.270  0.698  0.638
1.00   8  17.8   5000  7.348  0.014  0.638  0.000  0.041  0.000  0.524  0.500
1.00   8  17.8  10000  6.722  0.056  1.222  0.000  0.000  0.000  0.500  0.500
1.00   8  17.8  20000  5.556  0.223  2.222  0.000  0.000  0.000  0.500  0.500
1.01 1. 01 1. 01 1.
01 1 1 1 1 24.1 24.1 24.1 24.1 1000 5000 10000 20000 0.983 0.918 0.840 0.694 0.000 0.002 0.007 0.028 0.017 0.080 0.153 0.278 0.000 0.000 0.000 0.000 0.565 0.001 0.000 0.000 0.608 0.000 0.000 0.000 0.952 0.830 0.830 0.831 0.987 0.947 0.947 0.947 1. 01 1. 01 1. 01 1. 01 2 2 2 2 23.1 23.1 23.1 23.1 1000 5000 10000 20000 1.967 1.837 1. 681 1. 389 0.000 0.003 0.014 0.056 0.033 0.160 0.305 0.555 0.000 0.000 0.000 0.000 0.556 0.006 0.000 0.000 0.600 0.000 0.000 0.000 0.927 0.784 0.782 0.782 0.977 0.919 0.919 0.920 1. 01 1. 01 1. 01 1. 01 4 4 4 4 21.1 21.1 21. 1 21.1 1000 5000 10000 20000 3.933 3.673 3.361 2.778 0.000 0.007 0.028 0.111 0.067 0.320 0.612 1.112 0.000 0.000 0.000 0.000 0.549 0.031 0.000 0.000 0.592 0.000 0.000 0.000 0.896 0.738 0.726 0.727 0.959 0.873 0.874 0.874 1. 01 1. 01 1. 01 1. 01 8 8 8 8 17 .1 17 .1 17 .1 17.1 1000 5000 10000 20000 7.867 7.348 6.722 5.556 0.001 0.014 0.056 0.223 0.132 0.638 1.222 2.222 0.070 0.000 0.000 0.000 0.581 0.098 0.001 0.000 0.621 0.006 0.000 0.000 0.864 0.711 0.673 0.673 0.931 0.813 0.812 0.813 aN = NF+Ns = f R (R = 89,048) and correct decision are: reject for f bM < 1 and accept for f ;::: 1. 55 Table 3.3 (continued) c. N = 130,oooa P2A PiA fb 15 fJ Ns 15F 15s 15Fs n, = 1,000 If; = 5% PeD 10% 5% 10% 0.99 0.99 0.99 0.99 1% 1 1 1 31.2% 1000 31.2 5000 31.2 10000 31.2 20000 0.985% 0.925 0.852 0.716 0.000% 0.002 0.006 0.024 0.015% 0.074 0.142 0.260 0.000 0.000 0.000 0.000 0.060 0.000 0.000 0.000 0.005 0.000 0.000 0.000 0.755 0.812 0.812 0.812 0.925 0.930 0.930 0.931 0.99 0.99 0.99 0.99 2 2 2 2 30.2 30.2 30.2 30.2 1000 5000 10000 20000 1.969 1.849 1.704 1. 
432 0.000 0.003 0.012 0.048 0.031 0.148 0.285 0.520 0.000 0.000 0.000 0.000 0.101 0.000 0.000 0.000 0.012 0.000 0.000 0.000 0.679 0.767 0.767 0.768 0.891 0.902 0.903 0.904 0.99 0.99 0.99 0.99 4 4 4 4 28.2 28.2 28.2 28.2 1000 5000 10000 20000 3.938 3.698 3.408 2.864 0.000 0.006 0.024 0.095 0.062 0.295 0.568 1.042 0.000 0.000 0.000 0.000 0.163 0.003 0.000 0.000 0.032 0.000 0.000 0.000 0.587 0.713 0.716 0.716 0.828 0.858 0.859 0.860 0.99 0.99 0.99 0.99 8 8 8 8 24.2 24.2 24.2 24.2 1000 5000 10000 20000 7.878 7.396 6.817 5.728 0.001 0.012 0.048 0.189 0.122 0.592 1.135 2.083 0.007 0.000 0.000 0.000 0.242 0.026 0.000 0.000 0.085 0.000 0.000 0.000 0.497 0.646 0.666 0.666 0.733 0.800 0.801 0.802 1. 00 1. 00 1. 00 1. 00 1 1 1 1 30.5 30.5 30.5 30.5 1000 5000 10000 20000 0.985 0.925 0.852 0.716 0.000 0.002 0.006 0.024 0.015 0.074 0.142 0.260 0.000 0.000 0.000 0.000 0.250 0.000 0.000 0.000 0.130 0.000 0.000 0.000 0.659 0.500 0.500 0.500 0.583 0.500 0.500 0.500 1. 00 1. 00 1. 00 1. 00 2 2 2 2 29.5 29.5 29.5 29.5 1000 5000 10000 20000 1.969 1.849 1.704 1.432 0.000 0.003 0.012 0.048 0.031 0.148 0.285 0.520 0.000 0.000 0.000 0.000 0.292 0.002 0.000 0.000 0.165 0.000 0.000 0.000 0.675 0.502 0.500 0.500 0.600 0.500 0.500 0.500 1. 00 1. 00 1. 00 1. 00 4 4 4 4 27.5 27.5 27.5 27.5 1000 5000 10000 20000 3.938 3.698 3.408 2.864 0.000 0.006 0.024 0.095 0.062 0.295 0.568 1.042 0.000 0.000 0.000 0.000 0.339 0.016 0.000 0.000 0.217 0.000 0.000 0.000 0.692 0.510 0.500 0.500 0.625 0.500 0.500 0.500 1. 00 1. 00 1. 00 1. 00 8 8 8 8 23.5 23.5 23.5 23.5 1000 5000 10000 20000 7.878 7.396 6.817 5.728 0.001 0.012 0.048 0.189 0.122 0.592 1.135 2.083 0.024 0.000 0.000 0.000 0.402 0.065 0.001 0.000 0.301 0.002 0.000 0.000 0.707 0.537 0.500 0.500 0.653 0.501 0.500 0.500 1. 01 1. 01 1. 01 1. 
01 1 1 1 1 29.8 29.8 29.8 29.8 1000 5000 10000 20000 0.985 0.925 0.852 0.716 0.000 0.002 0.006 0.024 0.015 0.074 0.142 0.260 0.000 0.000 0.000 0.000 0.583 0.005 0.000 0.000 0.636 0.000 0.000 0.000 0.949 0.815 0.813 0.814 0.985 0.932 0.932 0.932 1. 01 1. 01 1. 01 1. 01 2 2 2 2 28.8 28.8 28.8 28.8 1000 5000 10000 20000 1.969 1.849 1.704 1. 432 0.000 0.003 0.012 0.048 0.031 0.148 0.285 0.520 0.000 0.000 0.000 0.000 0.571 0.018 0.000 0.000 0.626 0.000 0.000 0.000 0.926 0.775 0.768 0.769 0.975 0.904 0.904 0.905 1. 01 1. 01 1. 01 1. 01 4 4 4 4 26.8 26.8 26.8 26.8 1000 5000 10000 20000 3.938 3.698 3.408 2.864 0.000 0.006 0.024 0.095 0.062 0.295 0.568 1.042 0.000 0.000 0.000 0.000 0.561 0.057 0.000 0.000 0.614 0.002 0.000 0.000 0.895 0.739 0.716 0.717 0.957 0.860 0.860 0.861 1. 01 1. 01 1. 01 1. 01 8 8 8 8 22.8 22.8 22.8 22.8 1000 5000 10000 20000 7.878 7.396 6.817 5.728 0.001 0.012 0.048 0.189 0.122 0.592 1.135 2.083 0.070 0.001 0.000 0.000 0.590 0.139 0.003 0.000 0.636 0.017 0.000 0.000 0.864 0.720 0.667 0.666 0.930 0.805 0.801 0.802 aN = NF bM + Ns = f R (R = 89,048) and correct decision are: reject for f < 1 and accept for f :::: 1. 56 J= 0.99, 1.00, and 1.01; duplication rates D = 0.01,0.02,0.04, and 0.08; and sampling fractions M. = 0.05, and 0.10. Also included are the values ofU, D F , Ds, and D FS corresponding to fixed values for N, In Table 3.3, for fixed f and D, the probability of correct decision (PeD) is approximately constant for Ns Ns = J, D, and N s. ~ 5,000 and is more favorable to the petitioner for 1,000. For the cases with J = 1 (M except for the cases where Ns = 1,000; D = R), the PeD is approximately equal to 0.50, = 8%, Ns = 5,000 and M. = 5%. In these cases the decision rule is biased in favor of the petitioner, PeD > 0.50, and tends to increase in D. This increase appears to be due to increasing P2A. That is, the probability of accepting the petition from the combined first and second samples, P2A , tend to increases with D. 
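As a quick check on the random-allocation formulas, $D_h = \frac{N_h(N_h-1)}{N(N-1)}D$ and $D_{FS} = \frac{2N_FN_S}{N(N-1)}D$, the short Python sketch below recomputes the $\bar D_F$, $\bar D_S$, and $\bar D_{FS}$ columns of Table 3.3 (rates relative to $N$). The function name `allocate` and its argument names are illustrative assumptions; this is not the program used for the study.

```python
def allocate(N, N_S, U_rate, D_rate):
    """Expected split of invalid signatures and single duplicates between the
    first (F) and second (S) submissions under random allocation.

    U_rate = U/N and D_rate = D/N are petition-wide rates. Returns U_F, U_S
    (counts) and the duplicate rates D_F/N, D_S/N, D_FS/N, as in Table 3.3.
    """
    N_F = N - N_S
    pairs = N * (N - 1)                 # ordered pairs of distinct signature positions
    U_F, U_S = N_F * U_rate, N_S * U_rate
    D_F_rate = N_F * (N_F - 1) / pairs * D_rate   # both copies land in F
    D_S_rate = N_S * (N_S - 1) / pairs * D_rate   # both copies land in S
    D_FS_rate = 2 * N_F * N_S / pairs * D_rate    # one copy in each submission
    return U_F, U_S, D_F_rate, D_S_rate, D_FS_rate


# Table 3.3a row: N = 110,000, N_S = 1,000, U-bar = 18.9%, D-bar = 1%
_, _, d_f, d_s, d_fs = allocate(110_000, 1_000, 0.189, 0.01)
print(f"{100 * d_f:.3f} {100 * d_s:.3f} {100 * d_fs:.3f}")  # 0.982 0.000 0.018
```

The three rates always sum to $\bar D$, because $N_F(N_F-1) + N_S(N_S-1) + 2N_FN_S = N(N-1)$.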
3.6 Summary and Conclusions

We have investigated the probability of a correct decision for the Oregon decision rule in a variety of situations. The probability of a correct decision was approximated by using a multivariate normal distribution for the joint sampling distribution of the estimators from the multiple stages. Cases with both single and multiple duplication of valid signatures were considered for single submissions. Only single duplication of valid signatures was considered for two submissions.

For a single submission with single duplication of valid signatures, two stages of sampling, with the possibility of acceptance at each stage, tend to bias the decision rule in favor of the petitioner. This effect is very small except when the petition duplication rate ($\bar D$) is equal to or larger than the assumed upper bound for the duplication rate ($\bar D_U$) used in the first sampling stage. Conversely, the negative bias of the estimator $\hat M$ under multiple duplication tends to bias the decision rule against the petitioner.

For two submissions with a fixed total number of single duplicates of valid signatures in a petition of fixed size ($N_F + N_S = N$), a smaller second submission favors the petitioner (smaller $P_{CD}$ for $M < R$ and larger $P_{CD}$ for $M \ge R$). For second submission sizes of 5,000 or larger this effect was small, since $P_{CD}$ was approximately constant.

3.7 References

GAUSS (1992), The Gauss Version 3.0, Aptech Systems, Inc., Maple Valley, Washington.
Houser, J. (1985), Validating Initiative and Referendum Petition Signatures, Research Monograph, Legislative Research, S420 State Capitol, Salem, Oregon.
Oregon Secretary of State, Elections Division (1999), Oregon Administrative Rules, Chapter 165.
Smith-Cayama, R. A. and Thomas, D. R. (1999), Estimating the Number of Distinct Valid Signatures in Initiative Petitions, unpublished manuscript, Department of Statistics, Oregon State University, Corvallis, Oregon.
3.8 Appendix

3.8.1 Calculation of $Cov(\hat U_1, \hat M)$

Let $\delta_{j\alpha}$ denote the number of valid signatures in the combined first and second samples, of size $n = n_1 + n_2$, from the $\alpha$th elector who signed $j$ valid signatures in the petition, for $\alpha = 1, 2, \ldots, F_j$ and $j = 1, \ldots, N$. Note that $\delta_{j\alpha}$ has the hypergeometric$(N, n, j, i)$ distribution, with $p_{ij} = P(\delta_{j\alpha} = i)$ given by

$$p_{ij} = \frac{\binom{j}{i}\binom{N-j}{n-i}}{\binom{N}{n}}, \qquad i = 0, 1, \ldots, j.$$

For $s = 1, \ldots, U$, define

$$a_s = \begin{cases} 1 & \text{if the } s\text{th invalid signature is selected in the first sample of size } n_1,\\ 2 & \text{if the } s\text{th invalid signature is selected in the second sample of size } n_2,\\ 0 & \text{otherwise.} \end{cases}$$

Note that $P(a_s = 1) = n_1/N$ and $P(a_s = 2) = n_2/N$. The conditional distribution of $\delta_{j\alpha}$, given $a_s = k$ (for $k = 1, 2$), is hypergeometric$(N-1, n-1, j, i)$, with $p_{ij\cdot 1} = P(\delta_{j\alpha} = i \mid a_s = k)$ given by

$$p_{ij\cdot 1} = \frac{\binom{j}{i}\binom{N-1-j}{n-1-i}}{\binom{N-1}{n-1}}, \qquad i = 0, 1, \ldots, j.$$

For the number of electors in the sample with $i$ valid signatures, $f_i$, write

$$f_i = \sum_{j=i}^{n} f_{ij} \quad\text{with}\quad f_{ij} = \sum_{\alpha=1}^{F_j} I(\delta_{j\alpha} = i),$$

where $I(\cdot)$ is the indicator function and $f_{ij}$ denotes the number of electors with $i$ signatures in the combined sample and $j$ signatures in the petition ($i \le j$). Note that $f_{ij}$ is not observable, but $f_i$ is. For the proportions of invalid signatures in the first and second samples, $u_1$ and $u_2$, write

$$u_k = \frac{1}{n_k}\sum_{s=1}^{U} I(a_s = k), \qquad k = 1, 2.$$

Then

$$E(f_i) = \sum_{j=i}^{n} p_{ij}F_j, \qquad E(u_k) = \bar U, \qquad Var(u_k) = \frac{N-n_k}{N-1}\,\frac{\bar U(1-\bar U)}{n_k}, \qquad Cov(u_1, u_2) = -\frac{\bar U(1-\bar U)}{N-1},$$

for $k = 1, 2$. The next result is needed to calculate $Cov(\hat U_1, \hat M)$.

Lemma 2. $Cov(u_1, f_i) = \bar U\sum_{j=i}^{n}(p_{ij\cdot 1} - p_{ij})F_j$.

Proof.
$$Cov(u_1, f_i) = \frac{1}{n_1}\sum_{s=1}^{U}\sum_{j=i}^{n}\sum_{\alpha=1}^{F_j} P(a_s = 1)P(\delta_{j\alpha} = i \mid a_s = 1) - \bar U\sum_{j=i}^{n} p_{ij}F_j = \bar U\sum_{j=i}^{n} p_{ij\cdot 1}F_j - \bar U\sum_{j=i}^{n} p_{ij}F_j = \bar U\sum_{j=i}^{n}(p_{ij\cdot 1} - p_{ij})F_j. \qquad \square$$

From the form of $\hat U = N\left(\frac{n_1}{n}u_1 + \frac{n_2}{n}u_2\right)$ and the general form of $\hat D = \frac{N(N-1)}{n(n-1)}\sum_{i=2}^{n}(i-1)f_i$, the covariance between $\hat U_1 = Nu_1$ and $\hat M = N - \hat U - \hat D$ is calculated as

$$Cov(\hat U_1, \hat M) = -\frac{N^2}{n}\left(n_1 Var(u_1) + n_2 Cov(u_1, u_2)\right) - \frac{N^2(N-1)}{n(n-1)}\sum_{i=2}^{n}(i-1)Cov(u_1, f_i)$$
$$= -\frac{N^2(N-n)}{n}\,\frac{\bar U(1-\bar U)}{N-1} - \frac{N^2(N-1)}{n(n-1)}\,\bar U\sum_{i=2}^{n}(i-1)\sum_{j=i}^{n}(p_{ij\cdot 1} - p_{ij})F_j.$$
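Lemma 2 can be checked exactly on a toy example. The Python sketch below enumerates every equally likely pair of first and second samples for a hypothetical petition of N = 8 signatures (U = 2 invalid signatures labelled 0 and 1, two electors who each signed twice, and two who signed once) and compares the resulting covariance with the closed form. The population and the function names are assumptions made for illustration only; `fractions.Fraction` keeps the arithmetic exact, so the two sides should agree exactly.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def p(i, j, N, n):
    """Hypergeometric probability that i of an elector's j signatures fall in a sample of n from N."""
    return Fraction(comb(j, i) * comb(N - j, n - i), comb(N, n))


def lemma2_cov(i, F, U, N, n):
    """Closed form of Lemma 2: Ubar * sum over j >= i of (p_{ij.1} - p_{ij}) * F_j."""
    ubar = Fraction(U, N)
    # p_{ij.1} is the hypergeometric(N - 1, n - 1, j, i) probability
    return ubar * sum((p(i, j, N - 1, n - 1) - p(i, j, N, n)) * Fj
                      for j, Fj in F.items() if j >= i)


def enumerated_cov(i, electors, U, N, n1, n2):
    """Exact Cov(u_1, f_i) by averaging over all equally likely (first, second) sample pairs."""
    invalid = set(range(U))  # hypothetical labelling: ids 0..U-1 are the invalid signatures
    pairs = []
    for s1 in combinations(range(N), n1):
        rest = [x for x in range(N) if x not in s1]
        for s2 in combinations(rest, n2):
            combined = set(s1) | set(s2)
            u1 = Fraction(len(invalid & set(s1)), n1)
            # f_i: number of electors with exactly i signatures in the combined sample
            fi = sum(1 for sigs in electors if len(combined & set(sigs)) == i)
            pairs.append((u1, fi))
    k = len(pairs)
    e_u = sum(u for u, _ in pairs) / k
    e_f = Fraction(sum(f for _, f in pairs), k)
    e_uf = sum(u * f for u, f in pairs) / k
    return e_uf - e_u * e_f


# Hypothetical petition: ids 0-1 invalid; electors (2, 3) and (4, 5) signed twice,
# electors (6,) and (7,) signed once, so F_1 = 2 and F_2 = 2.
electors = [(2, 3), (4, 5), (6,), (7,)]
F = {1: 2, 2: 2}
for i in (1, 2):
    print(i, enumerated_cov(i, electors, U=2, N=8, n1=2, n2=2),
          lemma2_cov(i, F, U=2, N=8, n=4))
```

In this toy case both columns print -1/28 for i = 1 and i = 2. Any small population can be substituted, although the enumeration grows quickly with N.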
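3.8.2 Calculation of $Cov(\hat U_1, \hat D_{FS})$, $Var(\hat D_{FS})$, and $Cov(\hat M_h, \hat D_{FS})$ for $h = F, S$

Consider two independent random samples of $n_F$ and $n_S$ signatures drawn without replacement from the first and second submissions, of sizes $N_F$ and $N_S$, respectively. Further, the "first sample" of size $n_1$ from the first submission is a random subsample of the combined sample of size $n_F = n_1 + n_2$. Only single duplication of valid signatures is considered. Define

$$a_k = \begin{cases} 1 & \text{if the } k\text{th invalid signature is selected in the first sample of size } n_1,\\ 0 & \text{otherwise,} \end{cases} \qquad k = 1, 2, \ldots, U_F,$$
$$b_l = \begin{cases} 1 & \text{if both signatures of the } l\text{th elector who signed in each submission are selected in the samples of sizes } n_F \text{ and } n_S,\\ 0 & \text{otherwise,} \end{cases} \qquad l = 1, 2, \ldots, D_{FS},$$
$$c_{hp} = \begin{cases} 1 & \text{if the } p\text{th invalid signature is selected in the sample from the } h\text{th submission},\\ 0 & \text{otherwise,} \end{cases} \qquad p = 1, 2, \ldots, U_h,\ h = F, S,$$

and let $g_{hq}$ denote the number of signatures in the sample taken from the $h$th submission that belong to the $q$th elector with two signatures in the $h$th submission, for $q = 1, 2, \ldots, D_h$ and $h = F, S$. Note that

$$P(a_k = 1) = \frac{n_1}{N_F}, \qquad k = 1, 2, \ldots, U_F,$$
$$P(b_l = 1) = \frac{n_Fn_S}{N_FN_S}, \qquad l = 1, 2, \ldots, D_{FS},$$
$$P(b_l = 1 \mid a_k = 1) = \frac{(n_F-1)n_S}{(N_F-1)N_S}, \qquad l = 1, \ldots, D_{FS},\ k = 1, \ldots, U_F,$$
$$P(b_l = 1 \mid b_m = 1) = \frac{(n_F-1)(n_S-1)}{(N_F-1)(N_S-1)}, \qquad m, l = 1, \ldots, D_{FS},\ m \ne l,$$

and, for $h = F, S$,

$$P(c_{hp} = 1) = \frac{n_h}{N_h}, \qquad p = 1, 2, \ldots, U_h,$$
$$P(g_{hq} = 2) = \frac{n_h(n_h-1)}{N_h(N_h-1)}, \qquad q = 1, 2, \ldots, D_h,$$
$$P(b_m = 1 \mid c_{hp} = 1) = \frac{N_h(n_h-1)}{n_h(N_h-1)}\,\frac{n_Fn_S}{N_FN_S}, \qquad m = 1, \ldots, D_{FS},\ p = 1, \ldots, U_h,$$
$$P(b_m = 1 \mid g_{hq} = 2) = \frac{N_h(n_h-2)}{n_h(N_h-2)}\,\frac{n_Fn_S}{N_FN_S}, \qquad m = 1, \ldots, D_{FS},\ q = 1, \ldots, D_h.$$

For the proportion of invalid signatures in the first sample, $u_1$, and the number of duplicates of valid signatures between the samples from the first and second submissions, $d_{FS}$, write

$$u_1 = \frac{1}{n_1}\sum_{k=1}^{U_F} a_k \qquad\text{and}\qquad d_{FS} = \sum_{l=1}^{D_{FS}} b_l.$$

Then

$$Cov(u_1, d_{FS}) = \frac{1}{n_1}\sum_{k=1}^{U_F}\sum_{l=1}^{D_{FS}} P(a_k = 1)P(b_l = 1 \mid a_k = 1) - \frac{U_F}{N_F}\,\frac{n_Fn_S}{N_FN_S}D_{FS} = -\frac{N_F-n_F}{N_F(N_F-1)}\,\frac{n_S}{N_FN_S}\,U_FD_{FS}.$$

Thus, since $\hat U_1 = N_Fu_1$ and $\hat D_{FS} = \frac{N_FN_S}{n_Fn_S}d_{FS}$, the covariance between $\hat U_1$ and $\hat D_{FS}$ is

$$Cov(\hat U_1, \hat D_{FS}) = N_F\,\frac{N_FN_S}{n_Fn_S}\,Cov(u_1, d_{FS}) = -\frac{N_F-n_F}{n_F(N_F-1)}\,U_FD_{FS} = -\frac{N_F(N_F-n_F)}{n_F(N_F-1)}\,\bar U_FD_{FS}.$$

Note that

$$Var(d_{FS}) = \sum_{l=1}^{D_{FS}} P(b_l = 1) + \sum_{l=1}^{D_{FS}}\sum_{m \ne l} P(b_l = 1)P(b_m = 1 \mid b_l = 1) - \left(\frac{n_Fn_S}{N_FN_S}D_{FS}\right)^2$$
$$= \frac{n_Fn_S}{N_FN_S}D_{FS} + \frac{n_Fn_S(n_F-1)(n_S-1)}{N_FN_S(N_F-1)(N_S-1)}D_{FS}(D_{FS}-1) - \left(\frac{n_Fn_S}{N_FN_S}D_{FS}\right)^2.$$

Since $\hat D_{FS} = \frac{N_FN_S}{n_Fn_S}d_{FS}$, the variance of $\hat D_{FS}$ is

$$Var(\hat D_{FS}) = \left(\frac{N_FN_S}{n_Fn_S}\right)^2 Var(d_{FS}).$$

For the proportion of invalid signatures in the sample from the $h$th submission, $u_h$, and the number of duplicates of valid signatures in the sample from the $h$th submission, $d_h$, $h = F, S$, write

$$u_h = \frac{1}{n_h}\sum_{p=1}^{U_h} c_{hp} \qquad\text{and}\qquad d_h = \sum_{q=1}^{D_h} I(g_{hq} = 2),$$

where $I(\cdot)$ is the indicator function. Since $\hat M_h = N_h - \hat U_h - \hat D_h$, with $\hat U_h = N_hu_h$ and $\hat D_h = \frac{N_h(N_h-1)}{n_h(n_h-1)}d_h$, the covariance between $\hat M_h$ and $\hat D_{FS}$ is calculated as

$$Cov(\hat M_h, \hat D_{FS}) = -\frac{N_FN_S}{n_Fn_S}\left(N_h\,Cov(u_h, d_{FS}) + \frac{N_h(N_h-1)}{n_h(n_h-1)}\,Cov(d_h, d_{FS})\right).$$

The covariance between $u_h$ and $d_{FS}$ is given by

$$Cov(u_h, d_{FS}) = \frac{1}{n_h}\sum_{p=1}^{U_h}\sum_{m=1}^{D_{FS}} P(c_{hp} = 1)P(b_m = 1 \mid c_{hp} = 1) - \frac{U_h}{N_h}\,\frac{n_Fn_S}{N_FN_S}D_{FS} = -\frac{N_h-n_h}{n_hN_h(N_h-1)}\,\frac{n_Fn_S}{N_FN_S}\,U_hD_{FS},$$

and the covariance between $d_h$ and $d_{FS}$ by

$$Cov(d_h, d_{FS}) = \sum_{q=1}^{D_h}\sum_{m=1}^{D_{FS}} P(g_{hq} = 2)P(b_m = 1 \mid g_{hq} = 2) - \left(\frac{n_h(n_h-1)}{N_h(N_h-1)}D_h\right)\left(\frac{n_Fn_S}{N_FN_S}D_{FS}\right)$$
$$= \frac{(n_h-1)(n_h-2)}{(N_h-1)(N_h-2)}\,\frac{n_Fn_S}{N_FN_S}D_hD_{FS} - \frac{n_h(n_h-1)}{N_h(N_h-1)}\,\frac{n_Fn_S}{N_FN_S}D_hD_{FS}$$
$$= -\frac{2(n_h-1)(N_h-n_h)}{N_h(N_h-1)(N_h-2)}\,\frac{n_Fn_S}{N_FN_S}D_hD_{FS}.$$

Therefore, for $h = F, S$,

$$Cov(\hat M_h, \hat D_{FS}) = \frac{N_h-n_h}{n_h}\left(\frac{U_h}{N_h-1} + \frac{2D_h}{N_h-2}\right)D_{FS}.$$

Chapter 4 Summary

Initiative petitions that are filed with election offices to propose additions or modifications to statutes or to the state constitution can contain duplicates of valid signatures. Statistical sampling is used by several states, either to reduce verification costs or by necessity because of the limited time permitted for verification. It is then of interest to estimate the number of distinct valid signatures in an initiative petition.

In Chapter 2, we addressed the problem of estimating the number of distinct valid signatures based on the verification of a random sample of signatures. Several linear estimators and a non-linear estimator for the number of distinct valid signatures were compared with regard to their bias and RMSE for four completely verified Washington State petitions.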
The comparison showed that the linear estimators based on the expansion factor for single duplication performed much better than the unbiased estimator and the non-linear estimator based on the jackknife technique in all situations considered. For sampling fractions of 10% or less, the probability of observing multiple duplicates of valid signatures is very small. As a result, none of the other estimators considered gave more than a small improvement in RMSE over the Goodman-type estimator $\hat M_2$. To reduce the bias of the linear estimators under multiple duplication, we considered a bias adjustment factor that requires prior information about the replication of duplicates, such as data from similar completely verified petitions. For the four Washington State petitions, the improvement in RMSE of the bias-adjusted estimators over their non-adjusted counterparts was negligible.

In Chapter 3, we addressed the problem of evaluating the decision rule adopted by Oregon for its state initiative and referendum petitions by determining an approximation for the probability of correct decision ($P_{CD}$). The estimator $\hat M$ used in the Oregon decision rule is the estimator denoted $\hat M_d$ in Chapter 2. This estimator was adopted because its duplication component, $\hat D$, depends only on the number of duplicates of valid signatures in the sample ($d$), and because it performed similarly to the Goodman-type estimator $\hat M_2$. For a single submission of signatures, we calculated the probability of correct decision for both single and multiple duplication of valid signatures, for several sampling fractions and three petition sizes. $P_{CD}$ was approximated by using a multivariate normal distribution for the joint sampling distribution of the estimators involved in the multiple stages. For two submissions, only single duplication of valid signatures was evaluated.
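The dependence of the duplication component on the sample only through $d$ can be illustrated with the single-submission forms from Section 3.8.1: $\hat U$ is $N$ times the proportion of invalid signatures in the combined sample, and $\hat D = \frac{N(N-1)}{n(n-1)}d$ with $d = \sum_{i \ge 2}(i-1)f_i$. The Python sketch below computes $\hat M = N - \hat U - \hat D$ only; it does not implement the Oregon acceptance thresholds, and the function name and the sample counts in the example are hypothetical.

```python
def estimate_distinct_valid(N, n, invalid, d):
    """M-hat = N - U-hat - D-hat for a single submission (sketch).

    N: signatures submitted; n: combined sample size;
    invalid: invalid signatures found in the combined sample;
    d: duplicates of valid signatures in the sample, sum_{i>=2} (i - 1) f_i.
    """
    U_hat = N * invalid / n                    # N times the sample proportion invalid
    D_hat = N * (N - 1) / (n * (n - 1)) * d    # expands d by the ratio of pair counts
    return N - U_hat - D_hat


# Hypothetical counts: 1,500 invalid and 8 duplicates in a 6,000-signature sample
print(round(estimate_distinct_valid(120_000, 6_000, 1_500, 8)))  # 86799
```

Each duplicated pair of signatures has probability $n(n-1)/(N(N-1))$ of appearing in full in the sample, which is why $d$ is scaled by the inverse of that ratio.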
In the case of a single submission with single duplication of valid signatures, the use of two sampling stages in the decision rule favors the petitioner only when the true duplication rate equals or exceeds the assumed upper limit for the duplication rate used in the first sampling stage. For multiple duplication, the decision rule becomes unfavorable to the petitioner. In the case of two submissions with single duplication, a small second submission tends to favor the petitioner, and for larger second submissions ($N_S \ge$ 5,000) $P_{CD}$ is approximately constant.

The results of this dissertation helped the Oregon Elections Division in its decision to change the certification rule to include estimation of the number of duplicates of valid signatures, as described in Sections 3.4.1 and 3.5.1. Previously, a duplication rate of 2% was assumed for all petitions.

Bibliography

Bunge, J. and Fitzpatrick, M. (1993), Estimating the Number of Species: A Review, Journal of the American Statistical Association, 88, 364-373.
GAUSS (1992), The Gauss Version 3.0, Aptech Systems, Inc., Maple Valley, Washington.
Goodman, L. A. (1949), On the Estimation of the Number of Classes in a Population, Annals of Mathematical Statistics, 20, 572-579.
Haas, P. J. and Stokes, L. (1998), Estimating the Number of Classes in a Finite Population, Journal of the American Statistical Association, 93, 1475-1487.
Houser, J. (1985), Validating Initiative and Referendum Petition Signatures, Research Monograph, Legislative Research, S420 State Capitol, Salem, Oregon.
Oregon Secretary of State, Elections Division (1999), Oregon Administrative Rules, Chapter 165.