AN ABSTRACT OF THE DISSERTATION OF Ruben A. Smith-Cayama for the degree of Doctor of Philosophy in Statistics presented
on September 7, 1999. Title: Statistical Estimation for Initiative Petitions and
Performance of the Decision Rule for Oregon State Petitions.
Redacted for Privacy
Abstract approved:
____
David R. Thomas
Two topics concerning statistical sampling of initiative petitions are considered in this
dissertation. The first concerns statistical estimation of the number of distinct valid
signatures in a petition, and the second evaluates the statistical decision rule used by
Oregon for determining certification of state initiative and referendum petitions.
In several states that permit initiative petitions to modify or add legislation, statistical
sampling of signatures is used to obtain an estimate for the number of distinct valid
signatures in the petitions. This estimate depends on the number of signatures submitted,
the number of invalid signatures and, in some states, the number of duplicates of valid
signatures.
We consider several linear estimators and a non-linear estimator for the number of
distinct valid signatures. Their performances are compared with respect to bias and root
mean squared error using several sample sizes for four fully-verified petitions from
Washington State. Exact expressions for the bias and root mean square error are used for
the linear estimators and estimates from simulated random sampling are used for the non-linear
estimator. For the small sampling fractions typically used in state initiative
petitions (3-10%), none of the estimators are found to perform much better than the
estimator that is constructed to be unbiased when valid signatures are assumed to be
duplicated at most once.
Oregon allows a petition to be filed in either one or two submissions. The Oregon
decision rule for certification of petitions is complicated in that multiple stages of
sampling are used: two stages for a single submission and three stages for two
submissions. A petition can be accepted (certified) after any sampling stage but only
rejected after verifying all samples from each submission. The decision rule is based on
estimates for the number of distinct valid signatures obtained at each sampling stage. We
evaluate the performance of the Oregon decision rule by calculating an approximation for
the probability of making a correct decision for the certification of several hypothetical
petitions. The petitions are chosen to represent different sizes and quality with respect to
invalid and duplicated valid signatures.
© Copyright by Ruben A. Smith-Cayama
September 7, 1999
All Rights Reserved

Statistical Estimation for Initiative Petitions and
Performance of the Decision Rule for Oregon State Petitions
by
Ruben A. Smith-Cayama

A DISSERTATION
submitted to
Oregon State University
in partial fulfillment of
the requirements for the
degree of
Doctor of Philosophy

Presented September 7, 1999
Commencement June 2000

Doctor of Philosophy dissertation of Ruben A. Smith-Cayama presented on September 7, 1999
APPROVED:
Redacted for Privacy
Major Professor, representing Statistics
Redacted for Privacy
Chair of Department of Statistics
Redacted for Privacy
I understand that my dissertation will become part of the permanent collection of Oregon
State University libraries. My signature below authorizes release of my dissertation to
any reader upon request.
Redacted for Privacy
ACKNOWLEDGMENTS
I would like to express my deep appreciation and sincere thanks to my dissertation
advisor, Dr. David R. Thomas, for his patience, support, and guidance during the course
of this dissertation research. I would also like to thank Dr. Virginia Lesser, Dr. David
Birkes, Dr. David Butler, and Dr. Philippe Rossignol for their recommendations and
comments for improving this dissertation.
I am grateful to the University of Los Andes, Venezuela, to the Department of
Statistics of Oregon State University, and to the Office of Budgets and Planning,
especially Duane Faulhaber, Interim Director, for providing funding during my doctoral
studies.
I wish to thank Colleen Sealock, Director, and Dr. Scott Tighe, Operations Manager,
of the Oregon State Elections Division, for providing information about the Oregon
Certification rule. The author is also grateful to the Washington Secretary of State for
providing the data from fully-verified petitions, especially to Donald F. Whiting,
Assistant Secretary of State, and Pamela Floyd, Initiative Manager of the Elections
Division.
I wish to express my sincere gratitude to my mother, Olga, for her love and support
during the pursuit of this degree, and to my brothers and sisters for their emotional
support. I would like to thank my friend Breda Muñoz for her companionship and support
all these years.
CONTRIBUTION OF AUTHORS
Dr. David Thomas proposed the topic that originated this dissertation. Dr. Thomas
was also involved in the solution, interpretation of results and editing of each chapter of
this dissertation.
TABLE OF CONTENTS
1. INTRODUCTION
2. ESTIMATING THE NUMBER OF DISTINCT VALID SIGNATURES IN INITIATIVE PETITIONS
   2.1 Abstract
   2.2 Introduction
   2.3 Terminology and Notation
   2.4 Theoretical Background
       2.4.1 Estimators for D
       2.4.2 Estimators for M
       2.4.3 Expectation and Variance of M
   2.5 Performance of the Estimators
   2.6 Summary
   2.7 References
   2.8 Appendix
3. SAMPLING FOR CERTIFICATION OF OREGON STATE INITIATIVE PETITIONS
   3.1 Abstract
   3.2 Introduction
   3.3 The Statistical Decision Problem
   3.4 One Submission of Signatures
       3.4.1 The Decision Rule
       3.4.2 Probability of Correct Decision
       3.4.3 Numerical Results
             3.4.3.1 Single Duplication of Valid Signatures
             3.4.3.2 Multiple Duplication of Valid Signatures
   3.5 Two Submissions of Signatures
       3.5.1 The Decision Rule
       3.5.2 Probability of Correct Decision
       3.5.3 Numerical Results
   3.6 Summary and Conclusions
   3.7 References
   3.8 Appendix
       3.8.1 Calculation of $Cov(\hat{U}_1, \hat{M})$
       3.8.2 Calculation of $Cov(\hat{U}_1, \hat{D}_{FS})$, $Var(\hat{D}_{FS})$, and $Cov(\hat{M}_h, \hat{D}_{FS})$ for $h = F, S$
4. SUMMARY
BIBLIOGRAPHY

LIST OF TABLES
2.1 Description of the petitions A, B, C, and D
2.2 Expected frequency for replications of valid signatures, $E(f_i)$
2.3 True values of $k$ and $r_3, \ldots, r_k$, and specified values $\tilde{k}$ and $\tilde{r}_3, \ldots, \tilde{r}_{\tilde{k}}$
2.4 Specified values of the bias adjustment factor, $B^{\hat{D}}_{q,k,r}$, for each petition, adjusted estimator and sampling fraction ($q$): 3%, 5%, 10%, and 20%
2.5 BIAS (%) of estimators for $M$
2.6 RMSE (%) of estimators for $M$
3.1 Probabilities of acceptance from the first sample ($P_{1A}$) and correct decision ($P_{CD}$) for single duplication ($D = F_2$)
3.2 Probabilities of acceptance from the first sample ($P_{1A}$) and of correct decision ($P_{CD}$) with the correlation coefficient for $\hat{U}_1$ and $\hat{M}$, bias and standard deviation of the estimator $\hat{M}$, and the bivariate normal integration limits $a$ and $b$ in Equation (9) for single duplication ($\theta = 0+$) and multiple ($\theta > 0$) duplication
3.3 Probabilities of acceptance from the first sample ($P_{1A}$), acceptance from the combined first and second samples ($P_{2A}$), and correct decision ($P_{CD}$) for two submissions ($h = F, S$) with single duplication ($D = F_2$)

To my mother, Olga,
53 To my mother, Olga,
to my sisters and brothers,
and to the memory of my father, Marcelino.
Statistical Estimation for Initiative Petitions and
Performance of the Decision Rule for Oregon State Petitions
Chapter 1
Introduction
Twenty-five states give citizens the power to use initiative petitions to propose
legislation for consideration either in a ballot measure or in the legislature (Houser,
1985). The sponsor of the petition must circulate the complete text of the proposal and
obtain a minimum number of signatures from registered voters. Signatures collected are
then filed as a petition with the state office in charge, usually the Secretary of State. This
office determines by some procedure, specified by administrative rule or law, if the
petition is certified or not. To be certified a petition should contain a specified minimum
number of distinct valid signatures established by law. To be valid, a signature must be
that of a registered voter. Typical petitions contain both invalid signatures and duplicates
of valid signatures. If a registered voter signs the petition more than once, all but one of
such signatures are duplicates.
The process for verifying signatures and the decision rule for certification vary among
the states. Twenty-two states, including Oregon, have a formal procedure for verifying if
a signature on the petition is that of a registered voter (Houser, 1985). In general the
validation of signatures is accomplished by comparing the name, address, and signature
on the petition with appropriate voter registration records. Results of the certification
process must be given within a specified time limit, which varies greatly among the
states. Oregon has only fifteen days from the due date for submission to verify all state
initiative petitions. Washington has approximately ninety days for verification of
initiative petitions for the ballot. Fourteen states verify all signatures submitted. Five
states, including California and Washington, use sampling with a procedure for deciding
if all signatures are to be verified. Only Michigan, North Dakota, and Oregon base the
certification of state petitions entirely on sampling. In Michigan and North Dakota the
decision process is based on an estimate for the number of valid signatures (ignoring
duplication). In Oregon the current decision rule is based on estimates for the number of
distinct valid signatures in the petition. Only recently (1999 Oregon Administrative
Rules, Chapter 165) did Oregon change its decision procedure to include estimation of the
number of duplicates of valid signatures. Previously a 2% duplication rate was assumed for all
petitions.
Oregon allows a petition to be filed in two submissions. The Oregon decision rule for
certification of petitions is complicated in that multiple stages of sampling are used: two
stages for a single submission and three stages for two submissions. After the first
submission, the signatures in the first sample are verified and the estimate of the number
of distinct valid signatures is calculated. If the petition is not certified from the first
sample, a second random sample is selected for verification from the remaining
unverified signatures. Certification of the petition is then based on the estimate obtained
from the combined first and second samples. If the petition is not certified before the due
date for submitting signatures, the petitioners are permitted to submit additional
signatures for verification. The certification of the petition is based on the estimate for the
total number of distinct valid signatures in both submissions. A recent Oregon Supreme
Court ruling (Susan Leo and Jonah P. Hymes vs. Phil Keisling, Secretary of State, and the
state of Oregon, 1998) implies that the decision rule should not be biased either for or
against the petitioner. This means the probability of making a correct decision should be
at least 0.50 for any petition.
In Chapter 2, we consider estimators for the number of distinct valid signatures in an
initiative petition. These estimators include a linear unbiased estimator, several biased
linear estimators and one non-linear estimator, based on the jackknife technique. The
performance of the estimators is compared with respect to the bias and root mean squared
error (RMSE) for four fully-verified Washington State petitions. Exact expressions for
the bias and RMSE are provided for the linear estimators and simulated random sampling
is used for the non-linear estimator.
In Chapter 3 we evaluate the Oregon decision rule by studying the probability of a
correct decision for certifying a petition. The decision rule depends on the estimator for
the number of distinct valid signatures submitted. This estimator can be expressed as a
linear function of the estimators for the number of invalid signatures and the number of
duplicates of valid signatures. We use multivariate normal approximations for the joint
distribution of the estimators of the number of invalid and distinct valid signatures from
the multiple sampling stages to calculate the probability of correct decision. We also
obtain approximations for the probabilities of acceptance from the first sample, and from
the combined first and second samples when there are two submissions. Several
duplication rates and distributions of multiple duplicates are considered for the case of
one submission. In the case of two submissions only single duplication is evaluated.
Chapter 2
Estimating the Number of Distinct Valid Signatures
in Initiative Petitions
Ruben A. Smith-Cayama and David R. Thomas
2.1 Abstract
In some states, if citizens are dissatisfied with certain laws or feel that new laws are
needed, they can petition to place proposed legislation on the ballot. To be certified for
the ballot, the sponsor of the petition must circulate the complete text of the proposal
among voters and obtain signatures of those in favor. Petitions will contain both invalid
and valid signatures. Valid signatures from registered voters can appear more than once.
To qualify a petition as a ballot measure, the total number of distinct valid signatures
collected must exceed a required number. We are considering the case when a simple
random sample of signatures is drawn from the entire petition, and all signatures in the
sample are verified. The problem is to estimate the total number of distinct valid
signatures based on the sample information and the knowledge of the total number of
signatures collected in the petition. We consider several linear estimators and one non-linear
estimator. Expressions for the variance of the linear estimators are provided. The
performance of the estimators is evaluated using data from several Washington State
petitions that have been completely verified.
2.2 Introduction
Some state constitutions give initiative and referendum power to the people. If
citizens from these states are dissatisfied with certain laws or feel that new laws are
needed, they can petition to propose legislation, either to the legislature or to the ballot.
The sponsor of the petition must circulate the complete text of the proposed legislation
among voters and collect signatures of those in favor.
After signatures are collected they are filed as a petition with the state office in
charge, usually the Secretary of State. The office in charge determines, by some
procedure established by state law, if the petition is certified or not. A petition is certified
by state law if the number of distinct valid signatures in the petition is equal to or exceeds
the minimum required.
In this paper, we are considering the case when a petition of known size contains both
invalid and valid signatures. Valid signatures from registered voters can appear more than
once. It is assumed that a simple random sample of signatures is drawn from the entire
petition and all signatures in the sample are verified. Our interest is to estimate the
number of distinct valid signatures in the petition based on the sample information and
the knowledge of the petition size. Many states use this approach including California,
Illinois, Oregon and Washington (Hauser, 1985).
When no invalid signatures are present, the estimation problem reduces to one known
as estimation of the number of classes in a finite population. A class here is equivalent to
a valid signature. Bunge and Fitzpatrick (1993) provided a review of applications and
techniques proposed to estimate the number of classes in finite and infinite populations.
Goodman (1949) showed that the linear unbiased estimator for the total number of classes
in a finite population is unique under the assumption that the sample size is no smaller
than the maximum number of elements in any class. Recently, Haas and Stokes (1998)
proposed non-linear estimators based on the generalized jackknife technique.
Following Goodman's approach, we consider a linear unbiased estimator for the
number of distinct valid signatures in the petition. Several other linear estimators and one
non-linear estimator are also considered. In Section 2.3 we introduce terminology and notation
pertinent to our problem. The estimators are described in Section 2.4. Expressions for the
variance of the linear estimators are also provided. In Section 2.5 we compare the
performance of all estimators, and in Section 2.6 we give a summary.
2.3 Terminology and Notation
After petition signatures are collected, the state elections office reviews each sheet
and removes all the signature pages obtained that do not satisfy state regulations. This
procedure leads to a subset of the total number of signature pages originally collected,
which will be subject to a verification procedure. This subset of signatures is called the
petition here.
Signatures in the petition can be classified as valid (from registered voters) or invalid
(for example, signatures with illegible writing or signatures that differ from the ones
contained in the registration records). Let $N$ denote the size of the petition, and $U$ and $M$
the unknown numbers of invalid and distinct valid signatures in the petition, respectively.
Let $N_j$ be the number of times the $j$th distinct valid signature appears in the petition,
$j = 1, \ldots, M$. Therefore, the $j$th distinct valid signature has $N_j - 1$ duplicated signatures
in the petition, $j = 1, \ldots, M$. We denote by $D$ the total number of duplicates (replicates)
of valid signatures in the petition, which can be expressed as
\[ D = \sum_{j=1}^{M} (N_j - 1). \]
Note that 'duplicate' is used here to describe all signatures by an elector after his or her
first signature. Also, $F_i$ is the number of electors with $i$ valid signatures in the petition,
$i = 1, \ldots, N$. Observe that $0 \le F_i \le M$, and that
\[ F_i = \sum_{j=1}^{M} I(N_j = i) \tag{1} \]
where $I(\cdot)$ denotes the indicator function. Based on Equation (1), we obtain expressions
for $N$ and $M$,
\[ N = U + \sum_{i=1}^{N} i F_i \tag{2} \]
\[ M = \sum_{i=1}^{N} F_i. \tag{3} \]
From Equations (2) and (3) we can rewrite $D$ as
\[ D = \sum_{i=2}^{N} (i - 1) F_i. \tag{4} \]
Assume a sample of $n$ signatures is drawn at random without replacement from the
petition. Let $u$ be the observed number of invalid signatures in the sample and $f_i$ be the
number of electors in the sample with $i$ valid signatures. Then $n$ can be written as
\[ n = u + \sum_{i=1}^{n} i f_i. \]
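As a check on the notation, the identities in Equations (2) to (5) can be evaluated from the frequency counts of one petition. The sketch below uses the counts reported for petition A in Table 2.1; the variable names are illustrative only.

```python
# Frequency counts F_i for petition A, taken from Table 2.1:
# F[i] = number of electors with exactly i valid signatures in the petition.
F = {1: 134_489, 2: 4_031, 3: 108, 4: 3}
U = 19_437  # number of invalid signatures (Table 2.1)

M = sum(F.values())                            # Equation (3): distinct valid signatures
D = sum((i - 1) * Fi for i, Fi in F.items())   # Equation (4): duplicates of valid signatures
N = U + sum(i * Fi for i, Fi in F.items())     # Equation (2): petition size

print(M, D, N)
print(N - U - D == M)                          # Equation (5)
```

The three totals agree with the values of M, D, and N listed for petition A in Table 2.1.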
2.4 Theoretical Background
From Equations (2), (3), and (4) we have
\[ M = N - U - D. \tag{5} \]
Since $N$ is known, an estimator for $M$ can be obtained by determining estimators for $U$
and $D$. Since an unbiased estimator for $U$ under a simple random sampling design is given
by $\hat{U} = \frac{N}{n} u$, our problem reduces to the estimation of $D$.
2.4.1 Estimators for D
First, the form of the unbiased estimator, $\hat{D}_{unbias}$, for $D$ is determined. Let
$k = \max(N_1, \ldots, N_M)$. Suppose a sample of $n$ ($n \ge k$) signatures is drawn without
replacement from a petition of size $N$. Define
\[ P_{ij} = \frac{\binom{j}{i}\binom{N-j}{n-i}}{\binom{N}{n}}, \]
$C_2 = 1$, and
\[ C_j = (j - 1) - \sum_{i=2}^{j-1} C_i \frac{P_{ij}}{P_{ii}} \quad \text{for } j = 3, 4, \ldots, n. \]
Then, an unbiased estimator of $D$ is given by
\[ \hat{D}_{unbias} = \sum_{i=2}^{n} \frac{C_i}{P_{ii}} f_i. \tag{6} \]
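The recursion for the coefficients can be made concrete with a short computation. The sketch below is illustrative rather than the authors' code: it assumes the starting value $C_2 = 1$ (the value unbiasedness requires when $j = 2$), uses a hypothetical petition of 1,000 signatures with a 5% sample, and works in exact rational arithmetic so that the factors for $f_2$ and $f_3$ can be compared with the closed forms in Equations (7) and (8).

```python
from fractions import Fraction
from math import comb

def p(i, j, N, n):
    """Exact P_ij: probability that an elector with j signatures in the
    petition has exactly i of them in a simple random sample of size n."""
    return Fraction(comb(j, i) * comb(N - j, n - i), comb(N, n))

def expansion_factors(N, n, k):
    """Return {i: C_i / P_ii} for i = 2..k, using C_2 = 1 and the recursion
    C_j = (j - 1) - sum_{i=2}^{j-1} C_i * P_ij / P_ii."""
    C = {2: Fraction(1)}
    for j in range(3, k + 1):
        C[j] = (j - 1) - sum(C[i] * p(i, j, N, n) / p(i, i, N, n) for i in range(2, j))
    return {i: C[i] / p(i, i, N, n) for i in C}

# Hypothetical petition size and sample size (not from the dissertation).
N, n = 1000, 50
a = expansion_factors(N, n, k=4)
for i, factor in a.items():
    print(i, float(factor))
```

With these hypothetical sizes the factor for $f_3$ is negative and much larger in absolute value than the factor for $f_2$, which is the behaviour described in the text.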
The proof of this result is given in Lemma 1 of the Appendix. Observe that the
expansion factors, $C_i / P_{ii}$, for $f_i$, can take positive or negative values. These expansion
factors can be very large in absolute value, depending on the petition and sample sizes.
As a result the estimator $\hat{M}_{unbias}$, obtained by using $\hat{D}_{unbias}$ in Equation (5), can be
unreasonable. To avoid this difficulty, we consider alternative linear estimators, which
ignore the valid signatures appearing more than two or three times in the sample,
\[ \hat{D}_2 = \frac{N(N-1)}{n(n-1)} f_2, \tag{7} \]
\[ \hat{D}_3 = \frac{N(N-1)}{n(n-1)} f_2 - \frac{N(N-1)(N-3n+4)}{n(n-1)(n-2)} f_3. \tag{8} \]
Goodman (1949) proposed $\hat{D}_2$ for estimating the number of duplicates of classes in a
finite population. The next estimator considered is used by the Washington Elections
Division Office¹,
\[ \hat{D}_{2+} = \frac{N(N-1)}{n(n-1)} f_{2+} \quad \text{where} \quad f_{2+} = \sum_{i=2}^{n} f_i. \tag{9} \]
Notice that $f_{2+}$ is the number of electors in the sample with valid signatures appearing
two or more times. A more intuitive estimator is one that replaces $f_{2+}$ by the total number
of duplicates in the sample,
\[ \hat{D}_d = \frac{N(N-1)}{n(n-1)} d \quad \text{where} \quad d = \sum_{i=2}^{n} (i - 1) f_i. \tag{10} \]
¹ Pamela Floyd, Elections Division, Voter Registration Services, Office of Secretary of State, telephone
interview, February 9, 1999.
Note that if at most pairs of valid signatures occur in the petition ($F_j = 0$ for $j \ge 3$) then
the estimators (7-10) are equal to the unbiased estimator, $\hat{D}_{unbias}$. Similarly, $\hat{D}_3 = \hat{D}_{unbias}$
when at most triplicate valid signatures occur in the petition ($F_j = 0$ for $j \ge 4$).
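The four alternative estimators differ only in how they weight the sample counts $f_i$. A minimal sketch of Equations (7) to (10) follows; the function name and the sample counts are hypothetical, and only the petition size is borrowed from petition A.

```python
def duplicate_estimators(N, n, f):
    """Linear estimators of D from sample counts.

    f maps i -> f_i, the number of electors with i valid signatures in the sample.
    """
    w = N * (N - 1) / (n * (n - 1))                 # common expansion factor, 1 / P_22
    f2 = f.get(2, 0)
    f3 = f.get(3, 0)
    f2_plus = sum(fi for i, fi in f.items() if i >= 2)
    d = sum((i - 1) * fi for i, fi in f.items() if i >= 2)
    return {
        "D2": w * f2,                                        # Equation (7)
        "D3": w * f2 - w * (N - 3 * n + 4) / (n - 2) * f3,   # Equation (8)
        "D2+": w * f2_plus,                                  # Equation (9)
        "Dd": w * d,                                         # Equation (10)
    }

# Hypothetical samples of n = 4,870 signatures from a petition of N = 162,324.
N, n = 162_324, 4_870
pairs_only = duplicate_estimators(N, n, {1: 4_150, 2: 4})
with_triple = duplicate_estimators(N, n, {1: 4_150, 2: 4, 3: 1})
print(pairs_only)
print(with_triple)
```

When the sample contains no triplicates, all four estimates coincide. In the second hypothetical sample a single triplicate is enough to make $\hat{D}_3$ negative, because the expansion factor attached to $f_3$ is large and negative at small sampling fractions.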
When prior information is available, it may be possible to reduce the bias of the
estimators by incorporating a bias adjustment factor (BAF), denoted as $B^{\hat{D}}_{q,k,r}$, which is a
function of $q$, $k$, and $r$, where $q = n/N$ is the sampling fraction, $r = (r_3, r_4, \ldots, r_k)$
with $r_i = F_i / F_2$ for $i = 3, \ldots, k$, and $k$ is the maximum number of times any valid
signature appears in the petition, $k = \max\{j : F_j > 0\}$. The BAF for $\hat{D}$ is defined as
\[ B^{\hat{D}}_{q,k,r} = \frac{D_k}{E(\hat{D} \mid q, k)} \]
where $D_k = F_2 + 2F_3 + \cdots + (k - 1)F_k$, and $E(\hat{D} \mid q, k)$ denotes the expectation of $\hat{D}$
given $q$ and $k$. The BAF is approximated using binomial sampling with
$P_{ij} = \binom{j}{i} q^i (1 - q)^{j-i}$ in Equation (A.2) of the Appendix,
\[
B^{\hat{D}}_{q,k,r} =
\begin{cases}
\dfrac{1 + \sum_{i=3}^{k} (i-1) r_i}{1 + \frac{1}{2(1-q)^2} \sum_{i=3}^{k} i(i-1)(1-q)^{i} r_i} & \text{for } \hat{D} = \hat{D}_2 \\[2ex]
\dfrac{1 + \sum_{i=3}^{k} (i-1) r_i}{1 + \frac{1}{q^2} \sum_{i=3}^{k} \left(1 - (1 + (i-1)q)(1-q)^{i-1}\right) r_i} & \text{for } \hat{D} = \hat{D}_{2+} \\[2ex]
\dfrac{1 + \sum_{i=3}^{k} (i-1) r_i}{1 + \frac{1}{q^2} \sum_{i=3}^{k} \left(iq - 1 + (1-q)^{i}\right) r_i} & \text{for } \hat{D} = \hat{D}_d.
\end{cases}
\]
Then, the adjusted estimator of $D$ is
\[ \hat{D}_{adj} = B^{\hat{D}}_{q,k,r} \, \hat{D} \tag{11} \]
where $\hat{D}$ is any of the biased estimators defined in Equations (7), (9), and (10). The
binomial approximation gives values of $E(\hat{D} \mid q, k)$ which are very similar to those
obtained using the exact distribution, when $N$ and $n$ are large. The binomial sampling
approximation was also used by Goodman (1949), and Haas and Stokes (1998). Observe
that the population values $k$ and $r$ are unknown and need to be specified using prior
information. In some states, including Washington, duplication data from previous fully
verified petitions might be used.
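The binomial-approximation form of the BAF is simple to evaluate once $q$ and $r$ are specified. The sketch below is an illustration under that approximation; the function name and argument layout are assumptions made here, not part of the dissertation.

```python
def baf(q, r, estimator="D2"):
    """Bias adjustment factor under the binomial approximation.

    q: sampling fraction n / N.
    r: list [r_3, r_4, ..., r_k] with r_i = F_i / F_2.
    estimator: "D2", "D2+", or "Dd" (Equations (7), (9), and (10)).
    """
    idx = range(3, 3 + len(r))
    numerator = 1 + sum((i - 1) * ri for i, ri in zip(idx, r))
    if estimator == "D2":
        s = sum(i * (i - 1) * (1 - q) ** i * ri for i, ri in zip(idx, r))
        denominator = 1 + s / (2 * (1 - q) ** 2)
    elif estimator == "D2+":
        s = sum((1 - (1 + (i - 1) * q) * (1 - q) ** (i - 1)) * ri for i, ri in zip(idx, r))
        denominator = 1 + s / q ** 2
    elif estimator == "Dd":
        s = sum((i * q - 1 + (1 - q) ** i) * ri for i, ri in zip(idx, r))
        denominator = 1 + s / q ** 2
    else:
        raise ValueError(f"unknown estimator: {estimator}")
    return numerator / denominator

# Specified r_3 for petition A (Table 2.3), with k = 3 and a 3% sampling fraction.
for name in ("D2", "D2+", "Dd"):
    print(name, round(baf(0.03, [0.0371], name), 3))
```

With $q = 0.03$ and the specified $r_3 = 0.0371$ for petition A, the factor for $\hat{D}_2$ comes out at about 0.970, in line with the corresponding $k = 3$ entry of Table 2.4.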
2.4.2 Estimators for M
Estimators for $M$ can be obtained by substituting in Equation (5) any of the
estimators for $D$ presented in Equations (6-11),
\[ \hat{M} = N - \hat{U} - \hat{D} \quad \text{with} \quad \hat{D} = B \sum_{i=2}^{t} A_i f_i \tag{12} \]
for constants $B$, $t$, and $A_i$, with
\[
B =
\begin{cases}
1 & \text{for } \hat{D} = \hat{D}_{unbias}, \hat{D}_3, \hat{D}_2, \hat{D}_{2+}, \hat{D}_d \\
B^{\hat{D}}_{q,k,r} & \text{for the adjusted estimators.}
\end{cases}
\]
In petitions, the coefficient of variation for the number of times valid signatures
appear in the petition is expected to be small. The square of this coefficient of variation is
\[ \gamma^2 = \frac{(1/M) \sum_{j=1}^{M} (N_j - \bar{N})^2}{\bar{N}^2} \quad \text{where} \quad \bar{N} = (1/M) \sum_{j=1}^{M} N_j = \frac{N - U}{M}. \]
A second-order jackknife estimator, $\hat{M}_{uj2}$, was recommended by Haas and Stokes (1998)
for applications where $\gamma^2$ is relatively small. The following estimator is a modification of
the Haas and Stokes second-order jackknife estimator to accommodate the additional
class of invalid signatures,
\[ \hat{M}_{uj2m} = \left(1 - \frac{f_1 (1 - q^*)}{n^*}\right)^{-1} \left( \sum_{i=1}^{n^*} f_i - \frac{f_1 (1 - q^*) \ln(1 - q^*) \, \hat{\gamma}^2(\hat{M}_{uj1})}{q^*} \right) \]
where $n^* = n - u$ is the reduced sample size obtained after removing all invalid
signatures, $\hat{N}^* = N - \hat{U}$ is an unbiased estimator of the number of valid signatures
in the petition, $q^* = n^* / \hat{N}^*$,
\[ \hat{M}_{uj1} = \left(1 - \frac{f_1 (1 - q^*)}{n^*}\right)^{-1} \sum_{i=1}^{n^*} f_i, \]
and
\[ \hat{\gamma}^2(\hat{M}) = \max\left(0, \; \frac{\hat{M}}{n^{*2}} \sum_{i=1}^{n^*} i(i - 1) f_i + \frac{\hat{M}}{\hat{N}^*} - 1 \right). \]
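The modified jackknife estimator is a two-step calculation: a first-order estimate is used to estimate the squared coefficient of variation, which then enters the second-order correction. The sketch below follows that reading of the formulas; the function name and the sample counts are hypothetical, and it assumes $0 < q^* < 1$ and that not every valid signature in the sample is a singleton with $q^*$ near zero (otherwise the leading factor is undefined).

```python
from math import log

def m_uj2m(N, n, u, f):
    """Modified second-order jackknife estimate of M.

    N: petition size; n: sample size; u: invalid signatures in the sample;
    f: dict i -> f_i, the number of electors with i valid signatures in the sample.
    """
    n_star = n - u                         # valid signatures in the sample
    N_star = N - (N / n) * u               # estimated valid signatures in the petition
    q_star = n_star / N_star
    d = sum(f.values())                    # distinct valid signatures seen in the sample
    f1 = f.get(1, 0)
    shrink = 1 - f1 * (1 - q_star) / n_star
    m_uj1 = d / shrink                     # first-order jackknife estimate
    gamma2 = max(
        0.0,
        m_uj1 / n_star ** 2 * sum(i * (i - 1) * fi for i, fi in f.items())
        + m_uj1 / N_star - 1,
    )
    correction = f1 * (1 - q_star) * log(1 - q_star) * gamma2 / q_star
    return (d - correction) / shrink

# Hypothetical sample: 100 signatures, none invalid, 90 singletons and 5 pairs.
estimate = m_uj2m(N=1000, n=100, u=0, f={1: 90, 2: 5})
print(estimate)
```

In this hypothetical sample the estimated squared coefficient of variation is essentially zero, so the second-order estimate reduces to the first-order one.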
2.4.3 Expectation and Variance of M
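The expected value and variance for any estimator of the general form given in
Equation (12) are obtained as
\[ E(\hat{M}) = N - U - B \sum_{i=2}^{t} A_i \sum_{j=i}^{n} P_{ij} F_j \tag{13} \]
\[ Var(\hat{M}) = Var(\hat{U}) + Var(\hat{D}) + 2\,Cov(\hat{U}, \hat{D}) \tag{14} \]
where
\[ Cov(\hat{U}, \hat{D}) = -\frac{BU}{n} \sum_{i=2}^{t} \sum_{j=i}^{n} A_i \left( \frac{iN - jn}{N - j} \right) P_{ij} F_j, \]
\[ Var(\hat{D}) = B^2 \sum_{i=2}^{t} \sum_{k=2}^{t} \sum_{j=i}^{n} \sum_{l=k}^{n} A_i A_k V_{ijkl}, \]
\[
V_{ijkl} =
\begin{cases}
\left(1 + (P_{ij.ij} - P_{ij}) F_j - P_{ij.ij}\right) P_{ij} F_j & \text{for } i = k, \; j = l \\
\left((P_{kj.ij} - P_{kj}) F_j - P_{kj.ij}\right) P_{ij} F_j & \text{for } i \ne k, \; j = l \\
(P_{kl.ij} - P_{kl}) P_{ij} F_j F_l & \text{for } j \ne l,
\end{cases}
\]
\[ P_{ij} = \frac{\binom{j}{i}\binom{N-j}{n-i}}{\binom{N}{n}} \quad \text{and} \quad P_{kl.ij} = \frac{\binom{l}{k}\binom{N-j-l}{n-i-k}}{\binom{N-j}{n-i}}. \]
The expression for the expected value of the linear estimator $\hat{M}$ follows from Equation
(12), the unbiasedness of $\hat{U}$, and Equation (A.2) of the Appendix. The expression for
$Var(\hat{U})$ is well known, and the expressions for $Var(\hat{D})$ and $Cov(\hat{U}, \hat{D})$ are derived in the
Appendix.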
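Equation (13) can be evaluated exactly when the frequency counts of a petition are known. The sketch below does this for the estimator based on $\hat{D}_2$ (so $B = 1$, $t = 2$, and $A_2 = N(N-1)/(n(n-1))$), using the petition A counts from Table 2.1 and a 3% sample. Rounding the sample size to the nearest integer is an assumption made here.

```python
from math import comb

def p(i, j, N, n):
    """Exact hypergeometric P_ij: i of an elector's j signatures fall in the sample."""
    return comb(j, i) * comb(N - j, n - i) / comb(N, n)

# Petition A (Table 2.1) with a 3% sample.
N, U = 162_324, 19_437
F = {1: 134_489, 2: 4_031, 3: 108, 4: 3}
n = round(0.03 * N)

M = sum(F.values())

# Expected number of electors appearing exactly twice in the sample.
E_f2 = sum(p(2, j, N, n) * Fj for j, Fj in F.items() if j >= 2)

# Equation (13) for the estimator built on D2-hat.
A2 = N * (N - 1) / (n * (n - 1))
E_M2 = N - U - A2 * E_f2
bias = E_M2 - M

print(round(E_f2, 2), round(bias, 1), round(100 * bias / M, 3))
```

The expected count $E(f_2)$ comes out near 3.93, the petition A entry of Table 2.2 at 3%. The bias is negative, roughly 106 signatures, because pairs drawn from triplicate and quadruplicate signers inflate $f_2$ and hence $\hat{D}_2$.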
2.5 Performance of the Estimators
In this section, the estimators for the number of distinct valid signatures, $M$, are
compared with regard to their bias and root mean squared error (RMSE) for four fully-verified
Washington State petitions, denoted as A, B, C, and D. In Washington, if the
random sample indicates that $M$ attains the required number then the measure is certified.
Otherwise, complete verification of the petition is required.
Table 2.1 describes the four petitions with regard to: petition size ($N$), numbers of
invalid signatures ($U$), duplicates of valid signatures ($D$), distinct valid signatures ($M$),
the number of electors with $i$ valid signatures in the petition ($F_i$), and the squared
coefficient of variation, $\gamma^2$, for the number of times ($N_j$) distinct valid signatures appear
in the petition. Also included is the year that each petition was submitted for verification.
The petition sizes range from 162,324 to 231,723, the proportion of invalid signatures
from 12.0 to 20.4 percent, the duplication rates from 2.0 to 5.6 percent, and the numbers
of distinct valid signatures from 76.4 to 85.4 percent. The petitions C and D with the
largest percentage of pairs ($F_2$) also have the largest percentage of triplicates ($F_3$) and
quadruples ($F_4$). Only two petitions have electors who signed more than four times:
petition B has one elector who signed twelve times, and petition D has two electors who
signed six times and three electors who signed five times. For all four petitions, the
proportion of electors with triplicates or higher is small (< 0.24%). As expected, all
four petitions have small values of $\gamma^2$.

Table 2.1 Description of the petitions A, B, C, and D.

             A (1984)          B (1995)          C (1989)          D (1996)
N            162,324           231,723           173,858           228,148
U (%)        19,437 (12.0)     47,383 (20.4)     31,325 (18.0)     34,542 (15.1)
D (%)        4,256 (2.6)       4,546 (2.0)       9,738 (5.6)       11,584 (5.1)
M (%)        138,631 (85.4)    179,794 (77.6)    132,795 (76.4)    182,022 (79.8)
F_1 (%)      134,489 (82.9)    175,363 (75.7)    123,502 (71.0)    170,988 (74.9)
F_2 (%)      4,031 (2.5)       4,331 (1.9)       8,878 (5.1)       10,518 (4.6)
F_3 (%)      108 (0.07)        93 (0.04)         385 (0.22)        489 (0.21)
F_4          3                 6                 30                22
F_5          0                 0                 0                 3
F_6          0                 0                 0                 2
F_12         0                 1                 0                 0
γ²           0.0296            0.0252            0.0652            0.0584
Table 2.2 displays the expected frequency for replications of distinct valid signatures
in the sample for each sampling fraction and petition. For sampling fractions 3%, 5%, and
10%, and all four petitions, the expected number of distinct valid signatures that appear
more than twice in a random sample is less than one. When the sampling fraction is
increased to 20%, the expected number of triplicate valid signatures exceeds one only for
petitions B, C and D, and the expected number of quadruples or higher is less than 0.22.

Table 2.2 Expected frequency for replications of valid signatures, $E(f_i)$¹.

Sampling
fraction    i     A           B           C           D
3%          2     3.93        4.22        9.15        10.91
            3     0.0032      0.0077      0.0135      0.0173
            ≥4    < 0.0001    0.0003      < 0.0001    < 0.0001
5%          2     10.89       11.67       25.34       30.20
            3     0.0149      0.0318      0.0624      0.0792
            ≥4    < 0.0001    0.0023      0.0002      < 0.0001
10%         2     43.37       46.34       100.63      119.87
            3     0.1188      0.1998      0.4929      0.6216
            ≥4    0.0003      0.0262      0.0030      0.0061
20%         2     172.02      183.37      396.69      472.15
            3     0.9407      1.1338      3.8479      4.7925
            ≥4    0.0050      0.2149      0.0480      0.0893

¹ $E(f_i) = \sum_{j=i}^{n} P_{ij} F_j$.
To calculate the bias adjustment factors, $B^{\hat{D}}_{q,k,r}$, we need to specify $k$ and $r_i = F_i / F_2$
for $i = 3, \ldots, k$, where $q = n/N$. When sampling is used the values of $k$ and $r_i$,
$i = 3, \ldots, k$, are unknown. Here, we apply a jackknife approach where for each petition,
$i$ = A, B, C, D, information from only the remaining three petitions is used to specify
values for the unknown $k$ and $r_3, \ldots, r_k$. For each petition, the specified value, $\tilde{k}$, was
determined as the maximum of the observed $k$-values from the other three petitions.
Similarly, the specified vector, $\tilde{r}$, is calculated as the average of the known entries for the
other three petitions. Table 2.3 gives the true and specified values for $k$ and $r$ for each
petition.
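The leave-one-out specification described above can be sketched directly from the frequency counts in Table 2.1. The code is an illustration only; the function names are hypothetical, and the counts are those read from Table 2.1.

```python
# Frequency counts F_i from Table 2.1 (only F_2 and higher are needed here).
F = {
    "A": {2: 4_031, 3: 108, 4: 3},
    "B": {2: 4_331, 3: 93, 4: 6, 12: 1},
    "C": {2: 8_878, 3: 385, 4: 30},
    "D": {2: 10_518, 3: 489, 4: 22, 5: 3, 6: 2},
}

def true_values(counts):
    """True k and r_i = F_i / F_2 (i = 3..12) for one petition."""
    k = max(counts)
    r = {i: counts.get(i, 0) / counts[2] for i in range(3, 13)}
    return k, r

def specified_values(target):
    """Specify k and r for `target` using only the other three petitions."""
    others = [true_values(F[name]) for name in F if name != target]
    k_spec = max(k for k, _ in others)
    r_spec = {i: sum(r[i] for _, r in others) / len(others) for i in range(3, 13)}
    return k_spec, r_spec

for name in F:
    k_spec, r_spec = specified_values(name)
    print(name, k_spec, round(r_spec[3], 4), round(r_spec[4], 4))
```

For petition A this gives a specified maximum of 12 and a specified $r_3$ of about 0.0371, matching the specified column for A in Table 2.3.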
Table 2.3 True values of $k$ and $r_3, \ldots, r_k$, and specified values $\tilde{k}$ and $\tilde{r}_3, \ldots, \tilde{r}_{\tilde{k}}$.

              A                  B                  C                  D
         True     Spec      True     Spec      True     Spec      True     Spec
k        4        12        12       6         4        12        6        12
r_3      0.0268   0.0371    0.0215   0.0389    0.0434   0.0316    0.0465   0.0306
r_4      0.0007   0.0023    0.0014   0.0021    0.0034   0.0014    0.0021   0.0018
r_5      0        0.0001    0        0.0001    0        0.0001    0.0003   0
r_6      0        0.0001    0        0.0001    0        0.0001    0.0002   0
r_12     0        0.0001    0.0002   0         0        0.0001    0        0.0001

Note: The entries of $r = (r_3, \ldots, r_{12})$ and $\tilde{r} = (\tilde{r}_3, \ldots, \tilde{r}_{12})$ not displayed are equal to zero.

Table 2.4 gives values for the bias adjustment factors, $B^{\hat{D}}_{q,\tilde{k},\tilde{r}}$, using $\tilde{k} = 3$ and 12 for
each petition, estimator, and sampling fraction ($q$): 3%, 5%, 10%, and 20%. From Table
2.4, we can see that the values of the BAF corresponding to $\tilde{r} = \tilde{r}_3$ and
$\tilde{r} = (\tilde{r}_3, \ldots, \tilde{r}_{12})$ are similar in all cases. Therefore, we consider only bias adjustment
based on triplicate valid signatures, $\tilde{r} = \tilde{r}_3$, hereafter.
For each linear estimator, we use Equations (13) and (14) to compute the bias and root
mean squared error (RMSE)

$$\mathrm{Bias}(\hat{M}) = E(\hat{M}) - M \quad \text{and} \quad \mathrm{RMSE} = \sqrt{\mathrm{Var}(\hat{M}) + \{\mathrm{Bias}(\hat{M})\}^2}.$$

For the nonlinear estimator, $\hat{M}_{uj2m}$, we estimate the bias and RMSE from 10,000
independent simulated random samples, drawn without replacement from each petition.
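The simulation step can be illustrated with a short Python sketch. It is not the code used for the dissertation: the toy petition, the function names, and the choice of a simple linear estimator (invalid count scaled by N/n, duplicate count scaled by N(N-1)/(n(n-1))) are assumptions made only to show how bias and RMSE are estimated from repeated samples drawn without replacement.

```python
import math
import random
from collections import Counter

def linear_estimate(sample, N):
    """Illustrative linear estimator: N - (N/n) u - N(N-1)/(n(n-1)) d."""
    n = len(sample)
    u = sum(1 for s in sample if s is None)           # invalid signatures in the sample
    counts = Counter(s for s in sample if s is not None)
    d = sum(c - 1 for c in counts.values())           # duplicates of valid signatures
    return N - (N / n) * u - (N * (N - 1)) / (n * (n - 1)) * d

def simulate_bias_rmse(petition, n, estimator, reps=10_000, seed=0):
    """Monte Carlo bias and RMSE of `estimator` under sampling without replacement."""
    rng = random.Random(seed)
    N = len(petition)
    M = len({s for s in petition if s is not None})   # true number of distinct valid signers
    errors = [estimator(rng.sample(petition, n), N) - M for _ in range(reps)]
    bias = sum(errors) / reps
    rmse = math.sqrt(sum(e * e for e in errors) / reps)
    return bias, rmse

# Hypothetical toy petition: 50 invalid signatures (None), 900 electors who signed once,
# and 25 electors who signed twice. Elector ids stand in for valid signatures.
toy = [None] * 50 + list(range(900)) + [e for e in range(900, 925) for _ in range(2)]
bias, rmse = simulate_bias_rmse(toy, n=200, estimator=linear_estimate, reps=2000)
```

Any other estimator with the same `(sample, N)` signature, including a nonlinear one, could be passed in place of `linear_estimate`.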
Table 2.4 Specified values of the bias adjustment factor, $B_{q,\hat{k},\hat{r}}$, for each petition, adjusted
estimator and sampling fraction (q): 3%, 5%, 10%, and 20%.

                        A               B               C               D
q     Estimator    k=3    k=12     k=3    k=12     k=3    k=12     k=3    k=12
3%    D2adj       0.970  0.960    0.968  0.962    0.974  0.967    0.974  0.966
      D2+adj      0.969  0.958    0.967  0.961    0.974  0.966    0.973  0.965
      Ddadj       0.968  0.957    0.966  0.960    0.973  0.964    0.972  0.963
5%    D2adj       0.971  0.963    0.970  0.965    0.976  0.970    0.975  0.969
      D2+adj      0.970  0.961    0.969  0.963    0.975  0.967    0.974  0.967
      Ddadj       0.968  0.958    0.967  0.961    0.973  0.965    0.973  0.964
10%   D2adj       0.976  0.971    0.975  0.971    0.980  0.976    0.980  0.976
      D2+adj      0.973  0.966    0.972  0.967    0.977  0.972    0.977  0.971
      Ddadj       0.970  0.961    0.969  0.963    0.975  0.967    0.974  0.966
20%   D2adj       0.986  0.985    0.986  0.984    0.989  0.988    0.988  0.987
      D2+adj      0.980  0.976    0.979  0.976    0.983  0.980    0.982  0.979
      Ddadj       0.973  0.966    0.972  0.967    0.977  0.972    0.977  0.971
In Tables 2.5 and 2.6, the bias and RMSE are given for the nine estimators of M for
Petitions A-D and sampling fractions 3%, 5%, 10%, and 20%. For the adjusted
estimators, Equation (11), $\hat{k} = 3$ ($\hat{r} = \hat{r}_3$) is used for the bias adjustment factor, $B_{q,\hat{k},\hat{r}}$.
In Table 2.5, the estimator M3 tends to have a relatively small positive bias
(< 0.07%) in all cases. The biases of M2, M2+, and Md are negative in all cases,
corresponding to positive biases in the estimators for the number of duplicates of valid
signatures, D2, D2+, and Dd. Note that the differences between the biases of these estimators
tend to increase as the sampling fraction increases. This is expected since the number of
triplicate and quadruple valid signatures increases with sample size (Table 2.2). The three
adjusted estimators show a small reduction in the absolute bias when compared with their
non-adjusted counterparts. The non-linear estimator, Muj2m, tends to have a relatively
large negative bias ranging from -4.52% to -1.19%.

Table 2.5 Bias (%) of estimators for M.

Sampling                                          Petitions
Fraction  Estimator              A                 B                 C                 D
3%        Munbias                0                 0                 0                 0
          M3              3 (0.00)        120 (0.07)         27 (0.02)         46 (0.03)
          M2          -106 (-0.08)      -138 (-0.08)      -430 (-0.32)      -535 (-0.29)
          M2+         -110 (-0.08)      -147 (-0.08)      -445 (-0.34)      -554 (-0.30)
          Md          -113 (-0.08)      -156 (-0.09)      -460 (-0.35)      -574 (-0.32)
          M2adj          27 (0.02)         11 (0.01)      -162 (-0.12)      -225 (-0.12)
          M2+adj         27 (0.02)          7 (0.00)      -168 (-0.13)      -234 (-0.13)
          Mdadj          28 (0.02)          3 (0.00)      -174 (-0.13)      -243 (-0.13)
          Muj2m     -2,536 (-1.83)    -2,768 (-1.54)    -5,988 (-4.51)    -7,541 (-4.14)
5%        Munbias                0                 0                 0                 0
          M3              2 (0.00)         94 (0.05)         24 (0.02)         42 (0.02)
          M2           -99 (-0.07)      -122 (-0.07)      -400 (-0.30)      -497 (-0.27)
          M2+         -105 (-0.08)      -136 (-0.08)      -425 (-0.32)      -529 (-0.29)
          Md          -111 (-0.08)      -150 (-0.08)      -450 (-0.34)      -561 (-0.31)
          M2adj          25 (0.02)         17 (0.01)      -150 (-0.11)      -208 (-0.11)
          M2+adj         26 (0.02)         12 (0.01)      -160 (-0.12)      -222 (-0.12)
          Mdadj          28 (0.02)          5 (0.00)      -170 (-0.13)      -237 (-0.13)
          Muj2m     -2,609 (-1.88)    -2,825 (-1.57)    -6,004 (-4.52)    -7,598 (-4.17)
10%       Munbias                0                 0                 0                 0
          M3              2 (0.00)         52 (0.03)         20 (0.02)         32 (0.02)
          M2           -81 (-0.06)       -88 (-0.05)      -325 (-0.24)      -403 (-0.22)
          M2+          -93 (-0.07)      -111 (-0.06)      -375 (-0.28)      -466 (-0.26)
          Md          -105 (-0.08)      -137 (-0.08)      -425 (-0.32)      -529 (-0.29)
          M2adj          21 (0.02)         26 (0.01)      -120 (-0.09)      -166 (-0.09)
          M2+adj         24 (0.02)         20 (0.01)      -140 (-0.11)      -194 (-0.11)
          Mdadj          26 (0.02)         11 (0.01)      -160 (-0.12)      -223 (-0.12)
          Muj2m     -2,346 (-1.69)    -2,555 (-1.42)    -5,593 (-4.21)    -7,054 (-3.88)
20%       Munbias                0                 0                 0                 0
          M3              1 (0.00)         18 (0.01)         13 (0.01)         20 (0.01)
          M2           -46 (-0.03)       -38 (-0.02)      -179 (-0.13)      -220 (-0.12)
          M2+          -69 (-0.05)       -72 (-0.04)      -277 (-0.21)      -342 (-0.19)
          Md           -93 (-0.07)      -114 (-0.06)      -375 (-0.28)      -466 (-0.26)
          M2adj          13 (0.01)         27 (0.02)       -63 (-0.05)       -85 (-0.05)
          M2+adj         18 (0.01)         26 (0.01)      -101 (-0.08)      -139 (-0.08)
          Mdadj          24 (0.01)         17 (0.01)      -140 (-0.11)      -194 (-0.11)
          Muj2m     -1,979 (-1.43)    -2,139 (-1.19)    -4,820 (-3.63)    -6,059 (-3.33)
From Table 2.6, it can be seen that the RMSE decreases at a faster rate than $1/\sqrt{n}$ for
all estimators and petitions. This results from the corresponding property of the estimators for
D in Equation (12). The estimator M3 has smaller RMSE than Munbias, except for the
20% sampling fraction for petition A where the RMSEs are equal. The estimator M2 has
smaller RMSE than M3, except for the 20% sampling fraction for petitions C and D.
The estimators M2, M2+, and Md tend to have similar RMSEs for the sampling fractions
of 3%, 5%, and 10% over all four petitions. This is as expected from the form of the
estimators and the very small expected number of triplicate or higher replications of
distinct valid signatures (Table 2.2). For the 20% sampling fraction, the RMSE for Md is
slightly larger than the RMSEs for M2 and M2+ for petitions C and D, and similar for
petitions A and B. The adjusted estimators M2adj, M2+adj, and Mdadj show a slight
reduction in the RMSE compared to their non-adjusted counterparts. These three adjusted
estimators have similar RMSEs in all cases. The RMSE for the non-linear estimator
Muj2m is relatively large in all cases.
Table 2.6 RMSE (%) of estimators for M.

Sampling                                                Petitions
Fraction  Estimator              A                        B                 C                 D
3%        Munbias     3,449 (2.49)  1,147,904,315 (638,455)      7,502 (5.65)    49,503 (27.20)
          M3          2,998 (2.16)             3,873 (2.15)      5,217 (3.93)      5,808 (3.19)
          M2          2,307 (1.66)             2,510 (1.40)      3,462 (2.61)      3,791 (2.08)
          M2+         2,308 (1.66)             2,512 (1.40)      3,466 (2.61)      3,796 (2.09)
          Md          2,311 (1.67)             2,519 (1.40)      3,476 (2.62)      3,807 (2.09)
          M2adj       2,242 (1.62)             2,441 (1.36)      3,354 (2.53)      3,669 (2.02)
          M2+adj      2,241 (1.62)             2,441 (1.36)      3,354 (2.53)      3,670 (2.02)
          Mdadj       2,241 (1.62)             2,445 (1.36)      3,359 (2.53)      3,675 (2.02)
          Muj2m       3,943 (2.84)             4,290 (2.39)      7,266 (5.47)      8,859 (4.87)
5%        Munbias     1,740 (1.26)      47,190,341 (26,247)      3,231 (2.43)     10,607 (5.83)
          M3          1,645 (1.19)             1,995 (1.11)      2,693 (2.03)      2,979 (1.64)
          M2          1,423 (1.03)             1,585 (0.88)      2,123 (1.60)      2,331 (1.28)
          M2+         1,424 (1.03)             1,588 (0.88)      2,130 (1.60)      2,341 (1.29)
          Md          1,427 (1.03)             1,594 (0.89)      2,142 (1.61)      2,356 (1.29)
          M2adj       1,385 (1.00)             1,546 (0.86)      2,044 (1.54)      2,238 (1.23)
          M2+adj      1,384 (1.00)             1,546 (0.86)      2,044 (1.54)      2,239 (1.23)
          Mdadj       1,384 (1.00)             1,549 (0.86)      2,049 (1.54)      2,246 (1.23)
          Muj2m       3,353 (2.42)             3,653 (2.03)      6,724 (5.06)      8,326 (4.57)
10%       Munbias       795 (0.57)            532,313 (296)      1,232 (0.93)      1,726 (0.95)
          M3            787 (0.57)               928 (0.52)      1,179 (0.89)      1,294 (0.71)
          M2            753 (0.54)               876 (0.49)      1,116 (0.84)      1,234 (0.68)
          M2+           755 (0.54)               879 (0.49)      1,133 (0.85)      1,258 (0.69)
          Md            759 (0.55)               887 (0.49)      1,157 (0.87)      1,290 (0.71)
          M2adj         736 (0.53)               860 (0.48)      1,056 (0.80)      1,159 (0.64)
          M2+adj        735 (0.53)               859 (0.48)      1,058 (0.80)      1,163 (0.64)
          Mdadj         735 (0.53)               861 (0.48)      1,065 (0.80)      1,173 (0.64)
          Muj2m       2,805 (2.02)             3,077 (1.71)      6,053 (4.56)      7,535 (4.14)
20%       Munbias       405 (0.29)             4,129 (2.30)        563 (0.42)        623 (0.34)
          M3            405 (0.29)               498 (0.28)        561 (0.42)        613 (0.34)
          M2            404 (0.29)               496 (0.28)        578 (0.44)        640 (0.35)
          M2+           408 (0.29)               500 (0.28)        616 (0.46)        692 (0.38)
          Md            415 (0.30)               509 (0.28)        671 (0.51)        767 (0.42)
          M2adj         399 (0.29)               492 (0.27)        549 (0.41)        602 (0.33)
          M2+adj        398 (0.29)               491 (0.27)        553 (0.42)        611 (0.34)
          Mdadj         398 (0.29)               491 (0.27)        565 (0.43)        629 (0.35)
          Muj2m       2,359 (1.70)             2,571 (1.43)      5,211 (3.92)      6,475 (3.56)

2.6 Summary
In this paper we compared several estimators for the number of distinct valid
signatures in a petition. Explicit forms for the bias and RMSE were provided for the
linear estimators. Simulated random samples were used to estimate the bias and RMSE of
the non-linear estimator, Muj2m, adapted from Haas and Stokes (1998).
Small sampling fractions, less than or equal to 10%, are typically used for sampling state
petitions. For these sample sizes it was difficult to improve much on the Goodman-type
estimator M2, which is unbiased when valid signatures are duplicated at most once. This
results from the very small probability of observing higher duplicate replication from
typical petitions. When duplicate replication data are available from similar fully-verified
petitions, it is possible to reduce the bias of the (biased) linear estimators.
2.7 References
Bunge, J. and Fitzpatrick, M. (1993), Estimating the Number of Species: A Review,
Journal of the American Statistical Association, 88, 364-373.
Goodman, L. A. (1949), On the Estimation of the Number of Classes in a Population,
Annals of Mathematical Statistics, 20, 572-579.
Haas, P. J. and Stokes, L. (1998), Estimating the Number of Classes in a Finite
Population, Journal of the American Statistical Association, 93, 1475-1487.
Houser, J. (1985), Validating Initiative and Referendum Petition Signatures, Research
Monograph, Legislative Research, S420 State Capitol, Salem, Oregon.
2.8 Appendix
Calculation of $E(\hat{D})$, $\mathrm{Var}(\hat{D})$, and $\mathrm{Cov}(\hat{U}, \hat{D})$

Consider a random sample of n signatures drawn without replacement from a petition
of size N. Let $\delta_{j\alpha}$ denote the number of valid signatures in the sample from the $\alpha$th
elector who signed j valid signatures in the petition, for $\alpha = 1, 2, \ldots, F_j$, and
$j = 1, \ldots, N$. Note that $\delta_{j\alpha}$ has the hypergeometric (N, n, j, i) distribution with
$P_{ij} = P(\delta_{j\alpha} = i)$ given by

$$P_{ij} = \frac{\binom{j}{i}\binom{N-j}{n-i}}{\binom{N}{n}} \quad \text{for } i = 0, 1, \ldots, j.$$

Similarly, the conditional distribution of $\delta_{l\beta}$, given $\delta_{j\alpha} = i$, is hypergeometric
$(N - j, n - i, l, k)$ with $P_{kl.ij} = P(\delta_{l\beta} = k \mid \delta_{j\alpha} = i)$ given by

$$P_{kl.ij} = \frac{\binom{l}{k}\binom{N-j-l}{n-i-k}}{\binom{N-j}{n-i}} \quad \text{for } k = 0, 1, \ldots, l.$$

For the number of electors in the sample with i valid signatures, $f_i$, write

$$f_i = \sum_{j=i}^{n} f_{ij} \quad \text{with} \quad f_{ij} = \sum_{\alpha=1}^{F_j} I(\delta_{j\alpha} = i),$$

where $f_{ij}$ is the number of electors with i signatures in the sample and j signatures in the
petition ($i \le j$) and $I(\cdot)$ is the indicator function. Note that $f_{ij}$ is not observable, but $f_i$
is. Then,

$$E(f_{ij}) = P_{ij}F_j \quad \text{and} \quad E(f_i) = \sum_{j=i}^{n} P_{ij}F_j. \tag{A.1}$$

Thus, from the general form of the linear estimator $\hat{D} = B\sum_{i=2}^{t} A_i f_i$ we have

$$E(\hat{D}) = B\sum_{i=2}^{t} A_i \sum_{j=i}^{n} P_{ij}F_j. \tag{A.2}$$
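As a numerical companion to (A.1) and (A.2), the sketch below evaluates the hypergeometric probabilities and the resulting expectations in Python. The function names and the dictionary layout for the $F_j$ and $A_i$ are illustrative assumptions. Log-gamma functions are used only to avoid very large binomial coefficients at realistic petition sizes.

```python
from math import exp, lgamma

def log_comb(a, b):
    """Log of the binomial coefficient C(a, b); -inf when the coefficient is zero."""
    if b < 0 or b > a:
        return float("-inf")
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)

def p_ij(i, j, n, N):
    """P(i of an elector's j valid signatures fall in a sample of n drawn from N)."""
    return exp(log_comb(j, i) + log_comb(N - j, n - i) - log_comb(N, n))

def expected_f(i, F, n, N):
    """E(f_i) = sum over j >= i of P_ij * F_j, with F given as a dict {j: F_j}."""
    return sum(p_ij(i, j, n, N) * Fj for j, Fj in F.items() if j >= i)

def expected_D_hat(A, B, F, n, N):
    """E(D_hat) = B * sum_i A_i * E(f_i) for the linear estimator D_hat = B * sum_i A_i f_i."""
    return B * sum(Ai * expected_f(i, F, n, N) for i, Ai in A.items())

# Hypothetical petition: 1,000 electors signed once and 50 signed twice, with N = 100,000.
N, n = 100_000, 5_000
F = {1: 1_000, 2: 50}
B = N * (N - 1) / (n * (n - 1))
mean_D_hat = expected_D_hat({2: 1.0}, B, F, n, N)   # close to D = 50 when duplication is at most single
```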
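Lemma 1 Let $k = \max\{j : F_j > 0\}$. Suppose a sample of n ($n \ge k$) signatures is
drawn without replacement from a petition of size N. Define $C_2 = 1$, and

$$C_j = (j - 1) - \sum_{i=2}^{j-1} C_i \frac{P_{ij}}{P_{ii}}, \quad \text{for } j = 3, 4, \ldots, n.$$

Then, an unbiased estimator of D is given by

$$\hat{D}_{unbias} = \sum_{i=2}^{n} \frac{C_i}{P_{ii}} f_i. \tag{A.3}$$

Proof. The unbiasedness property for $\hat{D}_{unbias}$ follows from substitution of the
expectation for $f_i$ in Equation (A.3):

$$E(\hat{D}_{unbias}) = \sum_{j=2}^{n} F_j \sum_{i=2}^{j} C_i \frac{P_{ij}}{P_{ii}} = \sum_{j=2}^{n} F_j (j - 1) = \sum_{j=2}^{N} (j - 1) F_j = D,$$

where the second equality uses the definition of $C_j$, $j = 3, 4, \ldots, n$, and the third holds
since $F_j = 0$ for $j = k + 1, k + 2, \ldots, N$ and $n \ge k$.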
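The next result is used for the calculation of $\mathrm{Var}(\hat{D})$ and $\mathrm{Cov}(\hat{U}, \hat{D})$.

Lemma 2 The $\mathrm{Cov}(f_{ij}, f_{kl}) = V_{ijkl}$, $i \le j$, $k \le l$, where

$$V_{ijkl} = \begin{cases}
\left(1 + (P_{ij.ij} - P_{ij})F_j - P_{ij.ij}\right) P_{ij}F_j & \text{for } i = k,\ j = l \\
\left((P_{kj.ij} - P_{kj})F_j - P_{kj.ij}\right) P_{ij}F_j & \text{for } i \ne k,\ j = l \\
(P_{kl.ij} - P_{kl})\, P_{ij}F_jF_l & \text{for } j \ne l.
\end{cases}$$

Proof. Substitute (A.1) in

$$\mathrm{Cov}(f_{ij}, f_{kl}) = E(f_{ij}f_{kl}) - E(f_{ij})E(f_{kl}) = \sum_{\alpha=1}^{F_j}\sum_{\beta=1}^{F_l} P(\delta_{j\alpha} = i, \delta_{l\beta} = k) - (P_{ij}F_j)(P_{kl}F_l).$$

1. Case where $i = k$, $j = l$:

$$\mathrm{Cov}(f_{ij}, f_{ij}) = \sum_{\alpha=1}^{F_j} P(\delta_{j\alpha} = i) + \sum_{\alpha=1}^{F_j}\sum_{\substack{\beta=1 \\ \beta \ne \alpha}}^{F_j} P(\delta_{j\alpha} = i, \delta_{j\beta} = i) - (P_{ij}F_j)^2$$
$$= P_{ij}F_j + P_{ij}P_{ij.ij}F_j(F_j - 1) - P_{ij}^2F_j^2 = \left(1 + (P_{ij.ij} - P_{ij})F_j - P_{ij.ij}\right) P_{ij}F_j.$$

2. Case where $i \ne k$, $j = l$:

$$\mathrm{Cov}(f_{ij}, f_{kj}) = 0 + \sum_{\alpha=1}^{F_j}\sum_{\substack{\beta=1 \\ \beta \ne \alpha}}^{F_j} P(\delta_{j\alpha} = i, \delta_{j\beta} = k) - (P_{ij}F_j)(P_{kj}F_j)$$
$$= P_{ij}P_{kj.ij}F_j(F_j - 1) - P_{ij}P_{kj}F_j^2 = \left((P_{kj.ij} - P_{kj})F_j - P_{kj.ij}\right) P_{ij}F_j.$$

3. Case where $j \ne l$:

$$\mathrm{Cov}(f_{ij}, f_{kl}) = \sum_{\alpha=1}^{F_j}\sum_{\beta=1}^{F_l} P(\delta_{j\alpha} = i, \delta_{l\beta} = k) - (P_{ij}F_j)(P_{kl}F_l) = (P_{kl.ij} - P_{kl})P_{ij}F_jF_l.$$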
The variance for the general form of the linear estimator $\hat{D} = B\sum_{i=2}^{t} A_i f_i$,

$$\mathrm{Var}(\hat{D}) = B^2 \sum_{i=2}^{t}\sum_{k=2}^{t} A_iA_k \sum_{j=i}^{n}\sum_{l=k}^{n} V_{ijkl},$$

then follows from the covariance of the sums $f_i = \sum_{j=i}^{n} f_{ij}$ and $f_k = \sum_{l=k}^{n} f_{kl}$ and Lemma 2.
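Lemma 3 Under the assumption $F_j = 0$ for $j > n$,

$$\mathrm{Cov}(u, f_{ij}) = -\left(\frac{iN - jn}{N - j}\right)\left(\frac{U}{N}\right) P_{ij}F_j.$$

Proof. For fixed i and j write

$$u = n - \sum_{k=1}^{n} k f_k = n - \sum_{k=1}^{n} k \sum_{l=k}^{n} f_{kl} = n - \sum_{\substack{l=1 \\ l \ne j}}^{n}\sum_{k=1}^{l} k f_{kl} - i f_{ij} - \sum_{\substack{k=1 \\ k \ne i}}^{j} k f_{kj}.$$

Then,

$$\mathrm{Cov}(u, f_{ij}) = -\sum_{\substack{l=1 \\ l \ne j}}^{n}\sum_{k=1}^{l} k\,\mathrm{Cov}(f_{kl}, f_{ij}) - i\,\mathrm{Var}(f_{ij}) - \sum_{\substack{k=1 \\ k \ne i}}^{j} k\,\mathrm{Cov}(f_{kj}, f_{ij}).$$

From Lemma 2,

$$\mathrm{Cov}(u, f_{ij}) = -\sum_{\substack{l=1 \\ l \ne j}}^{n}\sum_{k=1}^{l} k (P_{kl.ij} - P_{kl}) P_{ij}F_jF_l - i\left(1 + (P_{ij.ij} - P_{ij})F_j - P_{ij.ij}\right) P_{ij}F_j - \sum_{\substack{k=1 \\ k \ne i}}^{j} k\left((P_{kj.ij} - P_{kj})F_j - P_{kj.ij}\right) P_{ij}F_j$$
$$= -\left(i - \sum_{k=0}^{j} k P_{kj.ij} + \sum_{l=1}^{n}\left(\sum_{k=0}^{l} k (P_{kl.ij} - P_{kl})\right) F_l\right) P_{ij}F_j.$$

Using the expectation of hypergeometric distributions then gives the reduction

$$\mathrm{Cov}(u, f_{ij}) = -\left(i - j\,\frac{n - i}{N - j} + \sum_{l=1}^{n} l\left(\frac{n - i}{N - j} - \frac{n}{N}\right) F_l\right) P_{ij}F_j$$
$$= -\left(\frac{iN - jn}{N - j}\right)\left(1 - \frac{1}{N}\sum_{l=1}^{n} l F_l\right) P_{ij}F_j = -\left(\frac{iN - jn}{N - j}\right)\left(\frac{U}{N}\right) P_{ij}F_j.$$

From Lemma 3, the covariance of $\hat{U} = \frac{N}{n}u$ and $\hat{D} = B\sum_{i=2}^{t} A_i f_i = B\sum_{i=2}^{t}\sum_{j=i}^{n} A_i f_{ij}$ is
then

$$\mathrm{Cov}(\hat{U}, \hat{D}) = -B\,\frac{U}{n}\sum_{i=2}^{t} A_i \sum_{j=i}^{n}\left(\frac{iN - jn}{N - j}\right) P_{ij}F_j.$$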
Chapter 3 Sampling for Certification of Oregon State Initiative Petitions
Ruben A. Smith-Cayama and David R. Thomas
3.1 Abstract
Statistical sampling for verification of signatures from initiative petitions is used in
some states to determine if the number of distinct valid signatures, M, in the petition
attains a required number, M ≥ R. This determination is based entirely on sampling in
three states: Michigan, North Dakota, and Oregon (Hauser, 1985). In Oregon, the
decision rule is complicated in that multiple stages of sampling are used. A maximum of
two submissions of signatures are permitted, with two samples selected sequentially from
the first submission and one sample from the second submission. The petition can be
accepted after any sampling stage, but only rejected after verification of all samples.
The decision rule depends on estimates for the number of distinct valid signatures
submitted. This is accomplished by subtracting from the number of signatures submitted
estimates for the numbers of invalid signatures ($\hat{U}$) and duplicates ($\hat{D}$) of valid
signatures: $\hat{M} = N - \hat{U} - \hat{D}$. By a court ruling, the decision rule shall not be biased
either for or against the petitioner. In terms of a statistical decision rule, this corresponds
to the property that the probability of a correct decision, regarding M ≥ R or M < R,
should be at least 0.5.
The performance of the decision rule is investigated in a variety of situations. Several
rates of invalid and duplicates of valid signatures, and sampling fractions are considered
for different petition sizes. Cases for both single and multiple duplication of valid
signatures are used for single submissions. Only single duplication of valid signatures is
considered for two submissions.
3.2 Introduction
The Oregon constitution gives citizens the power to propose legislation by initiative
petitions. The sponsors of the petition must circulate the complete text of the proposed
legislation among voters and collect signatures. Signatures in the petition are classified as
either valid or invalid. A valid signature must match that on the voter registration in the
county designated on the signature sheet. Although the petition specifies that an elector
can sign the petition at most once, some electors sign two or more times.
The signature sheets for a petition are submitted to the Oregon Secretary of State
Office, which determines whether or not the petition is certified as a measure on the
ballot. A petition should be certified if the total number of distinct valid signatures
collected exceeds a specified required number. For statutory and constitutional initiative
petitions, this required number corresponds to six and eight percent of the total votes for
the State Governor in the previous election, respectively. Currently, based on the 1998
election, the required numbers are 66,786 and 89,048. Certification of a referendum is
similar to that for an initiative petition but only signatures of four percent of votes for
governor are required.
The Secretary of State must give the results of the certification procedure within 15
days after the due date. If the petition or referendum is submitted early then the
certification must be completed as early as possible. These time constraints necessitate
the use of statistical sampling for selection of signatures to be verified. The signatures are
verified in stages. First, a sample of size $n_1 = 1{,}000$ is selected at random from the N
signatures submitted in the petition. If the petition is not certified from this first sample
then a second sample of size $n_2$ is randomly selected from the remaining $N - n_1$
signatures in the petition. The second sample is chosen so that the combined sample size,
$n = n_1 + n_2$, is equal to or greater than five percent of the petition size ($n \ge 0.05N$).
This choice of sample size will satisfy the restriction, imposed by the Oregon
administrative rules, that the second sample size, $n_2$, must be larger than the first sample
size, $n_1$. Note that the combined first and second samples satisfy the property of simple
random sampling: every possible sample of size n, selected from N, has the same
probability of being selected. By Oregon administrative rules a petition cannot be rejected
unless both samples have been verified. Further, the statistical decision rule used for
determining if the petition contains the required number of distinct valid signatures
should not be biased either for or against the petitioner.
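The sample-size arithmetic can be sketched in a few lines of Python. The function name is hypothetical, and taking the smallest second sample that reaches the five percent threshold is an assumption: the text only requires the combined sample to be at least five percent of N.

```python
N1 = 1_000  # first-sample size stated in the text

def second_sample_size(N, fraction_percent=5, n1=N1):
    """Smallest n2 with n1 + n2 >= fraction_percent% of N (integer arithmetic only)."""
    n_min = -(-fraction_percent * N // 100)   # ceiling of fraction_percent * N / 100
    return max(n_min - n1, 0)

# For the petition sizes considered later in the chapter:
sizes = {N: second_sample_size(N) for N in (110_000, 120_000, 130_000)}
```

For these petition sizes the second sample is several times larger than the first, consistent with the administrative restriction that $n_2$ exceed $n_1$.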
The decision rule depends on estimates for the number of distinct valid signatures
submitted. To estimate the number of distinct valid signatures, estimators for the number
of invalid and number of duplicates of valid signatures are needed. For the purpose of
constructing a lower limit for the number of distinct valid signatures contained in the
petition from the first sample, an upper bound of at least eight percent is assumed for the
duplication rate of valid signatures. If verification of the second sample is required, an
estimator for the number of duplicates of valid signatures is obtained from the combined
first and second samples. If the petition is not certified from this combined sample and no
more signatures are submitted before the due date then the petition is rejected. If
additional signatures are submitted as a second submission before the due date then a
simple random sample of signatures is drawn from the second submission for
verification. The certification of the petition is then based on estimates made from the
samples in the first and second submissions.
We study the performance of the decision rule for several situations. The probability
of a correct decision is approximated by a multivariate normal distribution model for the
joint sampling distribution of the estimators of the numbers of invalid and distinct valid
signatures from the multiple stages. Several rates of invalid and duplicates of valid
signatures and sampling fractions are considered for different petition sizes. Various
duplication rates and distributions of multiple duplication of valid signatures are used for
the case of a single submission. Only single duplication is considered for two
submissions. Also evaluated are the probabilities of acceptance from the first sample and
from the combined first and second samples for two submissions.
3.3 The Statistical Decision Problem
To describe the statistical estimates used in the Oregon decision rules for certification
of an initiative petition, it is convenient to introduce the following notation for counts in a
petition. Let U and D denote the unknown numbers of invalid and duplicates (replicates)
of valid signatures. Note that 'duplicate' is used here to describe all signatures by an
elector after his or her first signature. Then, the unknown number of distinct valid
signatures is given by
M=N-U-D.
(1)
The first subtraction, N - U, gives the number of valid signatures, including duplicates,
and the second subtraction eliminates the duplicates of valid signatures.
The objective is to determine, from statistical estimates, if the total number of distinct
valid signatures in the petition attains the required number, M ≥ R, where R denotes the
known required number of signatures. A statistical estimator for M can be obtained from
Equation (1) by substituting estimators for the number of invalid signatures and for the
number of duplicates of valid signatures.
If the petition consists of two different submissions (strata) then it is convenient to
use the subscripts F and S to designate the first and second submissions. For the counts in
each submission (F, S), denote by $N_h$, $U_h$, and $D_h$ the total number of signatures, the
unknown numbers of invalid, and duplicates of valid signatures in the hth submission,
respectively, for h = F, S. Let $D_{FS}$ be the unknown total number of duplicates of valid
signatures between the first and second submissions. Note for the petition comprised of
both submissions that the total number of distinct valid signatures can be written as

$$M = M_F + M_S - D_{FS} \tag{2}$$

where $M_h = N_h - U_h - D_h$ is the number of distinct valid signatures in the hth
submission, for h = F, S. A statistical estimator for M can be obtained from Equation
(2) by substituting estimators for the number of distinct valid signatures in each
submission and the number of duplicates of valid signatures between the first and second
submission.
3.4 One Submission of Signatures
Most petitions are filed as a single submission of signatures for Oregon. The decision
rule for a single submission is described in Section 3.4.1, an approximation for the
probability of correct decision is developed in Section 3.4.2, and numerical results are
given in Section 3.4.3.
3.4.1 The Decision Rule
The decision rule (1999 Oregon Administrative Rules, Chapter 165) has two
components. After verification of the signatures in the first sample the decision
alternatives are to either accept (certify) the petition or verify the signatures in the second
sample. After the second sample the alternatives are to either accept or reject the petition.
The petition is accepted based on the first sample only if there is a high level of
confidence that M ≥ R, where M is defined in Equation (1). This is accomplished by
using a lower bound, denoted as $\hat{M}_{1L}$, for the number of distinct valid signatures in the
petition

$$\hat{M}_{1L} = N - \hat{U}_{1U} - D_{1U} \tag{3}$$

where

$$\hat{U}_{1U} = \hat{U}_1 + 1.645\,N\sqrt{\frac{N - n_1}{N}\cdot\frac{\hat{\mathcal{U}}_1(1 - \hat{\mathcal{U}}_1)}{n_1 - 1}}, \tag{4}$$

$\hat{U}_1 = N\hat{\mathcal{U}}_1$, $\hat{\mathcal{U}}_1$ is the proportion of invalid signatures in the first sample, and
$\hat{U}_{1U}$ is the 95% level upper confidence limit for the number of invalid signatures. The
upper limit for the number of duplicates of valid signatures in the petition,
$D_{1U} = N\mathcal{D}_{1U}$, is obtained by assuming an upper bound for the duplication rate of at
least 0.08 ($\mathcal{D}_{1U} \ge 0.08$). If $\hat{M}_{1L} \ge R$ then the petition is accepted as a measure on the ballot.
Otherwise, the second sample must be verified.
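A minimal Python sketch of this first-stage check follows. It assumes the standard error in Equation (4) takes the finite-population form sqrt(((N - n1)/N) p(1 - p)/(n1 - 1)), with p the invalid proportion in the first sample. The function names and the example counts are illustrative only.

```python
import math

Z_95 = 1.645  # one-sided 95% normal quantile used in Equation (4)

def first_stage_lower_bound(N, n1, invalid_in_sample, dup_rate_upper=0.08):
    """Lower bound M_1L = N - U_1U - D_1U for the number of distinct valid signatures."""
    p = invalid_in_sample / n1                              # invalid proportion, first sample
    se = math.sqrt((N - n1) / N * p * (1 - p) / (n1 - 1))   # assumed form of the standard error
    U_1U = N * p + Z_95 * N * se                            # upper limit for invalid signatures
    D_1U = N * dup_rate_upper                               # assumed upper limit for duplicates
    return N - U_1U - D_1U

def accept_from_first_sample(N, n1, invalid_in_sample, R, dup_rate_upper=0.08):
    """True if the petition is certified from the first sample alone."""
    return first_stage_lower_bound(N, n1, invalid_in_sample, dup_rate_upper) >= R

# Hypothetical example: N = 110,000 signatures, R = 89,048, first sample of 1,000.
early_accept = accept_from_first_sample(110_000, 1_000, 95, 89_048)   # 9.5% invalid
needs_second = not accept_from_first_sample(110_000, 1_000, 96, 89_048)  # 9.6% invalid
```

In this example the first sample certifies the petition only when at most 95 of the 1,000 sampled signatures are invalid. A petition of this size therefore needs an invalid rate well below its break-even level to be accepted early.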
For petitions that require verification of the second sample, data from the combined
sample of size $n = n_1 + n_2$ are used for estimating the number of distinct valid
signatures

$$\hat{M} = N - \hat{U} - \hat{D}. \tag{5}$$

The unbiased estimator for the number of invalid signatures is

$$\hat{U} = \frac{N}{n}\,u \tag{6}$$

where u is the number of invalid signatures in the combined sample. Note that the
number of duplicates of valid signatures can be written as

$$D = \sum_{i=2}^{N} (i - 1)F_i$$

where $F_i$ is the number of electors with i valid signatures in the petition, i = 1, ..., N.
Then, an estimate for D is given by

$$\hat{D} = \frac{N(N - 1)}{n(n - 1)}\,d \tag{7}$$

where $d = \sum_{i=2}^{n} (i - 1)f_i$ and $f_i$ is the number of electors with i valid signatures in the
combined sample. Smith-Cayama and Thomas (1999) compared the performance of
several estimators for M with respect to the bias and root mean squared error for random
sampling from four fully-verified Washington state petitions. The estimator $\hat{M}$, referred
to as $\hat{M}_d$ in their paper, was found to have performance similar to the best estimator in each
case.
The estimator $\hat{M}$, obtained from the combined first and second samples, is now used
for determining the certification of the petition. If $\hat{M} \ge R$, the petition is certified as a
measure on the ballot. If no more signatures are submitted before the due date and
$\hat{M} < R$, then the petition is rejected.
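The second-stage estimate in Equations (5)-(7) can be sketched as follows, working from the sample counts rather than from raw signatures. The function names, the dictionary layout for the $f_i$, and the example counts are assumptions made for illustration.

```python
def estimate_distinct_valid(N, n, u, f):
    """M_hat = N - U_hat - D_hat from the combined sample.

    u: number of invalid signatures in the combined sample.
    f: dict {i: f_i}, number of electors with i valid signatures in the combined sample.
    """
    U_hat = N / n * u
    d = sum((i - 1) * fi for i, fi in f.items() if i >= 2)   # duplicates seen in the sample
    D_hat = N * (N - 1) / (n * (n - 1)) * d
    return N - U_hat - D_hat

def second_stage_decision(N, n, u, f, R):
    """'certify' if M_hat >= R; 'reject' applies only if no further signatures are submitted."""
    return "certify" if estimate_distinct_valid(N, n, u, f) >= R else "reject"

# Hypothetical combined samples of n = 5,500 from N = 110,000 with R = 89,048.
outcome_a = second_stage_decision(110_000, 5_500, 550, {2: 20}, 89_048)
outcome_b = second_stage_decision(110_000, 5_500, 700, {2: 25}, 89_048)
```

Each duplicate found in the sample is scaled by roughly $(N/n)^2$, about 400 here, so a handful of sampled duplicates can move the estimate by thousands of signatures.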
3.4.2 Probability of Correct Decision
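It is of primary interest to determine the probability of making the correct decision as
to whether M < R or M ≥ R. The petition can only be rejected if from the first sample
$\hat{M}_{1L} < R$ and from the combined sample $\hat{M} < R$. Denote this rejection probability as

$$P_R = P(\hat{M}_{1L} < R,\ \hat{M} < R)$$

and the probability of correct decision as

$$P_{CD} = \begin{cases} P_R & \text{for } M < R \\ 1 - P_R & \text{for } M \ge R. \end{cases} \tag{8}$$

To satisfy the constraint that the decision rule should not be biased either for or against
the petitioner, the probability of correct decision should be greater than or equal to 0.5, that is,
$P_{CD} \ge 0.5$, for all $U = N\mathcal{U}$ and $D = N\mathcal{D}$, where $\mathcal{U}$ and $\mathcal{D}$ denote the rates of
invalid signatures and of duplicates of valid signatures in the petition.
It can be shown that the events $\hat{M}_{1L} < R$ and $\hat{\mathcal{U}}_1 > k_1$ for the first sample are
equivalent when

$$k_1 = \frac{-B - \sqrt{B^2 - 4AC}}{2A},$$

where

$$A = (1.645)^2\,\frac{N - n_1}{N(n_1 - 1)} + 1, \quad B = 1 - A - 2\sqrt{C}, \quad \text{and} \quad C = (1 - \mathcal{D}_{1U} - R/N)^2,$$

so that $P(\hat{M}_{1L} < R) = P(\hat{\mathcal{U}}_1 > k_1)$.
Using the bivariate normal approximation for the joint sampling distribution of $\hat{\mathcal{U}}_1$ and
$\hat{M}$, the rejection probability is

$$P_R = \int_{a}^{\infty}\int_{-\infty}^{b} \phi(z_1, z_2)\,dz_2\,dz_1 \tag{9}$$

where $\phi(z_1, z_2)$ is the standardized bivariate normal density with correlation coefficient

$$\rho_{12} = \frac{\mathrm{Cov}(\hat{\mathcal{U}}_1, \hat{M})}{SD(\hat{\mathcal{U}}_1)\,SD(\hat{M})}, \tag{10}$$

and the standardized lower and upper integration limits are

$$a = \frac{k_1 - \mathcal{U}}{SD(\hat{\mathcal{U}}_1)} \quad \text{and} \quad b = \frac{R - E(\hat{M})}{SD(\hat{M})} \tag{11}$$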
where
(
N-nl) U(I-U)
nl
N-I
- U
(12)
N2(N I) n
n
- ~(i -1)~(P.·11
n(n-I) !--t
~.
f).
f=2
and
P.kl· . .1) -
)=f
- p..)F1))
(13)
( kI) (N-J-l)
n-j-k
(Nn-,
J)
It is known that the distribution of U1 is asymptotically nonnal. In a simulation study we
found that the marginal distribution of M appeared to be approximately nonnal.
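The derivation of $\mathrm{Cov}(\hat{\mathcal{U}}_1, \hat{M})$ is given in Section 3.8.1 in the Appendix. Expressions
for $E(\hat{M})$ and $\mathrm{Var}(\hat{M})$ are included in Smith-Cayama and Thomas (1999), who consider
a more general class of linear estimators,

$$E(\hat{M}) = N - U - \frac{N(N - 1)}{n(n - 1)}\sum_{i=2}^{n} (i - 1)\sum_{j=i}^{n} P_{ij}F_j \tag{14}$$

and

$$\mathrm{Var}(\hat{M}) = \mathrm{Var}(\hat{U}) + \mathrm{Var}(\hat{D}) + 2\,\mathrm{Cov}(\hat{U}, \hat{D}) \tag{15}$$

where

$$\mathrm{Var}(\hat{U}) = \frac{N^2}{n}\left(\frac{N - n}{N - 1}\right)\mathcal{U}(1 - \mathcal{U}),$$

$$\mathrm{Var}(\hat{D}) = \left\{\frac{N(N - 1)}{n(n - 1)}\right\}^2 \sum_{i=2}^{n}\sum_{k=2}^{n} (i - 1)(k - 1)\sum_{j=i}^{n}\sum_{l=k}^{n} V_{ijkl},$$

$$\mathrm{Cov}(\hat{U}, \hat{D}) = -\frac{N(N - 1)}{n(n - 1)}\,\frac{N}{n}\,\mathcal{U}\sum_{i=2}^{n} (i - 1)\sum_{j=i}^{n}\left(\frac{iN - jn}{N - j}\right) P_{ij}F_j,$$

and

$$V_{ijkl} = \begin{cases}
\left(1 + (P_{ij.ij} - P_{ij})F_j - P_{ij.ij}\right) P_{ij}F_j & \text{for } i = k,\ j = l \\
\left((P_{kj.ij} - P_{kj})F_j - P_{kj.ij}\right) P_{ij}F_j & \text{for } i \ne k,\ j = l \\
(P_{kl.ij} - P_{kl})\, P_{ij}F_jF_l & \text{for } j \ne l.
\end{cases}$$

It is also of interest to calculate the probability of acceptance from the first sample,
$P_{1A} = P(\hat{M}_{1L} \ge R) = P(\hat{\mathcal{U}}_1 \le k_1)$. For this calculation, we use a normal
approximation for the sampling distribution of $\hat{\mathcal{U}}_1$,

$$P_{1A} = \Phi(a) \tag{16}$$

where $\Phi$ is the standard normal distribution function and the standardized value a is given in
Equation (11).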
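The first-sample acceptance probability can be sketched numerically as follows. The sketch rests on two readings of the text that should be treated as assumptions: the threshold $k_1$ is taken as the smaller root of $Ax^2 + Bx + C = 0$ with A, B, and C as defined in Section 3.4.2, and the standard deviation of the first-sample invalid proportion is taken in the usual finite-population form. Function names are illustrative.

```python
import math

def k1_threshold(N, n1, R, dup_rate_upper=0.08, z=1.645):
    """Largest first-sample invalid proportion for which M_1L >= R (smaller quadratic root)."""
    C = (1 - dup_rate_upper - R / N) ** 2
    A = z**2 * (N - n1) / (N * (n1 - 1)) + 1
    B = 1 - A - 2 * math.sqrt(C)
    return (-B - math.sqrt(B * B - 4 * A * C)) / (2 * A)

def prob_accept_first_sample(N, n1, R, invalid_rate, dup_rate_upper=0.08):
    """Normal approximation P_1A = Phi(a), with a = (k1 - invalid_rate) / SD."""
    k1 = k1_threshold(N, n1, R, dup_rate_upper)
    sd = math.sqrt((N - n1) / (N - 1) * invalid_rate * (1 - invalid_rate) / n1)
    a = (k1 - invalid_rate) / sd
    return 0.5 * (1 + math.erf(a / math.sqrt(2)))   # standard normal distribution function

# Setting of Table 3.1(a) with M = R: N = 110,000, R = 89,048, n1 = 1,000.
N, R, n1 = 110_000, 89_048, 1_000
p_dup_10 = prob_accept_first_sample(N, n1, R, invalid_rate=1 - 0.10 - R / N)
p_dup_08 = prob_accept_first_sample(N, n1, R, invalid_rate=1 - 0.08 - R / N)
```

Under these assumptions the two values come out at roughly 0.70 and 0.06, in line with the corresponding $P_{1A}$ entries for f = 1.00 in Table 3.1(a).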
3.4.3 Numerical Results
We calculated the probabilities that the petition will be accepted from the first sample
($P_{1A}$) and that the correct decision for certification will be made ($P_{CD}$). The rejection
probabilities ($P_R$) in Equation (9) were computed using the CDFBVN function of the
GAUSS (1992) software. From Equations (9)-(15) we can see that $P_{CD}$ depends on the size
of the first sample, $n_1$; the upper limit for the rate of duplicates of valid signatures for the
first sample, $\mathcal{D}_{1U}$; the size of the combined sample, n; the petition size, N; the number of
invalid signatures in the petition, U; the required number of distinct valid signatures, R;
and the number of electors with i valid signatures in the petition, $F_i$, for i = 1, ..., N. We
fix $n_1 = 1{,}000$, $\mathcal{D}_{1U} = 0.08$, and R = 89,048, corresponding to constitutional petitions
for the years 2000 and 2002; consider three typical petition sizes, N = 110,000, 120,000, and
130,000; and vary the other parameters.
3.4.3.1 Single Duplication of Valid Signatures
We first consider the case where valid signatures are duplicated at most once, $F_i = 0$
for $i \ge 3$. That is, valid signatures are either unique or occur in pairs. Then the number of
duplicates of valid signatures is $D = F_2$ and the estimator reduces to

$$\hat{D} = \frac{N(N - 1)}{n(n - 1)}\,d$$

where $d = f_2$ is the number of duplicates of valid signatures in the combined sample.
In the case of single duplication, the sampling distribution moments which depend on
duplication reduce to:

$$E(\hat{M}) = M, \qquad \mathrm{Cov}(\hat{\mathcal{U}}_1, \hat{M}) = -N^2\left(\frac{1}{n} - \frac{1}{N}\right)\mathcal{U}\left[\frac{1 - \mathcal{U}}{N - 1} - \frac{2\mathcal{D}}{N - 2}\right], \tag{17}$$

$$\mathrm{Var}(\hat{D}) = \frac{N^2(N - 1)}{n(n - 1)}\,\mathcal{D}\left(1 - \frac{n(n - 1)}{N - 1}\,\mathcal{D} + \frac{(n - 2)(n - 3)}{(N - 2)(N - 3)}(N\mathcal{D} - 1)\right),$$

$$\mathrm{Cov}(\hat{U}, \hat{D}) = -2\,\frac{N^2(N - n)}{n(N - 2)}\,\mathcal{U}\mathcal{D}.$$

These simplified expressions can be used in Equations (10) for the correlation coefficient
$\rho_{12}$ and (11) for the upper integration limit b.
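The single-duplication moments and the double integral in Equation (9) can be explored with a short Python sketch. This is not the GAUSS computation used for Table 3.1 and is not claimed to reproduce its entries. The moment formulas are coded as we read the simplified expressions above, the bivariate normal probability is obtained by Simpson's rule rather than a library routine, and all function names are illustrative.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def single_dup_moments(N, n, n1, U_rate, D_rate):
    """SD of the first-sample invalid proportion, SD of M_hat, and their correlation."""
    var_U = N**2 / n * (N - n) / (N - 1) * U_rate * (1 - U_rate)
    var_D = (N**2 * (N - 1) / (n * (n - 1)) * D_rate
             * (1 - n * (n - 1) / (N - 1) * D_rate
                + (n - 2) * (n - 3) / ((N - 2) * (N - 3)) * (N * D_rate - 1)))
    cov_UD = -2 * N**2 * (N - n) / (n * (N - 2)) * U_rate * D_rate
    cov_u1_M = (-N**2 * (1 / n - 1 / N) * U_rate
                * ((1 - U_rate) / (N - 1) - 2 * D_rate / (N - 2)))
    sd_M = math.sqrt(var_U + var_D + 2 * cov_UD)
    sd_u1 = math.sqrt((N - n1) / (N - 1) * U_rate * (1 - U_rate) / n1)
    return sd_u1, sd_M, cov_u1_M / (sd_u1 * sd_M)

def rejection_probability(a, b, rho, steps=4000, width=10.0):
    """P(Z1 > a, Z2 < b) for a standard bivariate normal, by Simpson's rule over z1."""
    hi = max(a, 0.0) + width                  # the density beyond this point is negligible
    h = (hi - a) / steps                      # steps must be even for Simpson's rule
    s = math.sqrt(1 - rho * rho)
    g = lambda z: phi(z) * Phi((b - rho * z) / s)   # density of Z1 times P(Z2 < b | Z1 = z)
    total = g(a) + g(hi)
    total += 4 * sum(g(a + k * h) for k in range(1, steps, 2))
    total += 2 * sum(g(a + k * h) for k in range(2, steps, 2))
    return total * h / 3

# Hypothetical setting: N = 110,000, n = 5,500, n1 = 1,000, 10% duplicates, M = R = 89,048.
N, n, n1, R = 110_000, 5_500, 1_000, 89_048
sd_u1, sd_M, rho = single_dup_moments(N, n, n1, U_rate=1 - 0.10 - R / N, D_rate=0.10)
```

The inner integral over $z_2$ is available in closed form given $z_1$, which is why a one-dimensional quadrature suffices here.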
Table 3.1 displays values for the probability of accepting the petition from the first
sample ($P_{1A}$), and the probability of making a correct decision ($P_{CD}$). For each petition
size considered, several values are taken for the number of distinct valid signatures,
defined as M = fR, where f = 0.97, 0.98, 0.99, 1.00, 1.01, 1.02, and 1.03; for the
proportion of duplicates of valid signatures, $\mathcal{D}$ = 0.01, 0.02, 0.04, 0.06, 0.08, and 0.10;
and for combined sample size n = 5, 6, 7, 8, 9, and 10 percent of N. The rate of invalid
signatures in the petition, $\mathcal{U}$, was then taken to satisfy the equation

$$fR = N(1 - \mathcal{D} - \mathcal{U}).$$

In Table 3.1, for the cases with f = 1 (M = R) the probability of correct decision
($P_{CD}$) is approximately equal to 0.50, except for the cases where $\mathcal{D}$ = 8 or 10%. For
$\mathcal{D}$ = 8% the maximum is $P_{CD}$ = 0.535 for N = 110,000 and all sampling fractions
n/N = 5% to 10%. In such cases with f = 1 and $P_{CD}$ > 0.5 the decision rule is biased
in favor of the petitioner. For $\mathcal{D}$ = 10%, where the assumed upper bound for the
duplication rate for the first sampling stage ($\mathcal{D}_{1U}$ = 8%) is exceeded, the bias favoring
the petitioner is relatively large and even extends to f = 0.99 when n/N ≤ 9% for
N = 110,000; n/N ≤ 7% for N = 120,000; and n/N ≤ 6% for N = 130,000.
For fixed f < 1 (M < R) the $P_{CD}$ decreases as $\mathcal{D}$ increases ($\mathcal{U}$ decreases). For fixed
f > 1 (M > R) the $P_{CD}$ first decreases as $\mathcal{D}$ approaches the assumed upper bound for
duplication rate ($\mathcal{D}_{1U}$) of 8%, and then tends to increase as $\mathcal{D}$ exceeds 8%. The increase
in the $P_{CD}$ appears to be due to increasing $P_{1A}$. That is, the probability of accepting from
the first sample, $P_{1A}$, increases as $\mathcal{U}$ decreases. Thus, for fixed M < R petitions with
Table 3.1 Probabilities of acceptance from the first sample ($P_{1A}$) and of correct decision ($P_{CD}$) for single duplication ($D = F_2$).

a. N = 110,000

D and U are rates in percent; P1A is for n1 = 1,000; the last six columns give PCD for n/N = 5%, 6%, 7%, 8%, 9%, and 10%.

f^a     D     U     P1A      5%      6%      7%      8%      9%     10%
0.97    1  20.5   0.000   0.999   1.000   1.000   1.000   1.000   1.000
0.97    2  19.5   0.000   0.993   0.998   1.000   1.000   1.000   1.000
0.97    4  17.5   0.000   0.971   0.988   0.995   0.998   0.999   1.000
0.97    6  15.5   0.000   0.945   0.972   0.987   0.994   0.998   0.999
0.97    8  13.5   0.000   0.920   0.954   0.975   0.987   0.994   0.997
0.97   10  11.5   0.026   0.874   0.911   0.937   0.953   0.963   0.969

0.98    1  19.7   0.000   0.980   0.991   0.996   0.998   0.999   1.000
0.98    2  18.7   0.000   0.951   0.974   0.987   0.994   0.997   0.999
0.98    4  16.7   0.000   0.897   0.934   0.959   0.976   0.986   0.992
0.98    6  14.7   0.000   0.857   0.898   0.930   0.954   0.970   0.982
0.98    8  12.7   0.001   0.825   0.868   0.903   0.931   0.952   0.967
0.98   10  10.7   0.120   0.701   0.741   0.774   0.801   0.822   0.839

0.99    1  18.9   0.000   0.849   0.884   0.911   0.933   0.949   0.962
0.99    2  17.9   0.000   0.797   0.835   0.867   0.894   0.916   0.935
0.99    4  15.9   0.000   0.737   0.775   0.808   0.838   0.865   0.888
0.99    6  13.9   0.000   0.703   0.738   0.770   0.800   0.827   0.852
0.99    8  11.9   0.011   0.672   0.704   0.735   0.763   0.790   0.815
0.99   10   9.9   0.363   0.414   0.434   0.453   0.471   0.489   0.505

1.00    1  18.0   0.000   0.500   0.500   0.500   0.500   0.500   0.500
1.00    2  17.0   0.000   0.500   0.500   0.500   0.500   0.500   0.500
1.00    4  15.0   0.000   0.500   0.500   0.500   0.500   0.500   0.500
1.00    6  13.0   0.000   0.500   0.500   0.500   0.500   0.500   0.500
1.00    8  11.0   0.062   0.535   0.535   0.535   0.535   0.535   0.535
1.00   10   9.0   0.702   0.860   0.860   0.860   0.860   0.860   0.860

1.01    1  17.2   0.000   0.853   0.888   0.915   0.936   0.952   0.965
1.01    2  16.2   0.000   0.799   0.837   0.870   0.897   0.919   0.937
1.01    4  14.2   0.000   0.738   0.776   0.810   0.840   0.867   0.890
1.01    6  12.2   0.004   0.705   0.740   0.773   0.803   0.830   0.854
1.01    8  10.2   0.228   0.762   0.787   0.810   0.832   0.852   0.871
1.01   10   8.2   0.932   0.980   0.982   0.984   0.985   0.987   0.989

1.02    1  16.4   0.000   0.983   0.993   0.997   0.999   1.000   1.000
1.02    2  15.4   0.000   0.954   0.976   0.988   0.995   0.998   0.999
1.02    4  13.4   0.000   0.899   0.936   0.961   0.977   0.987   0.993
1.02    6  11.4   0.029   0.864   0.904   0.935   0.957   0.973   0.983
1.02    8   9.4   0.542   0.929   0.947   0.962   0.973   0.982   0.988
1.02   10   7.4   0.994   0.999   0.999   1.000   1.000   1.000   1.000

1.03    1  15.6   0.000   0.999   1.000   1.000   1.000   1.000   1.000
1.03    2  14.6   0.000   0.994   0.999   1.000   1.000   1.000   1.000
1.03    4  12.6   0.002   0.973   0.989   0.996   0.999   1.000   1.000
1.03    6  10.6   0.130   0.956   0.978   0.990   0.996   0.998   0.999
1.03    8   8.6   0.848   0.990   0.995   0.997   0.999   0.999   1.000
1.03   10   6.6   1.000   1.000   1.000   1.000   1.000   1.000   1.000

a M = fR (R = 89,048); the correct decisions are: reject for f < 1 and accept for f ≥ 1.
Table 3.1 (continued)

b. N = 120,000

D and U are rates in percent; P1A is for n1 = 1,000; the last six columns give PCD for n/N = 5%, 6%, 7%, 8%, 9%, and 10%.

f^a     D     U     P1A      5%      6%      7%      8%      9%     10%
0.97    1  27.0   0.000   0.998   0.999   1.000   1.000   1.000   1.000
0.97    2  26.0   0.000   0.989   0.996   0.999   1.000   1.000   1.000
0.97    4  24.0   0.000   0.963   0.983   0.993   0.997   0.999   1.000
0.97    6  22.0   0.000   0.935   0.964   0.982   0.991   0.996   0.998
0.97    8  20.0   0.001   0.909   0.945   0.968   0.982   0.991   0.995
0.97   10  18.0   0.040   0.850   0.889   0.916   0.934   0.946   0.952

0.98    1  26.3   0.000   0.970   0.985   0.993   0.996   0.998   0.999
0.98    2  25.3   0.000   0.938   0.964   0.980   0.989   0.994   0.997
0.98    4  23.3   0.000   0.884   0.922   0.949   0.968   0.980   0.988
0.98    6  21.3   0.000   0.844   0.886   0.919   0.944   0.962   0.975
0.98    8  19.3   0.003   0.811   0.854   0.890   0.918   0.941   0.958
0.98   10  17.3   0.123   0.688   0.727   0.760   0.788   0.810   0.828

0.99    1  25.5   0.000   0.828   0.862   0.889   0.912   0.930   0.944
0.99    2  24.5   0.000   0.780   0.817   0.848   0.875   0.898   0.917
0.99    4  22.5   0.000   0.725   0.761   0.793   0.822   0.848   0.871
0.99    6  20.5   0.000   0.693   0.727   0.758   0.786   0.813   0.837
0.99    8  18.5   0.016   0.660   0.691   0.720   0.748   0.773   0.797
0.99   10  16.5   0.293   0.455   0.476   0.496   0.515   0.534   0.551

1.00    1  24.8   0.000   0.500   0.500   0.500   0.500   0.500   0.500
1.00    2  23.8   0.000   0.500   0.500   0.500   0.500   0.500   0.500
1.00    4  21.8   0.000   0.500   0.500   0.500   0.500   0.500   0.500
1.00    6  19.8   0.001   0.501   0.501   0.501   0.501   0.501   0.501
1.00    8  17.8   0.058   0.533   0.533   0.533   0.533   0.533   0.533
1.00   10  15.8   0.536   0.780   0.780   0.780   0.780   0.780   0.780

1.01    1  24.1   0.000   0.830   0.864   0.892   0.914   0.932   0.946
1.01    2  23.1   0.000   0.781   0.818   0.850   0.877   0.899   0.918
1.01    4  21.1   0.000   0.726   0.762   0.794   0.823   0.849   0.872
1.01    6  19.1   0.005   0.696   0.729   0.760   0.789   0.815   0.839
1.01    8  17.1   0.165   0.735   0.760   0.785   0.807   0.829   0.848
1.01   10  15.1   0.774   0.931   0.937   0.943   0.948   0.954   0.959

1.02    1  23.3   0.000   0.972   0.986   0.994   0.997   0.999   0.999
1.02    2  22.3   0.000   0.940   0.966   0.981   0.990   0.995   0.997
1.02    4  20.3   0.000   0.885   0.923   0.950   0.969   0.981   0.989
1.02    6  18.3   0.024   0.850   0.891   0.923   0.947   0.965   0.977
1.02    8  16.3   0.362   0.891   0.917   0.938   0.955   0.968   0.978
1.02   10  14.3   0.925   0.987   0.990   0.992   0.994   0.996   0.997

1.03    1  22.6   0.000   0.998   1.000   1.000   1.000   1.000   1.000
1.03    2  21.6   0.000   0.990   0.997   0.999   1.000   1.000   1.000
1.03    4  19.6   0.002   0.965   0.984   0.993   0.997   0.999   1.000
1.03    6  17.6   0.082   0.944   0.970   0.985   0.993   0.997   0.999
1.03    8  15.6   0.614   0.971   0.983   0.990   0.995   0.998   0.999
1.03   10  13.6   0.985   0.999   0.999   1.000   1.000   1.000   1.000

a M = fR (R = 89,048); the correct decisions are: reject for f < 1 and accept for f ≥ 1.
Table 3.1 (continued)

c. N = 130,000

D and U are rates in percent; P1A is for n1 = 1,000; the last six columns give PCD for n/N = 5%, 6%, 7%, 8%, 9%, and 10%.

f^a     D     U     P1A      5%      6%      7%      8%      9%     10%
0.97    1  32.6   0.000   0.996   0.999   1.000   1.000   1.000   1.000
0.97    2  31.6   0.000   0.985   0.994   0.998   0.999   1.000   1.000
0.97    4  29.6   0.000   0.956   0.978   0.990   0.996   0.998   0.999
0.97    6  27.6   0.000   0.926   0.958   0.977   0.988   0.994   0.997
0.97    8  25.6   0.001   0.899   0.936   0.961   0.977   0.987   0.993
0.97   10  23.6   0.051   0.830   0.869   0.898   0.917   0.930   0.938

0.98    1  31.9   0.000   0.961   0.978   0.988   0.994   0.997   0.998
0.98    2  30.9   0.000   0.927   0.955   0.973   0.984   0.991   0.995
0.98    4  28.9   0.000   0.872   0.911   0.939   0.960   0.974   0.984
0.98    6  26.9   0.000   0.833   0.875   0.909   0.935   0.954   0.969
0.98    8  24.9   0.005   0.799   0.842   0.878   0.907   0.930   0.948
0.98   10  22.9   0.129   0.675   0.713   0.745   0.773   0.796   0.815

0.99    1  31.2   0.000   0.811   0.845   0.872   0.895   0.914   0.930
0.99    2  30.2   0.000   0.767   0.802   0.833   0.860   0.883   0.902
0.99    4  28.2   0.000   0.715   0.750   0.781   0.809   0.835   0.858
0.99    6  26.2   0.000   0.685   0.717   0.747   0.775   0.801   0.825
0.99    8  24.2   0.019   0.651   0.681   0.709   0.735   0.760   0.783
0.99   10  22.2   0.268   0.466   0.487   0.508   0.527   0.545   0.563

1.00    1  30.5   0.000   0.500   0.500   0.500   0.500   0.500   0.500
1.00    2  29.5   0.000   0.500   0.500   0.500   0.500   0.500   0.500
1.00    4  27.5   0.000   0.500   0.500   0.500   0.500   0.500   0.500
1.00    6  25.5   0.001   0.501   0.501   0.501   0.501   0.501   0.501
1.00    8  23.5   0.056   0.532   0.532   0.532   0.532   0.532   0.532
1.00   10  21.5   0.461   0.743   0.743   0.743   0.743   0.743   0.743

1.01    1  29.8   0.000   0.813   0.846   0.874   0.897   0.916   0.931
1.01    2  28.8   0.000   0.768   0.803   0.834   0.861   0.884   0.903
1.01    4  26.8   0.000   0.716   0.750   0.782   0.810   0.836   0.859
1.01    6  24.8   0.006   0.688   0.720   0.750   0.778   0.803   0.827
1.01    8  22.8   0.138   0.719   0.745   0.769   0.792   0.813   0.833
1.01   10  20.8   0.669   0.895   0.904   0.912   0.920   0.928   0.935

1.02    1  29.1   0.000   0.963   0.980   0.989   0.994   0.997   0.999
1.02    2  28.1   0.000   0.929   0.957   0.974   0.985   0.992   0.995
1.02    4  26.1   0.000   0.873   0.912   0.940   0.961   0.975   0.984
1.02    6  24.1   0.020   0.838   0.879   0.912   0.938   0.957   0.971
1.02    8  22.1   0.282   0.868   0.898   0.922   0.942   0.958   0.970
1.02   10  20.1   0.838   0.970   0.976   0.981   0.986   0.989   0.992

1.03    1  28.4   0.000   0.996   0.999   1.000   1.000   1.000   1.000
1.03    2  27.4   0.000   0.986   0.995   0.998   0.999   1.000   1.000
1.03    4  25.4   0.002   0.957   0.979   0.990   0.996   0.998   0.999
1.03    6  23.4   0.060   0.933   0.962   0.980   0.990   0.995   0.998
1.03    8  21.4   0.478   0.955   0.972   0.984   0.991   0.995   0.998
1.03   10  19.4   0.939   0.994   0.996   0.998   0.999   0.999   1.000

a M = fR (R = 89,048); the correct decisions are: reject for f < 1 and accept for f ≥ 1.
higher duplication rates are favorable to the petitioner, and for fixed $M > R$, higher duplication rates tend to be unfavorable as the duplication rate $\bar{D}$ approaches 8%.
3.4.3.2 Multiple Duplication of Valid Signatures
To calculate the probability of correct decision we constructed petitions where the number of electors with $i$ valid signatures in the petition, $F_i$, is given as an integer approximation to the product of $D/(i-1)$ and the logarithmic series probability,

$$F_i \approx -\frac{D\,\theta^{\,i-1}}{(i-1)^2 \ln(1-\theta)}$$

for $i = 2, 3, 4, 5, 6$ and $\theta = 0+, 0.05, 0.10, 0.20$, so that the number of duplicates is $D = \sum_{i=2}^{6} (i-1) F_i$. This model was found to be in rough agreement with four fully-verified Washington State petitions. For this model, higher-order replication, $F_i/F_2$ for $i = 3, \ldots, 6$, increases with $\theta$. As with single duplication of valid signatures, the number of distinct valid signatures for each petition was obtained as $M = fR$, for $f = 0.99$, $1.00$, and $1.01$. To interpret the effect of multiple duplication on $P_{CD}$, the bias and standard deviation of the estimator $\hat{M}$ are also calculated. Note that the upper limit $b$ in Equation (11) can be written as

$$b = \frac{(R - M) - \mathrm{bias}(\hat{M})}{\mathrm{SD}(\hat{M})}.$$
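The logarithmic series construction of the $F_i$ can be sketched in a few lines of Python. This is only an illustration: the function name `duplication_counts` is hypothetical, and plain rounding is assumed as the "integer approximation", so the rounded counts need not reproduce the total $D$ exactly (the counts tabulated in Table 3.2 appear to have been adjusted so that they do).

```python
import math

def duplication_counts(D, theta, max_i=6):
    """Return {i: F_i} for i = 2..max_i under the logarithmic series model.

    F_i is taken as the nearest integer to
        -D * theta**(i-1) / ((i-1)**2 * ln(1 - theta)),
    with 0 < theta < 1. Use a very small theta to mimic theta = 0+.
    """
    log_term = math.log1p(-theta)  # ln(1 - theta), negative on (0, 1)
    counts = {}
    for i in range(2, max_i + 1):
        k = i - 1  # duplicates contributed by an elector with i signatures
        counts[i] = round(-D * theta**k / (k**2 * log_term))
    return counts

# Example: N = 110,000 with a 1% duplication rate gives D = 1,100.
counts = duplication_counts(D=1100, theta=0.20)
implied_duplicates = sum((i - 1) * F for i, F in counts.items())
```

For $\theta = 0.20$ the rounded $F_2$, $F_3$, and $F_4$ agree with the corresponding row of Table 3.2.a; the smaller counts can differ by one because of the rounding convention assumed here.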
Table 3.2 displays the probabilities of acceptance from the first sample ($P_{1A}$) and of correct decision ($P_{CD}$), the correlation coefficient between the estimator of the number of invalid signatures from the first sample ($\hat{U}_1$) and the estimator of the number of distinct valid signatures ($\hat{M}$), the bias and standard deviation of the estimator $\hat{M}$, and the bivariate normal integration limits $a$ and $b$ in Equation (9) for single ($\theta = 0+$) and multiple ($\theta > 0$) duplication. Tables 3.2.a through 3.2.d correspond to petition sizes $N$ = 110,000 and 130,000 and combined sample sizes $n$ of 5 and 10 percent of $N$. The 48 petitions in each table correspond to $f$ = 0.99, 1.00, and 1.01; $\bar{D}$ = 0.01, 0.02, 0.04, and 0.08; and $\theta$ = 0+, 0.05, 0.10, and 0.20. Also included are the $F_i$, for $i = 2, \ldots, 6$, and the number of electors who contribute one or more duplicates ($F_{2+} = \sum_{i=2}^{6} F_i$) corresponding to fixed values for $\bar{D}$ and $\theta$. Note that when $\theta = 0+$, valid signatures are duplicated at most once.
For increasing $\theta$ and fixed $\bar{D}$ and $f$, the number of electors contributing duplicates ($F_{2+}$) decreases, the correlation ($\rho$) between $\hat{U}_1$ and $\hat{M}$ is approximately constant, the bias of $\hat{M}$ becomes more negative, and the standard deviation of $\hat{M}$ increases slightly compared to the bias. As a result, the upper limit $b$ increases, corresponding to increasing $P_{CD}$ for $f < 1$ and to decreasing $P_{CD}$ for $f \geq 1$.
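The link between $b$ and $P_{CD}$ can be checked numerically. The sketch below rests on an assumption that is not stated in this form in the text: when $P_{1A}$ is negligible (the lower limit $a$ is far in the left tail), the bivariate integral is taken to reduce to roughly $\Phi(b)$, so that $P_{CD} \approx \Phi(b)$ for $f < 1$ and $P_{CD} \approx 1 - \Phi(b)$ for $f \geq 1$. The function names are hypothetical, and the inputs are the rounded entries of Table 3.2.a.

```python
import math

def std_normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def upper_limit_b(f, rel_bias, rel_sd):
    """b = ((R - M) - bias) / SD, with bias and SD expressed as fractions of M.

    Since M = f R, the term (R - M) / M equals 1/f - 1.
    """
    return ((1.0 / f - 1.0) - rel_bias) / rel_sd

def approx_p_cd(f, b):
    """Approximate P_CD when acceptance from the first sample is negligible."""
    p_reject = std_normal_cdf(b)
    return p_reject if f < 1 else 1.0 - p_reject

# f = 0.99, D-bar = 1%, theta = 0.20: bias = -0.07% and SD = 0.99% of M.
b = upper_limit_b(0.99, rel_bias=-0.0007, rel_sd=0.0099)
p = approx_p_cd(0.99, b)
```

With these inputs `b` comes out close to the tabulated 1.09. The approximation should not be used for rows where $P_{1A}$ is clearly nonzero (for example $\bar{D}$ = 8%).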
For increasing $\bar{D}$ and fixed $f$ and $\theta$, the correlation ($\rho$) between $\hat{U}_1$ and $\hat{M}$ increases, the bias of $\hat{M}$ (except for $\theta = 0+$) decreases, and the standard deviation of $\hat{M}$ increases. The corresponding upper limit $b$ decreases for $f < 1$ and increases for $f \geq 1$. Also note that the lower limit $a$ decreases with $\bar{U}$ (increases with $\bar{D}$), so that the probability of accepting from the first sample ($P_{1A}$) decreases with $\bar{U}$.
For $f = 1$ ($M = R$) and $\theta > 0+$, $P_{CD}$ is smaller than 0.5 except for $\bar{D}$ = 8% and $\theta$ = 0.05 in Table 3.2.a with $N$ = 110,000 and $n/N$ = 5%, and in Table 3.2.c with $N$ = 130,000 and $n/N$ = 5%. For fixed $\theta > 0+$ and $f \neq 1$, $P_{CD}$ decreases with $\bar{D}$ except for $f$ = 1.01, $\bar{D}$ = 8%, in Table 3.2.a with $N$ = 110,000 and $n/N$ = 5%.
Table 3.2 Probabilities of acceptance from the first sample ($P_{1A}$) and of correct decision ($P_{CD}$) with the correlation coefficient ($\rho$) for $\hat{U}_1$ and $\hat{M}$, bias and standard deviation of the estimator $\hat{M}$, and the bivariate normal integration limits $a$ and $b$ in Equation (9) for single ($\theta = 0+$) and multiple ($\theta > 0$) duplication.

a. N = 110,000 and n/N = 5%

U and D are the rates $\bar{U}$ and $\bar{D}$ in percent; F2+/M (footnote b), Bias = Bias($\hat{M}$)/M, and SD = SD($\hat{M}$)/M are in percent.

f^a     U  D    θ    F2  F3 F4 F5 F6  F2+/M      a    P1A      ρ   Bias    SD      b    PCD
0.99 18.9  1   0+  1100   0  0  0  0   1.25  -7.58  0.000  -0.27   0.00  0.98   1.03  0.849
0.99 18.9  1 0.05  1072  14  0  0  0   1.23  -7.58  0.000  -0.27  -0.02  0.98   1.04  0.852
0.99 18.9  1 0.10  1045  26  1  0  0   1.22  -7.58  0.000  -0.27  -0.03  0.98   1.06  0.855
0.99 18.9  1 0.20   986  49  4  1  0   1.18  -7.58  0.000  -0.26  -0.07  0.99   1.09  0.862

0.99 17.9  2   0+  2200   0  0  0  0   2.50  -6.91  0.000  -0.20   0.00  1.22   0.83  0.797
0.99 17.9  2 0.05  2145  26  1  0  0   2.46  -6.91  0.000  -0.20  -0.03  1.22   0.85  0.803
0.99 17.9  2 0.10  2088  53  2  0  0   2.43  -6.91  0.000  -0.20  -0.06  1.23   0.87  0.809
0.99 17.9  2 0.20  1971  99  9  1  0   2.36  -6.91  0.000  -0.20  -0.14  1.24   0.92  0.822

0.99 15.9  4   0+  4400   0  0  0  0   4.99  -5.51  0.000  -0.14   0.00  1.59   0.63  0.737
0.99 15.9  4 0.05  4289  54  1  0  0   4.93  -5.51  0.000  -0.14  -0.06  1.60   0.67  0.748
0.99 15.9  4 0.10  4177 104  5  0  0   4.86  -5.51  0.000  -0.14  -0.13  1.61   0.71  0.760
0.99 15.9  4 0.20  3944 197 18  2  0   4.72  -5.51  0.000  -0.14  -0.28  1.63   0.79  0.785

0.99 11.9  8   0+  8800   0  0  0  0   9.98  -2.29  0.011  -0.08   0.00  2.15   0.47  0.672
0.99 11.9  8 0.05  8578 108  2  0  0   9.86  -2.29  0.011  -0.08  -0.12  2.16   0.52  0.691
0.99 11.9  8 0.10  8351 209  9  1  0   9.72  -2.29  0.011  -0.08  -0.26  2.18   0.58  0.711
0.99 11.9  8 0.20  7886 394 35  4  1   9.44  -2.29  0.011  -0.08  -0.57  2.21   0.71  0.753

1.00 18.0  1   0+  1100   0  0  0  0   1.24  -7.04  0.000  -0.26   0.00  0.96   0.00  0.500
1.00 18.0  1 0.05  1072  14  0  0  0   1.22  -7.04  0.000  -0.26  -0.01  0.96   0.02  0.494
1.00 18.0  1 0.10  1045  26  1  0  0   1.20  -7.04  0.000  -0.26  -0.03  0.97   0.03  0.487
1.00 18.0  1 0.20   986  49  4  1  0   1.17  -7.04  0.000  -0.26  -0.07  0.98   0.07  0.471

1.00 17.0  2   0+  2200   0  0  0  0   2.47  -6.35  0.000  -0.20   0.00  1.20   0.00  0.500
1.00 17.0  2 0.05  2145  26  1  0  0   2.44  -6.35  0.000  -0.20  -0.03  1.21   0.03  0.490
1.00 17.0  2 0.10  2088  53  2  0  0   2.41  -6.35  0.000  -0.20  -0.06  1.21   0.05  0.479
1.00 17.0  2 0.20  1971  99  9  1  0   2.34  -6.35  0.000  -0.20  -0.14  1.23   0.11  0.455

1.00 15.0  4   0+  4400   0  0  0  0   4.94  -4.91  0.000  -0.14   0.00  1.57   0.00  0.500
1.00 15.0  4 0.05  4289  54  1  0  0   4.88  -4.91  0.000  -0.14  -0.06  1.58   0.04  0.485
1.00 15.0  4 0.10  4177 104  5  0  0   4.81  -4.91  0.000  -0.14  -0.13  1.59   0.08  0.468
1.00 15.0  4 0.20  3944 197 18  2  0   4.67  -4.91  0.000  -0.14  -0.28  1.61   0.17  0.431

1.00 11.0  8   0+  8800   0  0  0  0   9.88  -1.54  0.062  -0.08   0.00  2.13   0.00  0.535
1.00 11.0  8 0.05  8578 108  2  0  0   9.76  -1.54  0.062  -0.08  -0.12  2.14   0.06  0.513
1.00 11.0  8 0.10  8351 209  9  1  0   9.62  -1.54  0.062  -0.08  -0.26  2.16   0.12  0.490
1.00 11.0  8 0.20  7886 394 35  4  1   9.34  -1.54  0.062  -0.08  -0.57  2.19   0.26  0.439

1.01 17.2  1   0+  1100   0  0  0  0   1.22  -6.49  0.000  -0.26   0.00  0.94  -1.05  0.853
1.01 17.2  1 0.05  1072  14  0  0  0   1.21  -6.49  0.000  -0.26  -0.01  0.95  -1.03  0.848
1.01 17.2  1 0.10  1045  26  1  0  0   1.19  -6.49  0.000  -0.26  -0.03  0.95  -1.01  0.843
1.01 17.2  1 0.20   986  49  4  1  0   1.16  -6.49  0.000  -0.26  -0.07  0.96  -0.96  0.830

1.01 16.2  2   0+  2200   0  0  0  0   2.45  -5.78  0.000  -0.20   0.00  1.18  -0.84  0.799
1.01 16.2  2 0.05  2145  26  1  0  0   2.41  -5.78  0.000  -0.20  -0.03  1.19  -0.81  0.790
1.01 16.2  2 0.10  2088  53  2  0  0   2.38  -5.78  0.000  -0.20  -0.06  1.20  -0.78  0.781
1.01 16.2  2 0.20  1971  99  9  1  0   2.31  -5.78  0.000  -0.19  -0.14  1.21  -0.70  0.759

1.01 14.2  4   0+  4400   0  0  0  0   4.89  -4.28  0.000  -0.14   0.00  1.55  -0.64  0.738
1.01 14.2  4 0.05  4289  54  1  0  0   4.83  -4.28  0.000  -0.14  -0.06  1.56  -0.60  0.724
1.01 14.2  4 0.10  4177 104  5  0  0   4.77  -4.28  0.000  -0.13  -0.13  1.57  -0.55  0.709
1.01 14.2  4 0.20  3944 197 18  2  0   4.63  -4.28  0.000  -0.13  -0.28  1.59  -0.45  0.673

1.01 10.2  8   0+  8800   0  0  0  0   9.78  -0.75  0.228  -0.08   0.00  2.10  -0.47  0.762
1.01 10.2  8 0.05  8578 108  2  0  0   9.66  -0.75  0.228  -0.08  -0.12  2.12  -0.41  0.746
1.01 10.2  8 0.10  8351 209  9  1  0   9.53  -0.75  0.228  -0.08  -0.25  2.13  -0.34  0.727
1.01 10.2  8 0.20  7886 394 35  4  1   9.25  -0.75  0.228  -0.08  -0.56  2.17  -0.20  0.684

a M = fR (R = 89,048); the correct decisions are: reject for f < 1 and accept for f ≥ 1.
b F2+ = F2 + F3 + F4 + F5 + F6 is the number of the M electors who contribute one or more duplicates.
Table 3.2 (continued)

b. N = 110,000 and n/N = 10%

U and D are the rates $\bar{U}$ and $\bar{D}$ in percent; F2+/M (footnote b), Bias = Bias($\hat{M}$)/M, and SD = SD($\hat{M}$)/M are in percent.

f^a     U  D    θ    F2  F3 F4 F5 F6  F2+/M      a    P1A      ρ   Bias    SD      b    PCD
0.99 18.9  1   0+  1100   0  0  0  0   1.25  -7.58  0.000  -0.22   0.00  0.57   1.77  0.962
0.99 18.9  1 0.05  1072  14  0  0  0   1.23  -7.58  0.000  -0.22  -0.01  0.57   1.79  0.963
0.99 18.9  1 0.10  1045  26  1  0  0   1.22  -7.58  0.000  -0.22  -0.03  0.57   1.81  0.965
0.99 18.9  1 0.20   986  49  4  1  0   1.18  -7.58  0.000  -0.21  -0.07  0.58   1.87  0.969

0.99 17.9  2   0+  2200   0  0  0  0   2.50  -6.91  0.000  -0.18   0.00  0.67   1.51  0.935
0.99 17.9  2 0.05  2145  26  1  0  0   2.46  -6.91  0.000  -0.18  -0.03  0.67   1.55  0.939
0.99 17.9  2 0.10  2088  53  2  0  0   2.43  -6.91  0.000  -0.18  -0.06  0.67   1.59  0.944
0.99 17.9  2 0.20  1971  99  9  1  0   2.36  -6.91  0.000  -0.17  -0.13  0.68   1.68  0.953

0.99 15.9  4   0+  4400   0  0  0  0   4.99  -5.51  0.000  -0.13   0.00  0.83   1.22  0.888
0.99 15.9  4 0.05  4289  54  1  0  0   4.93  -5.51  0.000  -0.13  -0.06  0.83   1.28  0.900
0.99 15.9  4 0.10  4177 104  5  0  0   4.86  -5.51  0.000  -0.13  -0.12  0.84   1.35  0.911
0.99 15.9  4 0.20  3944 197 18  2  0   4.72  -5.51  0.000  -0.13  -0.27  0.85   1.50  0.933

0.99 11.9  8   0+  8800   0  0  0  0   9.98  -2.29  0.011  -0.08   0.00  1.08   0.93  0.815
0.99 11.9  8 0.05  8578 108  2  0  0   9.86  -2.29  0.011  -0.08  -0.12  1.09   1.03  0.839
0.99 11.9  8 0.10  8351 209  9  1  0   9.72  -2.29  0.011  -0.08  -0.25  1.10   1.14  0.864
0.99 11.9  8 0.20  7886 394 35  4  1   9.44  -2.29  0.011  -0.08  -0.54  1.11   1.39  0.907

1.00 18.0  1   0+  1100   0  0  0  0   1.24  -7.04  0.000  -0.22   0.00  0.56   0.00  0.500
1.00 18.0  1 0.05  1072  14  0  0  0   1.22  -7.04  0.000  -0.21  -0.01  0.56   0.03  0.490
1.00 18.0  1 0.10  1045  26  1  0  0   1.20  -7.04  0.000  -0.21  -0.03  0.56   0.05  0.479
1.00 18.0  1 0.20   986  49  4  1  0   1.17  -7.04  0.000  -0.21  -0.07  0.57   0.12  0.453

1.00 17.0  2   0+  2200   0  0  0  0   2.47  -6.35  0.000  -0.17   0.00  0.66   0.00  0.500
1.00 17.0  2 0.05  2145  26  1  0  0   2.44  -6.35  0.000  -0.17  -0.03  0.66   0.04  0.482
1.00 17.0  2 0.10  2088  53  2  0  0   2.41  -6.35  0.000  -0.17  -0.06  0.66   0.09  0.464
1.00 17.0  2 0.20  1971  99  9  1  0   2.34  -6.35  0.000  -0.17  -0.13  0.67   0.20  0.422

1.00 15.0  4   0+  4400   0  0  0  0   4.94  -4.91  0.000  -0.13   0.00  0.82   0.00  0.500
1.00 15.0  4 0.05  4289  54  1  0  0   4.88  -4.91  0.000  -0.13  -0.06  0.82   0.07  0.472
1.00 15.0  4 0.10  4177 104  5  0  0   4.81  -4.91  0.000  -0.13  -0.12  0.83   0.14  0.442
1.00 15.0  4 0.20  3944 197 18  2  0   4.67  -4.91  0.000  -0.12  -0.26  0.84   0.31  0.377

1.00 11.0  8   0+  8800   0  0  0  0   9.88  -1.54  0.062  -0.08   0.00  1.07   0.00  0.535
1.00 11.0  8 0.05  8578 108  2  0  0   9.76  -1.54  0.062  -0.08  -0.12  1.08   0.11  0.495
1.00 11.0  8 0.10  8351 209  9  1  0   9.62  -1.54  0.062  -0.08  -0.24  1.08   0.22  0.451
1.00 11.0  8 0.20  7886 394 35  4  1   9.34  -1.54  0.062  -0.07  -0.53  1.10   0.48  0.359

1.01 17.2  1   0+  1100   0  0  0  0   1.22  -6.49  0.000  -0.21   0.00  0.55  -1.81  0.965
1.01 17.2  1 0.05  1072  14  0  0  0   1.21  -6.49  0.000  -0.21  -0.01  0.55  -1.78  0.962
1.01 17.2  1 0.10  1045  26  1  0  0   1.19  -6.49  0.000  -0.21  -0.03  0.55  -1.74  0.959
1.01 17.2  1 0.20   986  49  4  1  0   1.16  -6.49  0.000  -0.21  -0.07  0.56  -1.66  0.952

1.01 16.2  2   0+  2200   0  0  0  0   2.45  -5.78  0.000  -0.17   0.00  0.65  -1.53  0.937
1.01 16.2  2 0.05  2145  26  1  0  0   2.41  -5.78  0.000  -0.17  -0.03  0.65  -1.48  0.931
1.01 16.2  2 0.10  2088  53  2  0  0   2.38  -5.78  0.000  -0.17  -0.06  0.65  -1.43  0.923
1.01 16.2  2 0.20  1971  99  9  1  0   2.31  -5.78  0.000  -0.17  -0.13  0.66  -1.30  0.904

1.01 14.2  4   0+  4400   0  0  0  0   4.89  -4.28  0.000  -0.12   0.00  0.81  -1.23  0.890
1.01 14.2  4 0.05  4289  54  1  0  0   4.83  -4.28  0.000  -0.12  -0.06  0.81  -1.15  0.875
1.01 14.2  4 0.10  4177 104  5  0  0   4.77  -4.28  0.000  -0.12  -0.12  0.82  -1.07  0.857
1.01 14.2  4 0.20  3944 197 18  2  0   4.63  -4.28  0.000  -0.12  -0.26  0.83  -0.88  0.811

1.01 10.2  8   0+  8800   0  0  0  0   9.78  -0.75  0.228  -0.07   0.00  1.06  -0.94  0.871
1.01 10.2  8 0.05  8578 108  2  0  0   9.66  -0.75  0.228  -0.07  -0.11  1.06  -0.82  0.848
1.01 10.2  8 0.10  8351 209  9  1  0   9.53  -0.75  0.228  -0.07  -0.24  1.07  -0.70  0.820
1.01 10.2  8 0.20  7886 394 35  4  1   9.25  -0.75  0.228  -0.07  -0.53  1.09  -0.43  0.749

a M = fR (R = 89,048); the correct decisions are: reject for f < 1 and accept for f ≥ 1.
b F2+ = F2 + F3 + F4 + F5 + F6 is the number of the M electors who contribute one or more duplicates.
Table 3.2 (continued)

c. N = 130,000 and n/N = 5%

U and D are the rates $\bar{U}$ and $\bar{D}$ in percent; F2+/M (footnote b), Bias = Bias($\hat{M}$)/M, and SD = SD($\hat{M}$)/M are in percent.

f^a     U  D    θ    F2  F3 F4 F5 F6  F2+/M      a    P1A      ρ   Bias    SD      b    PCD
0.99 31.2  1   0+  1300   0  0  0  0   1.47  -6.72  0.000  -0.27   0.00  1.14   0.88  0.811
0.99 31.2  1 0.05  1268  16  0  0  0   1.46  -6.72  0.000  -0.27  -0.02  1.15   0.89  0.815
0.99 31.2  1 0.10  1235  31  1  0  0   1.44  -6.72  0.000  -0.27  -0.04  1.15   0.91  0.818
0.99 31.2  1 0.20  1165  58  5  1  0   1.39  -6.72  0.000  -0.26  -0.08  1.16   0.94  0.827

0.99 30.2  2   0+  2600   0  0  0  0   2.95  -6.09  0.000  -0.21   0.00  1.39   0.73  0.767
0.99 30.2  2 0.05  2533  32  1  0  0   2.91  -6.09  0.000  -0.21  -0.04  1.39   0.75  0.774
0.99 30.2  2 0.10  2467  62  3  0  0   2.87  -6.09  0.000  -0.21  -0.08  1.40   0.78  0.781
0.99 30.2  2 0.20  2330 118 10  1  0   2.79  -6.09  0.000  -0.21  -0.17  1.41   0.83  0.797

0.99 28.2  4   0+  5200   0  0  0  0   5.90  -4.81  0.000  -0.15   0.00  1.77   0.57  0.715
0.99 28.2  4 0.05  5069  64  1  0  0   5.82  -4.81  0.000  -0.15  -0.07  1.78   0.61  0.728
0.99 28.2  4 0.10  4935 123  5  1  0   5.74  -4.81  0.000  -0.15  -0.15  1.79   0.65  0.742
0.99 28.2  4 0.20  4661 234 21  2  0   5.58  -4.81  0.000  -0.15  -0.33  1.82   0.74  0.770

0.99 24.2  8   0+ 10400   0  0  0  0  11.80  -2.08  0.019  -0.10   0.00  2.37   0.43  0.651
0.99 24.2  8 0.05 10137 127  3  0  0  11.65  -2.08  0.019  -0.10  -0.15  2.38   0.49  0.672
0.99 24.2  8 0.10  9871 246 11  1  0  11.49  -2.08  0.019  -0.10  -0.31  2.40   0.55  0.694
0.99 24.2  8 0.20  9320 466 41  5  1  11.15  -2.08  0.019  -0.09  -0.67  2.43   0.69  0.740

1.00 30.5  1   0+  1300   0  0  0  0   1.46  -6.29  0.000  -0.27   0.00  1.13   0.00  0.500
1.00 30.5  1 0.05  1268  16  0  0  0   1.44  -6.29  0.000  -0.27  -0.02  1.13   0.02  0.494
1.00 30.5  1 0.10  1235  31  1  0  0   1.42  -6.29  0.000  -0.27  -0.04  1.14   0.03  0.487
1.00 30.5  1 0.20  1165  58  5  1  0   1.38  -6.29  0.000  -0.26  -0.08  1.15   0.07  0.471

1.00 29.5  2   0+  2600   0  0  0  0   2.92  -5.66  0.000  -0.21   0.00  1.37   0.00  0.500
1.00 29.5  2 0.05  2533  32  1  0  0   2.88  -5.66  0.000  -0.21  -0.04  1.38   0.03  0.489
1.00 29.5  2 0.10  2467  62  3  0  0   2.84  -5.66  0.000  -0.21  -0.08  1.38   0.05  0.478
1.00 29.5  2 0.20  2330 118 10  1  0   2.76  -5.66  0.000  -0.21  -0.16  1.40   0.12  0.453

1.00 27.5  4   0+  5200   0  0  0  0   5.84  -4.35  0.000  -0.15   0.00  1.76   0.00  0.500
1.00 27.5  4 0.05  5069  64  1  0  0   5.77  -4.35  0.000  -0.15  -0.07  1.76   0.04  0.484
1.00 27.5  4 0.10  4935 123  5  1  0   5.69  -4.35  0.000  -0.15  -0.15  1.78   0.09  0.466
1.00 27.5  4 0.20  4661 234 21  2  0   5.52  -4.35  0.000  -0.15  -0.33  1.80   0.18  0.428

1.00 23.5  8   0+ 10400   0  0  0  0  11.68  -1.59  0.056  -0.10   0.00  2.34   0.00  0.532
1.00 23.5  8 0.05 10137 127  3  0  0  11.53  -1.59  0.056  -0.10  -0.14  2.36   0.06  0.509
1.00 23.5  8 0.10  9871 246 11  1  0  11.37  -1.59  0.056  -0.10  -0.30  2.37   0.13  0.484
1.00 23.5  8 0.20  9320 466 41  5  1  11.04  -1.59  0.056  -0.09  -0.67  2.41   0.28  0.429

1.01 29.8  1   0+  1300   0  0  0  0   1.45  -5.86  0.000  -0.27   0.00  1.11  -0.89  0.813
1.01 29.8  1 0.05  1268  16  0  0  0   1.43  -5.86  0.000  -0.27  -0.02  1.12  -0.87  0.808
1.01 29.8  1 0.10  1235  31  1  0  0   1.41  -5.86  0.000  -0.27  -0.04  1.12  -0.85  0.802
1.01 29.8  1 0.20  1165  58  5  1  0   1.37  -5.86  0.000  -0.26  -0.08  1.13  -0.80  0.788

1.01 28.8  2   0+  2600   0  0  0  0   2.89  -5.22  0.000  -0.21   0.00  1.35  -0.73  0.768
1.01 28.8  2 0.05  2533  32  1  0  0   2.85  -5.22  0.000  -0.21  -0.04  1.36  -0.70  0.758
1.01 28.8  2 0.10  2467  62  3  0  0   2.82  -5.22  0.000  -0.21  -0.07  1.37  -0.67  0.748
1.01 28.8  2 0.20  2330 118 10  1  0   2.73  -5.22  0.000  -0.21  -0.16  1.38  -0.60  0.725

1.01 26.8  4   0+  5200   0  0  0  0   5.78  -3.90  0.000  -0.15   0.00  1.74  -0.57  0.716
1.01 26.8  4 0.05  5069  64  1  0  0   5.71  -3.90  0.000  -0.15  -0.07  1.75  -0.53  0.701
1.01 26.8  4 0.10  4935 123  5  1  0   5.63  -3.90  0.000  -0.15  -0.15  1.76  -0.48  0.683
1.01 26.8  4 0.20  4661 234 21  2  0   5.47  -3.90  0.000  -0.15  -0.32  1.78  -0.37  0.646

1.01 22.8  8   0+ 10400   0  0  0  0  11.56  -1.09  0.138  -0.10   0.00  2.32  -0.43  0.719
1.01 22.8  8 0.05 10137 127  3  0  0  11.42  -1.09  0.138  -0.10  -0.14  2.33  -0.36  0.699
1.01 22.8  8 0.10  9871 246 11  1  0  11.26  -1.09  0.138  -0.09  -0.30  2.35  -0.29  0.677
1.01 22.8  8 0.20  9320 466 41  5  1  10.93  -1.09  0.138  -0.09  -0.66  2.38  -0.14  0.624

a M = fR (R = 89,048); the correct decisions are: reject for f < 1 and accept for f ≥ 1.
b F2+ = F2 + F3 + F4 + F5 + F6 is the number of the M electors who contribute one or more duplicates.
Table 3.2 (continued)

d. N = 130,000 and n/N = 10%

U and D are the rates $\bar{U}$ and $\bar{D}$ in percent; F2+/M (footnote b), Bias = Bias($\hat{M}$)/M, and SD = SD($\hat{M}$)/M are in percent.

f^a     U  D    θ    F2  F3 F4 F5 F6  F2+/M      a    P1A      ρ   Bias    SD      b    PCD
0.99 31.2  1   0+  1300   0  0  0  0   1.47  -6.72  0.000  -0.21   0.00  0.69   1.47  0.930
0.99 31.2  1 0.05  1268  16  0  0  0   1.46  -6.72  0.000  -0.21  -0.02  0.69   1.49  0.932
0.99 31.2  1 0.10  1235  31  1  0  0   1.44  -6.72  0.000  -0.21  -0.03  0.69   1.52  0.935
0.99 31.2  1 0.20  1165  58  5  1  0   1.39  -6.72  0.000  -0.21  -0.08  0.69   1.57  0.942

0.99 30.2  2   0+  2600   0  0  0  0   2.95  -6.09  0.000  -0.18   0.00  0.78   1.29  0.902
0.99 30.2  2 0.05  2533  32  1  0  0   2.91  -6.09  0.000  -0.18  -0.04  0.78   1.33  0.909
0.99 30.2  2 0.10  2467  62  3  0  0   2.87  -6.09  0.000  -0.18  -0.07  0.79   1.38  0.916
0.99 30.2  2 0.20  2330 118 10  1  0   2.79  -6.09  0.000  -0.18  -0.16  0.79   1.47  0.929

0.99 28.2  4   0+  5200   0  0  0  0   5.90  -4.81  0.000  -0.14   0.00  0.94   1.07  0.858
0.99 28.2  4 0.05  5069  64  1  0  0   5.82  -4.81  0.000  -0.14  -0.07  0.95   1.14  0.872
0.99 28.2  4 0.10  4935 123  5  1  0   5.74  -4.81  0.000  -0.14  -0.15  0.95   1.21  0.888
0.99 28.2  4 0.20  4661 234 21  2  0   5.58  -4.81  0.000  -0.13  -0.31  0.96   1.37  0.915

0.99 24.2  8   0+ 10400   0  0  0  0  11.80  -2.08  0.019  -0.09   0.00  1.20   0.84  0.783
0.99 24.2  8 0.05 10137 127  3  0  0  11.65  -2.08  0.019  -0.09  -0.14  1.21   0.95  0.812
0.99 24.2  8 0.10  9871 246 11  1  0  11.49  -2.08  0.019  -0.09  -0.29  1.22   1.07  0.840
0.99 24.2  8 0.20  9320 466 41  5  1  11.15  -2.08  0.019  -0.09  -0.64  1.23   1.33  0.891

1.00 30.5  1   0+  1300   0  0  0  0   1.46  -6.29  0.000  -0.21   0.00  0.68   0.00  0.500
1.00 30.5  1 0.05  1268  16  0  0  0   1.44  -6.29  0.000  -0.21  -0.02  0.68   0.02  0.490
1.00 30.5  1 0.10  1235  31  1  0  0   1.42  -6.29  0.000  -0.21  -0.03  0.68   0.05  0.480
1.00 30.5  1 0.20  1165  58  5  1  0   1.38  -6.29  0.000  -0.21  -0.08  0.68   0.12  0.454

1.00 29.5  2   0+  2600   0  0  0  0   2.92  -5.66  0.000  -0.18   0.00  0.77   0.00  0.500
1.00 29.5  2 0.05  2533  32  1  0  0   2.88  -5.66  0.000  -0.18  -0.04  0.77   0.05  0.482
1.00 29.5  2 0.10  2467  62  3  0  0   2.84  -5.66  0.000  -0.18  -0.07  0.78   0.09  0.463
1.00 29.5  2 0.20  2330 118 10  1  0   2.76  -5.66  0.000  -0.18  -0.15  0.78   0.20  0.422

1.00 27.5  4   0+  5200   0  0  0  0   5.84  -4.35  0.000  -0.14   0.00  0.93   0.00  0.500
1.00 27.5  4 0.05  5069  64  1  0  0   5.77  -4.35  0.000  -0.14  -0.07  0.94   0.07  0.471
1.00 27.5  4 0.10  4935 123  5  1  0   5.69  -4.35  0.000  -0.14  -0.14  0.94   0.15  0.439
1.00 27.5  4 0.20  4661 234 21  2  0   5.52  -4.35  0.000  -0.13  -0.31  0.95   0.33  0.373

1.00 23.5  8   0+ 10400   0  0  0  0  11.68  -1.59  0.056  -0.09   0.00  1.19   0.00  0.532
1.00 23.5  8 0.05 10137 127  3  0  0  11.53  -1.59  0.056  -0.09  -0.14  1.20   0.11  0.489
1.00 23.5  8 0.10  9871 246 11  1  0  11.37  -1.59  0.056  -0.09  -0.29  1.20   0.24  0.443
1.00 23.5  8 0.20  9320 466 41  5  1  11.04  -1.59  0.056  -0.09  -0.63  1.22   0.52  0.345

1.01 29.8  1   0+  1300   0  0  0  0   1.45  -5.86  0.000  -0.21   0.00  0.67  -1.49  0.931
1.01 29.8  1 0.05  1268  16  0  0  0   1.43  -5.86  0.000  -0.21  -0.02  0.67  -1.46  0.928
1.01 29.8  1 0.10  1235  31  1  0  0   1.41  -5.86  0.000  -0.21  -0.03  0.67  -1.43  0.923
1.01 29.8  1 0.20  1165  58  5  1  0   1.37  -5.86  0.000  -0.21  -0.08  0.67  -1.35  0.912

1.01 28.8  2   0+  2600   0  0  0  0   2.89  -5.22  0.000  -0.18   0.00  0.76  -1.30  0.903
1.01 28.8  2 0.05  2533  32  1  0  0   2.85  -5.22  0.000  -0.18  -0.03  0.76  -1.25  0.894
1.01 28.8  2 0.10  2467  62  3  0  0   2.82  -5.22  0.000  -0.18  -0.07  0.77  -1.20  0.885
1.01 28.8  2 0.20  2330 118 10  1  0   2.73  -5.22  0.000  -0.18  -0.15  0.77  -1.08  0.861

1.01 26.8  4   0+  5200   0  0  0  0   5.78  -3.90  0.000  -0.14   0.00  0.92  -1.07  0.859
1.01 26.8  4 0.05  5069  64  1  0  0   5.71  -3.90  0.000  -0.14  -0.07  0.93  -1.00  0.841
1.01 26.8  4 0.10  4935 123  5  1  0   5.63  -3.90  0.000  -0.13  -0.14  0.93  -0.91  0.819
1.01 26.8  4 0.20  4661 234 21  2  0   5.47  -3.90  0.000  -0.13  -0.31  0.94  -0.73  0.766

1.01 22.8  8   0+ 10400   0  0  0  0  11.56  -1.09  0.138  -0.09   0.00  1.18  -0.84  0.833
1.01 22.8  8 0.05 10137 127  3  0  0  11.42  -1.09  0.138  -0.09  -0.14  1.18  -0.72  0.803
1.01 22.8  8 0.10  9871 246 11  1  0  11.26  -1.09  0.138  -0.09  -0.28  1.19  -0.59  0.768
1.01 22.8  8 0.20  9320 466 41  5  1  10.93  -1.09  0.138  -0.09  -0.62  1.21  -0.30  0.679

a M = fR (R = 89,048); the correct decisions are: reject for f < 1 and accept for f ≥ 1.
b F2+ = F2 + F3 + F4 + F5 + F6 is the number of the M electors who contribute one or more duplicates.
3.5 Two Submissions of Signatures
We now consider the special case when a petition is not accepted after sampling from
the first submission, and a second submission of additional signatures is submitted for
verification before the due date. Stratified random sampling with proportional allocation
is used for the second submission. Only single duplication of valid signatures is
considered.
3.5.1 The Decision Rule
The decision rule (Oregon Administrative Rules Chapter 165, 1999) when the petition consists of two submissions can be described as follows. Following the notation for counts in each submission introduced in Section 3.3, we denote by $n_h$, $u_h$, and $d_h$ the numbers of signatures, invalid signatures, and duplicates of valid signatures, respectively, in the sample from the $h$th submission, for $h = F, S$. Let $d_{FS}$ be the number of duplicates of valid signatures between the samples from the first and second submissions. Proportional allocation is used for determining the sample size for the second submission ($n_S$). Then $n_S$ is calculated approximately as the product of the combined sample size ($n_F$) for the first submission and the ratio of the submission sizes,

$$n_S \approx n_F\,\frac{N_S}{N_F}.$$

The estimator for the number of distinct valid signatures in the petition,

$$\hat{M} = \hat{M}_F + \hat{M}_S - \hat{D}_{FS}, \qquad (18)$$

is obtained from separate estimates from each submission ($h = F, S$),

$$\hat{M}_h = N_h - \hat{U}_h - \hat{D}_h, \qquad (19)$$
where
and
and from the estimate of the number of duplicated valid signatures between the first and
second submissions
(20)
The estimate $\hat{M}$ is then used in the decision: accept the petition if $\hat{M} \geq R$ and reject if $\hat{M} < R$.
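The final step of the two-submission rule can be sketched as follows. This is a minimal illustration, not the administrative rule itself: the function names are hypothetical, the component estimates $\hat{M}_F$, $\hat{M}_S$, and $\hat{D}_{FS}$ are taken as given inputs, and rounding $n_S$ to the nearest integer is an assumption.

```python
def second_sample_size(n_F, N_F, N_S):
    """Proportional allocation: n_S is about n_F * N_S / N_F (rounded here)."""
    return round(n_F * N_S / N_F)

def combined_estimate(M_F_hat, M_S_hat, D_FS_hat):
    """Equation (18): distinct valid signatures over both submissions."""
    return M_F_hat + M_S_hat - D_FS_hat

def decide(M_hat, R):
    """Accept the petition when the estimate reaches the required number R."""
    return "accept" if M_hat >= R else "reject"

# Hypothetical inputs, for illustration only.
n_S = second_sample_size(n_F=5000, N_F=100000, N_S=10000)
M_hat = combined_estimate(M_F_hat=80000.0, M_S_hat=9500.0, D_FS_hat=300.0)
decision = decide(M_hat, R=89048)
```

The required number R = 89,048 is the value used throughout the tables; the other numbers are made up for the example.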
3.5.2 Probability of Correct Decision
Extending the development for a single submission, the petition can only be rejected if $\hat{M}_{1L} < R$ (or $\hat{U}_1 > k_1$), $\hat{M}_F < R$, and $\hat{M} < R$. Thus, the probability that the petition will be rejected is

$$P_R = P(\hat{U}_1 > k_1,\ \hat{M}_F < R,\ \hat{M} < R).$$

The probability of correct decision $P_{CD}$ follows Equation (8) with $M = M_F + M_S - D_{FS}$. Using a trivariate normal approximation for the joint sampling distribution of $\hat{U}_1$, $\hat{M}_F$, and $\hat{M}$ results in

$$P_R \approx \int_{a}^{\infty}\int_{-\infty}^{b}\int_{-\infty}^{c} \phi(z_1, z_2, z_3)\, dz_3\, dz_2\, dz_1, \qquad (21)$$

where $\phi(z_1, z_2, z_3)$ is the standardized trivariate normal density with correlations $\rho_{12}$, $\rho_{13}$, and $\rho_{23}$. The correlation $\rho_{12}$ and the standardized integration limits $a$ and $b$ were introduced for the case of only one submission, Equations (10) and (11), without the subscript $F$ that denotes the first submission. The correlations
$$\rho_{13} = \frac{\mathrm{Cov}(\hat{U}_1, \hat{M}_F) - \mathrm{Cov}(\hat{U}_1, \hat{D}_{FS})}{\mathrm{SD}(\hat{U}_1)\,\mathrm{SD}(\hat{M})}, \qquad \rho_{23} = \frac{\mathrm{Var}(\hat{M}_F) - \mathrm{Cov}(\hat{M}_F, \hat{D}_{FS})}{\mathrm{SD}(\hat{M}_F)\,\mathrm{SD}(\hat{M})}, \qquad (22)$$

and the standardized upper integration limit

$$c = \frac{R - E(\hat{M})}{\mathrm{SD}(\hat{M})} \qquad (23)$$

must be determined. As noted in Section 3.4.2, $\hat{M}$ from a single submission has an approximately normal distribution, and hence $\hat{M}_F$ and $\hat{M}_S$ do. Since duplicate signatures are rare events, the random variable $d_{FS}$ can be regarded as having an approximately Poisson distribution, which for large enough mean is approximately normal. Therefore, we expect $\hat{D}_{FS}$, and hence also $\hat{M}$ for two submissions, to have an approximately normal distribution. It is also of interest to calculate the probability of accepting the petition from the combined first and second samples drawn from the first submission,

$$P_{2A} = 1 - P(\hat{U}_1 > k_1,\ \hat{M}_F < R).$$
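A rejection probability of this form, $P(Z_1 > a,\ Z_2 < b,\ Z_3 < c)$ for standardized trivariate normal $(Z_1, Z_2, Z_3)$, can be approximated by simulation when no trivariate normal distribution function is at hand. The sketch below is a Monte Carlo stand-in using only the Python standard library; it is not the numerical integration used in the dissertation, and the function names, number of draws, and seed are assumptions made for illustration.

```python
import math
import random

def cholesky3(r12, r13, r23):
    """Lower-triangular Cholesky factor of a 3x3 correlation matrix.

    The correlations must form a positive definite matrix; otherwise the
    square roots below fail.
    """
    l21 = r12
    l22 = math.sqrt(1.0 - l21 ** 2)
    l31 = r13
    l32 = (r23 - l31 * l21) / l22
    l33 = math.sqrt(1.0 - l31 ** 2 - l32 ** 2)
    return l21, l22, l31, l32, l33

def rejection_probability_mc(a, b, c, r12, r13, r23, draws=200_000, seed=1):
    """Estimate P(Z1 > a, Z2 < b, Z3 < c) by simulating correlated normals."""
    rng = random.Random(seed)
    l21, l22, l31, l32, l33 = cholesky3(r12, r13, r23)
    hits = 0
    for _ in range(draws):
        e1, e2, e3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        z1 = e1
        z2 = l21 * e1 + l22 * e2
        z3 = l31 * e1 + l32 * e2 + l33 * e3
        if z1 > a and z2 < b and z3 < c:
            hits += 1
    return hits / draws
```

Dropping the third condition (taking `c` very large) gives the bivariate probability that appears in the expression for $P_{2A}$. The simulation error shrinks only with the square root of the number of draws, so this is suited to checking magnitudes rather than reproducing three-decimal table entries.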
3.5.3 Numerical Results
We calculate the probabilities of accepting the petition from the first sample ($P_{1A}$), accepting from the combined first and second samples ($P_{2A}$), and making a correct decision ($P_{CD}$) after the second submission is filed. The rejection probabilities, $P_R$, in Equation (21) were computed using the CDFTVN function of the GAUSS (1992) software. We consider only the case when valid signatures are duplicated at most once, $D = F_2$. For allocating $U = N\bar{U}$ invalid signatures and $D = N\bar{D}$ duplicates of valid signatures between the two submissions, we use the expected number under random allocation of signatures to the two submissions. This gives, for $h = F$ or $S$, $U_h = N_h\bar{U}$ and $\bar{D}_h = D_h/N$ with

$$D_h = \frac{N_h(N_h - 1)}{N(N - 1)}\,D,$$

and $\bar{D}_{FS} = D_{FS}/N$ with

$$D_{FS} = \frac{2 N_F N_S}{N(N - 1)}\,D,$$

where $D = D_F + D_S + D_{FS}$.
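The expected allocation of duplicates under random assignment of signatures to the two submissions is easy to tabulate. The sketch below uses a hypothetical function name and returns counts; dividing by $N$ gives rates that can be compared with the $\bar{D}_F$, $\bar{D}_S$, and $\bar{D}_{FS}$ columns of Table 3.3.

```python
def allocate_duplicates(N, N_S, D_bar):
    """Expected duplicate counts (D_F, D_S, D_FS) under random allocation.

    N     : total number of signatures in both submissions
    N_S   : size of the second submission (the first has N_F = N - N_S)
    D_bar : overall duplication rate, so that D = N * D_bar
    """
    N_F = N - N_S
    D = N * D_bar
    pairs = N * (N - 1)                      # ordered pairs of signatures
    D_F = N_F * (N_F - 1) / pairs * D        # both copies in the first submission
    D_S = N_S * (N_S - 1) / pairs * D        # both copies in the second submission
    D_FS = 2 * N_F * N_S / pairs * D         # one copy in each submission
    return D_F, D_S, D_FS

# N = 110,000 with a 1% duplication rate and a second submission of 1,000.
D_F, D_S, D_FS = allocate_duplicates(N=110_000, N_S=1_000, D_bar=0.01)
rates_percent = [100 * x / 110_000 for x in (D_F, D_S, D_FS)]
```

The three counts always add up to $D$, because $N_F(N_F-1) + N_S(N_S-1) + 2N_FN_S = N(N-1)$.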
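It can be shown that the estimator $\hat{M}$ is unbiased for $M$. Expressions for $\mathrm{SD}(\hat{U}_1)$ and $\mathrm{Cov}(\hat{U}_1, \hat{M}_F)$ were introduced in Equations (12) and (17) without the subscript $F$. The other sampling moments are

$$\mathrm{Cov}(\hat{U}_1, \hat{D}_{FS}) = -\frac{N_F(N_F - n_F)}{n_F(N_F - 1)}\,\bar{U}_F D_{FS},$$

$$\mathrm{SD}(\hat{M}) = \left( [\mathrm{SD}(\hat{M}_F)]^2 + [\mathrm{SD}(\hat{M}_S)]^2 + \mathrm{Var}(\hat{D}_{FS}) - 2\,\mathrm{Cov}(\hat{M}_F, \hat{D}_{FS}) - 2\,\mathrm{Cov}(\hat{M}_S, \hat{D}_{FS}) \right)^{1/2},$$

where

$$\mathrm{SD}(\hat{M}_h) = \sqrt{\mathrm{Var}(\hat{U}_h) + \mathrm{Var}(\hat{D}_h) + 2\,\mathrm{Cov}(\hat{U}_h, \hat{D}_h)},$$

$$\mathrm{Var}(\hat{U}_h) = N_h^2\,\frac{N_h - n_h}{N_h - 1}\,\frac{\bar{U}_h(1 - \bar{U}_h)}{n_h},$$

$$\mathrm{Cov}(\hat{U}_h, \hat{D}_h) = -\frac{2 N_h^2 (N_h - n_h)}{n_h (N_h - 2)}\,\bar{U}_h \bar{D}_h,$$

$$\mathrm{Cov}(\hat{M}_h, \hat{D}_{FS}) = N_h \left( \frac{N_h - n_h}{n_h} \right) \left( \frac{\bar{U}_h}{N_h - 1} + \frac{2\bar{D}_h}{N_h - 2} \right) D_{FS}$$

for $h = F, S$,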
and
These expressions are used in Equations (10) and (22) for the correlations $\rho_{12}$, $\rho_{13}$, and $\rho_{23}$, and in Equations (11) and (23) for the standardized lower and upper integration limits $a$, $b$, and $c$. Derivations of $\mathrm{Cov}(\hat{U}_1, \hat{D}_{FS})$, $\mathrm{Var}(\hat{D}_{FS})$, and $\mathrm{Cov}(\hat{M}_h, \hat{D}_{FS})$, for $h = F, S$, can be found in Section 3.8.2 in the Appendix.
Table 3.3 displays the probability of correct decision, $P_{CD}$, for petition sizes $N = N_F + N_S$ = 110,000; 120,000; and 130,000; second submission sizes $N_S$ = 1,000; 5,000; 10,000; and 20,000; number of distinct valid signatures $M = fR$, where
Table 3.3 Probabilities of acceptance from the first sample ($P_{1A}$), acceptance from the combined first and second samples ($P_{2A}$), and correct decision ($P_{CD}$) for two submissions ($h = F, S$) with single duplication ($D = F_2$).

a. N = 110,000^a

D, U, D_F, D_S, and D_FS are the rates $\bar{D}$, $\bar{U}$, $\bar{D}_F$, $\bar{D}_S$, and $\bar{D}_{FS}$ in percent; P1A is for n1 = 1,000; P2A and PCD are given for n/N = 5% and 10%.

                                                           P2A     P2A     PCD     PCD
f^b     D     U     N_S     D_F     D_S    D_FS     P1A      5%     10%      5%     10%
0.99    1  18.9    1000   0.982   0.000   0.018   0.000   0.025   0.000   0.826   0.961
0.99    1  18.9    5000   0.911   0.002   0.087   0.000   0.000   0.000   0.849   0.962
0.99    1  18.9   10000   0.826   0.008   0.165   0.000   0.000   0.000   0.850   0.962
0.99    1  18.9   20000   0.669   0.033   0.298   0.000   0.000   0.000   0.850   0.962

0.99    2  17.9    1000   1.964   0.000   0.036   0.000   0.058   0.002   0.744   0.932
0.99    2  17.9    5000   1.823   0.005   0.173   0.000   0.000   0.000   0.797   0.935
0.99    2  17.9   10000   1.653   0.016   0.331   0.000   0.000   0.000   0.797   0.935
0.99    2  17.9   20000   1.339   0.066   0.595   0.000   0.000   0.000   0.797   0.936

0.99    4  15.9    1000   3.927   0.000   0.073   0.000   0.118   0.012   0.642   0.877
0.99    4  15.9    5000   3.645   0.008   0.347   0.000   0.000   0.000   0.737   0.888
0.99    4  15.9   10000   3.305   0.033   0.662   0.000   0.000   0.000   0.738   0.889
0.99    4  15.9   20000   2.677   0.132   1.191   0.000   0.000   0.000   0.738   0.890

0.99    8  11.9    1000   7.855   0.001   0.144   0.001   0.197   0.046   0.539   0.786
0.99    8  11.9    5000   7.289   0.016   0.695   0.000   0.006   0.000   0.676   0.825
0.99    8  11.9   10000   6.612   0.066   1.322   0.000   0.000   0.000   0.681   0.825
0.99    8  11.9   20000   5.355   0.265   2.380   0.000   0.000   0.000   0.681   0.826

1.00    1  18.0    1000   0.982   0.000   0.018   0.000   0.173   0.053   0.614   0.536
1.00    1  18.0    5000   0.911   0.002   0.087   0.000   0.000   0.000   0.500   0.500
1.00    1  18.0   10000   0.826   0.008   0.165   0.000   0.000   0.000   0.500   0.500
1.00    1  18.0   20000   0.669   0.033   0.298   0.000   0.000   0.000   0.500   0.500

1.00    2  17.0    1000   1.964   0.000   0.036   0.000   0.228   0.087   0.638   0.555
1.00    2  17.0    5000   1.823   0.005   0.173   0.000   0.000   0.000   0.500   0.500
1.00    2  17.0   10000   1.653   0.016   0.331   0.000   0.000   0.000   0.500   0.500
1.00    2  17.0   20000   1.339   0.066   0.595   0.000   0.000   0.000   0.500   0.500

1.00    4  15.0    1000   3.927   0.000   0.073   0.000   0.290   0.144   0.664   0.583
1.00    4  15.0    5000   3.645   0.008   0.347   0.000   0.002   0.000   0.501   0.500
1.00    4  15.0   10000   3.305   0.033   0.662   0.000   0.000   0.000   0.500   0.500
1.00    4  15.0   20000   2.677   0.132   1.191   0.000   0.000   0.000   0.500   0.500

1.00    8  11.0    1000   7.855   0.001   0.144   0.012   0.358   0.230   0.686   0.619
1.00    8  11.0    5000   7.289   0.016   0.695   0.000   0.022   0.000   0.512   0.500
1.00    8  11.0   10000   6.612   0.066   1.322   0.000   0.000   0.000   0.500   0.500
1.00    8  11.0   20000   5.355   0.265   2.380   0.000   0.000   0.000   0.500   0.500

1.01    1  17.2    1000   0.982   0.000   0.018   0.000   0.539   0.566   0.956   0.991
1.01    1  17.2    5000   0.911   0.002   0.087   0.000   0.000   0.000   0.853   0.965
1.01    1  17.2   10000   0.826   0.008   0.165   0.000   0.000   0.000   0.853   0.965
1.01    1  17.2   20000   0.669   0.033   0.298   0.000   0.000   0.000   0.853   0.965

1.01    2  16.2    1000   1.964   0.000   0.036   0.000   0.535   0.563   0.928   0.981
1.01    2  16.2    5000   1.823   0.005   0.173   0.000   0.001   0.000   0.799   0.937
1.01    2  16.2   10000   1.653   0.016   0.331   0.000   0.000   0.000   0.799   0.938
1.01    2  16.2   20000   1.339   0.066   0.595   0.000   0.000   0.000   0.799   0.938

1.01    4  14.2    1000   3.927   0.000   0.073   0.000   0.532   0.562   0.895   0.961
1.01    4  14.2    5000   3.645   0.008   0.347   0.000   0.012   0.000   0.743   0.890
1.01    4  14.2   10000   3.305   0.033   0.662   0.000   0.000   0.000   0.738   0.891
1.01    4  14.2   20000   2.677   0.132   1.191   0.000   0.000   0.000   0.739   0.891

1. 01
1. 01
1. 01
1. 01
8
8
8
8
10.2
10.2
10.2
10.2
1000
5000
10000
20000
7.855
7.289
6.612
5.355
0.001
0.016
0.066
0.265
0.144
0.695
1.322
2.380
0.071
0.000
0.000
0.000
0.569
0.060
0.000
0.000
0.598
0.001
0.000
0.000
0.862
0.704
0.681
0.681
0.931
0.826
0.826
0.827
aN
=
bM =
NF + Ns f R (R = 89,048) and correct decision are: reject for
f < 1 and accept for f
~ 1. 54
Table 3.3 (continued)

b. N = 120,000 (a)

D, U, D_F, D_S, and D_FS are rates in percent. P_1A is for a first sample of n_1 = 1,000; the 5% and 10% columns are sampling fractions.

                                                          P_1A     P_2A     P_2A     P_CD     P_CD
f (b)  D    U       N_S    D_F     D_S     D_FS        n_1=1,000   5%      10%       5%      10%
0.99   1   25.5    1,000   0.983   0.000   0.017         0.000   0.042    0.002    0.788    0.942
0.99   1   25.5    5,000   0.918   0.002   0.080         0.000   0.000    0.000    0.828    0.945
0.99   1   25.5   10,000   0.840   0.007   0.153         0.000   0.000    0.000    0.828    0.945
0.99   1   25.5   20,000   0.694   0.028   0.278         0.000   0.000    0.000    0.828    0.945
0.99   2   24.5    1,000   1.967   0.000   0.033         0.000   0.081    0.006    0.709    0.911
0.99   2   24.5    5,000   1.837   0.003   0.160         0.000   0.000    0.000    0.780    0.917
0.99   2   24.5   10,000   1.681   0.014   0.305         0.000   0.000    0.000    0.780    0.917
0.99   2   24.5   20,000   1.389   0.056   0.555         0.000   0.000    0.000    0.781    0.918
0.99   4   22.5    1,000   3.933   0.000   0.067         0.000   0.142    0.021    0.612    0.852
0.99   4   22.5    5,000   3.673   0.007   0.320         0.000   0.001    0.000    0.724    0.872
0.99   4   22.5   10,000   3.361   0.028   0.612         0.000   0.000    0.000    0.726    0.872
0.99   4   22.5   20,000   2.778   0.111   1.112         0.000   0.000    0.000    0.726    0.873
0.99   8   18.5    1,000   7.867   0.001   0.132         0.004   0.221    0.066    0.516    0.758
0.99   8   18.5    5,000   7.348   0.014   0.638         0.000   0.015    0.000    0.661    0.811
0.99   8   18.5   10,000   6.722   0.056   1.222         0.000   0.000    0.000    0.673    0.812
0.99   8   18.5   20,000   5.556   0.223   2.222         0.000   0.000    0.000    0.673    0.813

1.00   1   24.8    1,000   0.983   0.000   0.017         0.000   0.216    0.093    0.640    0.561
1.00   1   24.8    5,000   0.918   0.002   0.080         0.000   0.000    0.000    0.500    0.500
1.00   1   24.8   10,000   0.840   0.007   0.153         0.000   0.000    0.000    0.500    0.500
1.00   1   24.8   20,000   0.694   0.028   0.278         0.000   0.000    0.000    0.500    0.500
1.00   2   23.8    1,000   1.967   0.000   0.033         0.000   0.264    0.129    0.660    0.579
1.00   2   23.8    5,000   1.837   0.003   0.160         0.000   0.001    0.000    0.500    0.500
1.00   2   23.8   10,000   1.681   0.014   0.305         0.000   0.000    0.000    0.500    0.500
1.00   2   23.8   20,000   1.389   0.056   0.555         0.000   0.000    0.000    0.500    0.500
1.00   4   21.8    1,000   3.933   0.000   0.067         0.000   0.317    0.184    0.680    0.606
1.00   4   21.8    5,000   3.673   0.007   0.320         0.000   0.007    0.000    0.505    0.500
1.00   4   21.8   10,000   3.361   0.028   0.612         0.000   0.000    0.000    0.500    0.500
1.00   4   21.8   20,000   2.778   0.111   1.112         0.000   0.000    0.000    0.500    0.500
1.00   8   17.8    1,000   7.867   0.001   0.132         0.019   0.383    0.270    0.698    0.638
1.00   8   17.8    5,000   7.348   0.014   0.638         0.000   0.041    0.000    0.524    0.500
1.00   8   17.8   10,000   6.722   0.056   1.222         0.000   0.000    0.000    0.500    0.500
1.00   8   17.8   20,000   5.556   0.223   2.222         0.000   0.000    0.000    0.500    0.500

1.01   1   24.1    1,000   0.983   0.000   0.017         0.000   0.565    0.608    0.952    0.987
1.01   1   24.1    5,000   0.918   0.002   0.080         0.000   0.001    0.000    0.830    0.947
1.01   1   24.1   10,000   0.840   0.007   0.153         0.000   0.000    0.000    0.830    0.947
1.01   1   24.1   20,000   0.694   0.028   0.278         0.000   0.000    0.000    0.831    0.947
1.01   2   23.1    1,000   1.967   0.000   0.033         0.000   0.556    0.600    0.927    0.977
1.01   2   23.1    5,000   1.837   0.003   0.160         0.000   0.006    0.000    0.784    0.919
1.01   2   23.1   10,000   1.681   0.014   0.305         0.000   0.000    0.000    0.782    0.919
1.01   2   23.1   20,000   1.389   0.056   0.555         0.000   0.000    0.000    0.782    0.920
1.01   4   21.1    1,000   3.933   0.000   0.067         0.000   0.549    0.592    0.896    0.959
1.01   4   21.1    5,000   3.673   0.007   0.320         0.000   0.031    0.000    0.738    0.873
1.01   4   21.1   10,000   3.361   0.028   0.612         0.000   0.000    0.000    0.726    0.874
1.01   4   21.1   20,000   2.778   0.111   1.112         0.000   0.000    0.000    0.727    0.874
1.01   8   17.1    1,000   7.867   0.001   0.132         0.070   0.581    0.621    0.864    0.931
1.01   8   17.1    5,000   7.348   0.014   0.638         0.000   0.098    0.006    0.711    0.813
1.01   8   17.1   10,000   6.722   0.056   1.222         0.000   0.001    0.000    0.673    0.812
1.01   8   17.1   20,000   5.556   0.223   2.222         0.000   0.000    0.000    0.673    0.813

(a) $N = N_F + N_S$.
(b) $M = fR$ ($R = 89{,}048$); the correct decision is to reject for $f < 1$ and to accept for $f \ge 1$.
Table 3.3 (continued)

c. N = 130,000 (a)

D, U, D_F, D_S, and D_FS are rates in percent. P_1A is for a first sample of n_1 = 1,000; the 5% and 10% columns are sampling fractions.

                                                          P_1A     P_2A     P_2A     P_CD     P_CD
f (b)  D    U       N_S    D_F     D_S     D_FS        n_1=1,000   5%      10%       5%      10%
0.99   1   31.2    1,000   0.985   0.000   0.015         0.000   0.060    0.005    0.755    0.925
0.99   1   31.2    5,000   0.925   0.002   0.074         0.000   0.000    0.000    0.812    0.930
0.99   1   31.2   10,000   0.852   0.006   0.142         0.000   0.000    0.000    0.812    0.930
0.99   1   31.2   20,000   0.716   0.024   0.260         0.000   0.000    0.000    0.812    0.931
0.99   2   30.2    1,000   1.969   0.000   0.031         0.000   0.101    0.012    0.679    0.891
0.99   2   30.2    5,000   1.849   0.003   0.148         0.000   0.000    0.000    0.767    0.902
0.99   2   30.2   10,000   1.704   0.012   0.285         0.000   0.000    0.000    0.767    0.903
0.99   2   30.2   20,000   1.432   0.048   0.520         0.000   0.000    0.000    0.768    0.904
0.99   4   28.2    1,000   3.938   0.000   0.062         0.000   0.163    0.032    0.587    0.828
0.99   4   28.2    5,000   3.698   0.006   0.295         0.000   0.003    0.000    0.713    0.858
0.99   4   28.2   10,000   3.408   0.024   0.568         0.000   0.000    0.000    0.716    0.859
0.99   4   28.2   20,000   2.864   0.095   1.042         0.000   0.000    0.000    0.716    0.860
0.99   8   24.2    1,000   7.878   0.001   0.122         0.007   0.242    0.085    0.497    0.733
0.99   8   24.2    5,000   7.396   0.012   0.592         0.000   0.026    0.000    0.646    0.800
0.99   8   24.2   10,000   6.817   0.048   1.135         0.000   0.000    0.000    0.666    0.801
0.99   8   24.2   20,000   5.728   0.189   2.083         0.000   0.000    0.000    0.666    0.802

1.00   1   30.5    1,000   0.985   0.000   0.015         0.000   0.250    0.130    0.659    0.583
1.00   1   30.5    5,000   0.925   0.002   0.074         0.000   0.000    0.000    0.500    0.500
1.00   1   30.5   10,000   0.852   0.006   0.142         0.000   0.000    0.000    0.500    0.500
1.00   1   30.5   20,000   0.716   0.024   0.260         0.000   0.000    0.000    0.500    0.500
1.00   2   29.5    1,000   1.969   0.000   0.031         0.000   0.292    0.165    0.675    0.600
1.00   2   29.5    5,000   1.849   0.003   0.148         0.000   0.002    0.000    0.502    0.500
1.00   2   29.5   10,000   1.704   0.012   0.285         0.000   0.000    0.000    0.500    0.500
1.00   2   29.5   20,000   1.432   0.048   0.520         0.000   0.000    0.000    0.500    0.500
1.00   4   27.5    1,000   3.938   0.000   0.062         0.000   0.339    0.217    0.692    0.625
1.00   4   27.5    5,000   3.698   0.006   0.295         0.000   0.016    0.000    0.510    0.500
1.00   4   27.5   10,000   3.408   0.024   0.568         0.000   0.000    0.000    0.500    0.500
1.00   4   27.5   20,000   2.864   0.095   1.042         0.000   0.000    0.000    0.500    0.500
1.00   8   23.5    1,000   7.878   0.001   0.122         0.024   0.402    0.301    0.707    0.653
1.00   8   23.5    5,000   7.396   0.012   0.592         0.000   0.065    0.002    0.537    0.501
1.00   8   23.5   10,000   6.817   0.048   1.135         0.000   0.001    0.000    0.500    0.500
1.00   8   23.5   20,000   5.728   0.189   2.083         0.000   0.000    0.000    0.500    0.500

1.01   1   29.8    1,000   0.985   0.000   0.015         0.000   0.583    0.636    0.949    0.985
1.01   1   29.8    5,000   0.925   0.002   0.074         0.000   0.005    0.000    0.815    0.932
1.01   1   29.8   10,000   0.852   0.006   0.142         0.000   0.000    0.000    0.813    0.932
1.01   1   29.8   20,000   0.716   0.024   0.260         0.000   0.000    0.000    0.814    0.932
1.01   2   28.8    1,000   1.969   0.000   0.031         0.000   0.571    0.626    0.926    0.975
1.01   2   28.8    5,000   1.849   0.003   0.148         0.000   0.018    0.000    0.775    0.904
1.01   2   28.8   10,000   1.704   0.012   0.285         0.000   0.000    0.000    0.768    0.904
1.01   2   28.8   20,000   1.432   0.048   0.520         0.000   0.000    0.000    0.769    0.905
1.01   4   26.8    1,000   3.938   0.000   0.062         0.000   0.561    0.614    0.895    0.957
1.01   4   26.8    5,000   3.698   0.006   0.295         0.000   0.057    0.002    0.739    0.860
1.01   4   26.8   10,000   3.408   0.024   0.568         0.000   0.000    0.000    0.716    0.860
1.01   4   26.8   20,000   2.864   0.095   1.042         0.000   0.000    0.000    0.717    0.861
1.01   8   22.8    1,000   7.878   0.001   0.122         0.070   0.590    0.636    0.864    0.930
1.01   8   22.8    5,000   7.396   0.012   0.592         0.001   0.139    0.017    0.720    0.805
1.01   8   22.8   10,000   6.817   0.048   1.135         0.000   0.003    0.000    0.667    0.801
1.01   8   22.8   20,000   5.728   0.189   2.083         0.000   0.000    0.000    0.666    0.802

(a) $N = N_F + N_S$.
(b) $M = fR$ ($R = 89{,}048$); the correct decision is to reject for $f < 1$ and to accept for $f \ge 1$.
$f = 0.99$, $1.00$, and $1.01$; duplication rates $\bar{D} = 0.01$, $0.02$, $0.04$, and $0.08$; and sampling fractions of $0.05$ and $0.10$. Also included are the values of $\bar{U}$, $\bar{D}_F$, $\bar{D}_S$, and $\bar{D}_{FS}$ corresponding to fixed values for $N$, $f$, $\bar{D}$, and $N_S$.
In Table 3.3, for fixed $f$ and $\bar{D}$, the probability of correct decision ($P_{CD}$) is approximately constant for $N_S \ge 5{,}000$ and is more favorable to the petitioner for $N_S = 1{,}000$. For the cases with $f = 1$ ($M = R$), the $P_{CD}$ is approximately equal to 0.50, except for the cases where $N_S = 1{,}000$, and the cases where $\bar{D} = 8\%$, $N_S = 5{,}000$, and the sampling fraction is 5%. In these cases the decision rule is biased in favor of the petitioner, $P_{CD} > 0.50$, and the bias tends to increase with $\bar{D}$. This increase appears to be due to increasing $P_{2A}$. That is, the probability of accepting the petition from the combined first and second samples, $P_{2A}$, tends to increase with $\bar{D}$.
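The derived columns of Table 3.3 can be reproduced from $N$, $N_S$, $f$, and $\bar{D}$ alone. The sketch below does so under two assumptions that are inferred from the tabulated values, not stated in the text: the invalid rate is what remains after removing $M = fR$ and the duplicates, and the $D$ duplicate pairs are allocated to "both in the first submission" and "both in the second submission" in proportion to $(N_F/N)^2$ and $(N_S/N)^2$, rounded to whole electors, with the remainder assigned to $D_{FS}$. The function name is illustrative.

```python
R = 89_048  # required number of signatures, from the footnote to Table 3.3


def derived_rates(N, N_S, f, D_bar):
    """Return (U, D_F, D_S, D_FS) as percentages of the petition size N.

    Assumed allocation rule (inferred from the table, not from the text):
    duplicate pairs fall within a submission in proportion to the squared
    share of that submission, rounded to whole electors.
    """
    N_F = N - N_S
    M = f * R                                # distinct valid signatures
    U_pct = 100 * (1 - M / N - D_bar)        # invalid rate, in percent
    D = round(D_bar * N)                     # number of duplicated electors
    D_F = round(D * (N_F / N) ** 2)          # both signatures in first submission
    D_S = round(D * (N_S / N) ** 2)          # both signatures in second submission
    D_FS = D - D_F - D_S                     # one signature in each submission
    return U_pct, 100 * D_F / N, 100 * D_S / N, 100 * D_FS / N


if __name__ == "__main__":
    u, d_f, d_s, d_fs = derived_rates(110_000, 5_000, 0.99, 0.08)
    print(round(u, 1), round(d_f, 3), round(d_s, 3), round(d_fs, 3))
```

For $N = 110{,}000$, $N_S = 5{,}000$, $f = 0.99$, and $\bar{D} = 8\%$ this gives 11.9, 7.289, 0.016, and 0.695, matching the corresponding row of panel a.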
3.6 Summary and Conclusions
We have investigated the probability of a correct decision for the Oregon decision
rule in a variety of situations. The probability of a correct decision was approximated by a
multivariate normal distribution for the joint sampling distribution of estimators from the
multiple stages. Cases for both single and multiple duplication of valid signatures were
considered for single submissions. Only single duplication of valid signatures was
considered for two submissions.
For a single submission with single duplication of valid signatures, two stages of sampling, with the possibility of acceptance at each stage, will tend to bias the decision rule in favor of the petitioner. This effect is very small except when the petition duplication rate ($D$) is equal to or larger than the assumed upper bound for the duplication rate ($D_{1U}$) for the first sampling stage. Conversely, the negative bias of the estimator $\hat{M}$ under multiple duplication will tend to bias the decision rule against the petitioner.
For two submissions with a fixed total number of single duplicates of valid signatures in a fixed-size petition ($N_F + N_S = N$), a smaller second submission size favors the petitioner (smaller $P_{CD}$ for $M < R$ and larger $P_{CD}$ for $M \ge R$). This effect was small for second submission sizes of 5,000 or larger, since $P_{CD}$ was approximately constant.
3.7 References
GAUSS (1992), The Gauss Version 3.0, Aptech Systems, Inc. Maple Valley, Washington
Houser, J. (1985), Validating Initiative and Referendum Petition Signatures, Research
Monograph, Legislative Research, S420 State Capitol, Salem, Oregon.
Oregon Secretary of State, Elections Division: 1999 Oregon Administrative Rules,
Chapter 165.
Smith-Cayama, R. A. and Thomas, D. R. (1999), Estimating the Number of Distinct
Valid Signatures in Initiative Petitions, Unpublished manuscript. Department of
Statistics, Oregon State University, Corvallis, Oregon.
3.8 Appendix
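3.8.1 Calculation of $\mathrm{Cov}(\hat{U}_1, \hat{M})$
Let $\delta_{j\alpha}$ denote the number of valid signatures in the combined first and second samples, $n = n_1 + n_2$, from the $\alpha$th elector who signed $j$ valid signatures in the petition, for $\alpha = 1, 2, \ldots, F_j$, and $j = 1, \ldots, N$. Note that $\delta_{j\alpha}$ has the hypergeometric $(N, n, j, i)$ distribution with $P_{ij} = P(\delta_{j\alpha} = i)$ given by
$$P_{ij} = \frac{\binom{j}{i}\binom{N-j}{n-i}}{\binom{N}{n}} \quad \text{for } i = 0, 1, \ldots, j.$$
Define for $s = 1, \ldots, U$
$$a_s = \begin{cases} 1 & \text{if the $s$th invalid signature is selected in the first sample of size } n_1 \\ 2 & \text{if the $s$th invalid signature is selected in the second sample of size } n_2 \\ 0 & \text{otherwise.} \end{cases}$$
Note that $P(a_s = 1) = n_1/N$ and $P(a_s = 2) = n_2/N$. The conditional distribution of $\delta_{j\alpha}$, given $a_s = k$ (for $k = 1, 2$), is hypergeometric $(N-1, n-1, j, i)$ with $P_{ij\cdot 1} = P(\delta_{j\alpha} = i \mid a_s = k)$ given by
$$P_{ij\cdot 1} = \frac{\binom{j}{i}\binom{N-1-j}{n-1-i}}{\binom{N-1}{n-1}} \quad \text{for } i = 0, 1, \ldots, j.$$
For the number of electors in the sample with $i$ valid signatures, $f_i$, write
$$f_i = \sum_{j=i}^{N} f_{ij} \quad \text{with} \quad f_{ij} = \sum_{\alpha=1}^{F_j} I(\delta_{j\alpha} = i),$$
where $I(\cdot)$ is the indicator function and $f_{ij}$ denotes the number of electors with $i$ signatures in the combined sample and $j$ signatures in the petition ($i \le j$). Note that $f_{ij}$ is not observable, but $f_i$ is. For the proportions of invalid signatures in the first and the second samples, $\bar{u}_1$ and $\bar{u}_2$, write
$$\bar{u}_k = \frac{1}{n_k}\sum_{s=1}^{U} I(a_s = k) \quad \text{for } k = 1, 2.$$
Then,
$$E(f_i) = \sum_{j=i}^{N} P_{ij} F_j$$
and
$$E(\bar{u}_k) = \bar{U}, \qquad \mathrm{Var}(\bar{u}_k) = \frac{N - n_k}{n_k}\,\frac{\bar{U}(1 - \bar{U})}{N - 1} \quad \text{for } k = 1, 2.$$
The next results are needed to calculate $\mathrm{Cov}(\hat{U}_1, \hat{M})$.
Lemma 1 $\mathrm{Cov}(\bar{u}_1, \bar{u}_2) = -\dfrac{\bar{U}(1 - \bar{U})}{N - 1}$.
Lemma 2 $\mathrm{Cov}(\bar{u}_1, f_i) = \bar{U}\sum_{j=i}^{N} (P_{ij\cdot 1} - P_{ij}) F_j$.
Proof.
$$\begin{aligned}
\mathrm{Cov}(\bar{u}_1, f_i) &= \frac{1}{n_1}\sum_{s=1}^{U}\sum_{j=i}^{N}\sum_{\alpha=1}^{F_j} P(a_s = 1)\,P(\delta_{j\alpha} = i \mid a_s = 1) - \bar{U}\sum_{j=i}^{N} P_{ij} F_j \\
&= \bar{U}\sum_{j=i}^{N} P_{ij\cdot 1} F_j - \bar{U}\sum_{j=i}^{N} P_{ij} F_j \\
&= \bar{U}\sum_{j=i}^{N} (P_{ij\cdot 1} - P_{ij}) F_j.
\end{aligned}$$
From the form of $\hat{U} = N\left(\frac{n_1}{n}\bar{u}_1 + \frac{n_2}{n}\bar{u}_2\right)$ and the general form of $\hat{D} = \frac{N(N-1)}{n(n-1)}\sum_{i=2}^{n} (i - 1) f_i$, the covariance between $\hat{U}_1 = N\bar{u}_1$ and $\hat{M} = N - \hat{U} - \hat{D}$ is calculated as
$$\begin{aligned}
\mathrm{Cov}(\hat{U}_1, \hat{M}) &= -\frac{N^2}{n}\left(n_1 \mathrm{Var}(\bar{u}_1) + n_2 \mathrm{Cov}(\bar{u}_1, \bar{u}_2)\right) - \frac{N^2(N-1)}{n(n-1)}\sum_{i=2}^{n} (i - 1)\,\mathrm{Cov}(\bar{u}_1, f_i) \\
&= -\frac{N^2(N-n)}{n}\,\frac{\bar{U}(1 - \bar{U})}{N - 1} - \frac{N^2(N-1)}{n(n-1)}\,\bar{U}\sum_{i=2}^{n} (i - 1)\sum_{j=i}^{N} (P_{ij\cdot 1} - P_{ij}) F_j.
\end{aligned}$$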
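As a check on the closed form for $\mathrm{Cov}(\hat{U}_1, \hat{M})$, the sketch below compares it with an exact enumeration over every possible combined sample and first sub-sample of a very small petition. It assumes $\bar{U} = U/N$, treats only single duplication ($F_j = 0$ for $j > 2$) in the enumeration, and uses illustrative function names and a hypothetical seven-signature petition.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def hyper(N, n, j, i):
    """P(i of an elector's j signatures fall in a sample of n out of N)."""
    if i < 0 or i > j or i > n or n - i > N - j:
        return Fraction(0)
    return Fraction(comb(j, i) * comb(N - j, n - i), comb(N, n))


def cov_u1_m_formula(N, n, U, F):
    """Closed form; F maps j -> number of electors with j valid signatures."""
    U_bar = Fraction(U, N)
    term1 = -Fraction(N**2 * (N - n), n) * U_bar * (1 - U_bar) / (N - 1)
    inner = sum(
        (i - 1) * (hyper(N - 1, n - 1, j, i) - hyper(N, n, j, i)) * F_j
        for j, F_j in F.items()
        for i in range(2, min(j, n) + 1)
    )
    term2 = -Fraction(N**2 * (N - 1), n * (n - 1)) * U_bar * inner
    return term1 + term2


def cov_u1_m_enumerated(N, n1, n2, U, F2):
    """Exact covariance by listing all samples (single duplication only).

    Signatures 0..U-1 are invalid; the next 2*F2 signatures form F2 pairs
    from the same elector; the remaining signatures are unduplicated.
    """
    n = n1 + n2
    pairs = [(U + 2 * q, U + 2 * q + 1) for q in range(F2)]
    xs, ys = [], []
    for combined in combinations(range(N), n):
        chosen = set(combined)
        u = sum(1 for s in combined if s < U)
        d = sum(1 for a, b in pairs if a in chosen and b in chosen)
        m_hat = N - Fraction(N * u, n) - Fraction(N * (N - 1) * d, n * (n - 1))
        for first in combinations(combined, n1):
            u1 = sum(1 for s in first if s < U)
            xs.append(Fraction(N * u1, n1))   # U_hat_1 = N * u_bar_1
            ys.append(m_hat)
    k = len(xs)
    mean_x, mean_y = sum(xs) / k, sum(ys) / k
    return sum(x * y for x, y in zip(xs, ys)) / k - mean_x * mean_y


if __name__ == "__main__":
    # Hypothetical petition: N = 7, U = 2 invalid, 2 duplicated electors, 1 single.
    print(cov_u1_m_formula(7, 4, 2, {1: 1, 2: 2}))
    print(cov_u1_m_enumerated(7, 2, 2, 2, 2))
```

Exact rational arithmetic is used so that the two results can be compared for equality without a tolerance.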
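3.8.2 Calculation of $\mathrm{Cov}(\hat{U}_1, \hat{D}_{FS})$, $\mathrm{Var}(\hat{D}_{FS})$, and $\mathrm{Cov}(\hat{M}_h, \hat{D}_{FS})$ for $h = F, S$
Consider two independent random samples of $n_F$ and $n_S$ signatures drawn without replacement from the first and second submissions of sizes $N_F$ and $N_S$, respectively. Further, the "first sample" of size $n_1$ from the first submission is a random sub-sample from the combined sample of size $n_F = n_1 + n_2$. Only single duplication of valid signatures is considered. Define
$$a_k = \begin{cases} 1 & \text{if the $k$th invalid signature is selected in the first sample of size } n_1 \\ 0 & \text{otherwise} \end{cases} \quad \text{for } k = 1, 2, \ldots, U_F,$$
$$b_l = \begin{cases} 1 & \text{if both signatures from the $l$th elector who signed in each submission are selected in the samples of size $n_F$ and $n_S$} \\ 0 & \text{otherwise} \end{cases} \quad \text{for } l = 1, 2, \ldots, D_{FS},$$
$$c_{hp} = \begin{cases} 1 & \text{if the $p$th invalid signature is selected in the sample(s) from the $h$th submission} \\ 0 & \text{otherwise} \end{cases} \quad \text{for } p = 1, 2, \ldots, U_h,\ h = F, S,$$
and let $g_{hq}$ denote the number of signatures in the sample taken in the $h$th submission from the $q$th elector with two signatures in the $h$th submission, for $q = 1, 2, \ldots, D_h$, and $h = F, S$.
Note that
$$P(a_k = 1) = \frac{n_1}{N_F} \quad \text{for } k = 1, 2, \ldots, U_F,$$
$$P(b_l = 1) = \frac{n_F n_S}{N_F N_S} \quad \text{for } l = 1, 2, \ldots, D_{FS},$$
$$P(b_l = 1 \mid a_k = 1) = \frac{(n_F - 1)\,n_S}{(N_F - 1)\,N_S} \quad \text{for } l = 1, 2, \ldots, D_{FS} \text{ and } k = 1, 2, \ldots, U_F,$$
$$P(b_l = 1 \mid b_m = 1) = \frac{(n_F - 1)(n_S - 1)}{(N_F - 1)(N_S - 1)} \quad \text{for } m, l = 1, 2, \ldots, D_{FS} \text{ with } m \ne l,$$
and for $h = F, S$
$$P(c_{hp} = 1) = \frac{n_h}{N_h} \quad \text{for } p = 1, 2, \ldots, U_h,$$
$$P(g_{hq} = 2) = \frac{n_h (n_h - 1)}{N_h (N_h - 1)} \quad \text{for } q = 1, 2, \ldots, D_h,$$
$$P(b_m = 1 \mid c_{hp} = 1) = \frac{(n_h - 1)\,N_h}{(N_h - 1)\,n_h}\,\frac{n_F n_S}{N_F N_S} \quad \text{for } m = 1, 2, \ldots, D_{FS},\ p = 1, 2, \ldots, U_h,$$
$$P(b_m = 1 \mid g_{hq} = 2) = \frac{(n_h - 2)\,N_h}{(N_h - 2)\,n_h}\,\frac{n_F n_S}{N_F N_S} \quad \text{for } m = 1, 2, \ldots, D_{FS},\ q = 1, 2, \ldots, D_h.$$
For the proportion of invalid signatures in the first sample, $\bar{u}_1$, and for the number of duplicates of valid signatures between the samples from the first and second submissions, $d_{FS}$, write
$$\bar{u}_1 = \frac{1}{n_1}\sum_{k=1}^{U_F} a_k \quad \text{and} \quad d_{FS} = \sum_{l=1}^{D_{FS}} b_l.$$
Then,
$$\mathrm{Cov}(\bar{u}_1, d_{FS}) = \frac{1}{n_1}\sum_{k=1}^{U_F}\sum_{l=1}^{D_{FS}} P(a_k = 1)\left[P(b_l = 1 \mid a_k = 1) - P(b_l = 1)\right] = -\frac{(N_F - n_F)\,n_S}{N_F^2 (N_F - 1)\,N_S}\,U_F D_{FS}.$$
Thus, the covariance between $\hat{U}_1$ and $\hat{D}_{FS}$ is $\mathrm{Cov}(\bar{u}_1, d_{FS})$ multiplied by the expansion factors of $\hat{U}_1$ and $\hat{D}_{FS}$.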
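Note that
$$\begin{aligned}
\mathrm{Var}(d_{FS}) &= \sum_{l=1}^{D_{FS}} P(b_l = 1) + \sum_{l=1}^{D_{FS}}\sum_{\substack{m=1 \\ m \ne l}}^{D_{FS}} P(b_l = 1)\,P(b_m = 1 \mid b_l = 1) - \left(\frac{n_F n_S}{N_F N_S} D_{FS}\right)^2 \\
&= \frac{n_F n_S}{N_F N_S} D_{FS} + \frac{n_F n_S}{N_F N_S}\,\frac{(n_F - 1)(n_S - 1)}{(N_F - 1)(N_S - 1)}\,D_{FS}(D_{FS} - 1) - \left(\frac{n_F n_S}{N_F N_S} D_{FS}\right)^2.
\end{aligned}$$
Hence the variance of $\hat{D}_{FS}$ is given by
$$\mathrm{Var}(\hat{D}_{FS}) = \left(\frac{N_F N_S}{n_F n_S}\right)^2 \mathrm{Var}(d_{FS}).$$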
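The variance expression above can be checked exactly on a toy petition by listing every pair of samples. The sketch below is such a check. The function names and the tiny population are illustrative assumptions, and elector $l$ is taken to hold signature $l$ in each submission.

```python
from fractions import Fraction
from itertools import combinations


def var_d_fs(N_F, N_S, n_F, n_S, D_FS):
    """Closed-form Var(d_FS) for the sample count of cross-submission duplicates."""
    p = Fraction(n_F * n_S, N_F * N_S)                 # P(b_l = 1)
    p_cond = Fraction((n_F - 1) * (n_S - 1),
                      (N_F - 1) * (N_S - 1))           # P(b_m = 1 | b_l = 1)
    return D_FS * p + D_FS * (D_FS - 1) * p * p_cond - (D_FS * p) ** 2


def var_D_fs_hat(N_F, N_S, n_F, n_S, D_FS):
    """Var(D_hat_FS), with D_hat_FS = (N_F N_S / (n_F n_S)) * d_FS."""
    scale = Fraction(N_F * N_S, n_F * n_S)
    return scale**2 * var_d_fs(N_F, N_S, n_F, n_S, D_FS)


def var_d_fs_enumerated(N_F, N_S, n_F, n_S, D_FS):
    """Exact Var(d_FS) by enumerating both independent samples."""
    values = []
    for s_f in combinations(range(N_F), n_F):
        set_f = set(s_f)
        for s_s in combinations(range(N_S), n_S):
            set_s = set(s_s)
            values.append(sum(1 for l in range(D_FS)
                              if l in set_f and l in set_s))
    k = len(values)
    mean = Fraction(sum(values), k)
    return Fraction(sum(v * v for v in values), k) - mean**2


if __name__ == "__main__":
    print(var_d_fs(5, 4, 3, 2, 2), var_d_fs_enumerated(5, 4, 3, 2, 2))
```

With $N_F = 5$, $N_S = 4$, $n_F = 3$, $n_S = 2$, and $D_{FS} = 2$, both routes give $17/50$.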
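For the proportion of invalid signatures in the sample from the $h$th submission, $\bar{u}_h$, and the number of duplicates of signatures in the sample from the $h$th submission, $d_h$, $h = F, S$, write
$$\bar{u}_h = \frac{1}{n_h}\sum_{p=1}^{U_h} c_{hp} \quad \text{and} \quad d_h = \sum_{q=1}^{D_h} I(g_{hq} = 2),$$
where $I(\cdot)$ is the indicator function. The covariance between $\hat{M}_h$ and $\hat{D}_{FS}$ is calculated as
$$\mathrm{Cov}(\hat{M}_h, \hat{D}_{FS}) = -N_h\,\frac{N_F N_S}{n_F n_S}\,\mathrm{Cov}(\bar{u}_h, d_{FS}) - \frac{N_h (N_h - 1)}{n_h (n_h - 1)}\,\frac{N_F N_S}{n_F n_S}\,\mathrm{Cov}(d_h, d_{FS}).$$
The covariance between $\bar{u}_h$ and $d_{FS}$ is given by
$$\mathrm{Cov}(\bar{u}_h, d_{FS}) = -\frac{N_h - n_h}{n_h N_h (N_h - 1)}\,\frac{n_F n_S}{N_F N_S}\,U_h D_{FS},$$
and the covariance between $d_h$ and $d_{FS}$ by
$$\begin{aligned}
\mathrm{Cov}(d_h, d_{FS}) &= \sum_{q=1}^{D_h}\sum_{m=1}^{D_{FS}} P(g_{hq} = 2)\,P(b_m = 1 \mid g_{hq} = 2) - \left(\frac{n_h (n_h - 1)}{N_h (N_h - 1)} D_h\right)\left(\frac{n_F n_S}{N_F N_S} D_{FS}\right) \\
&= \frac{(n_h - 1)(n_h - 2)}{(N_h - 1)(N_h - 2)}\,\frac{n_F n_S}{N_F N_S}\,D_h D_{FS} - \frac{n_F n_S\,n_h (n_h - 1)}{N_F N_S\,N_h (N_h - 1)}\,D_{FS} D_h \\
&= -\frac{2 (n_h - 1)(N_h - n_h)}{N_h (N_h - 1)(N_h - 2)}\,\frac{n_F n_S}{N_F N_S}\,D_h D_{FS}.
\end{aligned}$$
Therefore, for $h = F, S$, with $\bar{U}_h = U_h/N_h$ and $\bar{D}_h = D_h/N_h$,
$$\mathrm{Cov}(\hat{M}_h, \hat{D}_{FS}) = \frac{N_h (N_h - n_h)}{n_h}\left(\frac{\bar{U}_h}{N_h - 1} + \frac{2 \bar{D}_h}{N_h - 2}\right) D_{FS}.$$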
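The closed form for $\mathrm{Cov}(d_h, d_{FS})$ can be confirmed in the same way, by exact enumeration on a toy petition. The sketch below takes $h = F$. The labelling of signatures, the function names, and the small sizes are assumptions made for illustration only.

```python
from fractions import Fraction
from itertools import combinations


def cov_dh_dfs_formula(N_F, N_S, n_F, n_S, D_F, D_FS):
    """Closed-form Cov(d_h, d_FS) for h = F (the first submission)."""
    within = Fraction(-2 * (n_F - 1) * (N_F - n_F),
                      N_F * (N_F - 1) * (N_F - 2))
    across = Fraction(n_F * n_S, N_F * N_S)
    return within * across * D_F * D_FS


def cov_dh_dfs_enumerated(N_F, N_S, n_F, n_S, D_F, D_FS):
    """Exact covariance by listing every pair of independent samples.

    First-submission signatures 2q and 2q+1 belong to elector q (q < D_F);
    elector m (m < D_FS) holds first-submission signature 2*D_F + m and
    second-submission signature m. Requires 2*D_F + D_FS <= N_F.
    """
    within_pairs = [(2 * q, 2 * q + 1) for q in range(D_F)]
    across_pairs = [(2 * D_F + m, m) for m in range(D_FS)]
    xs, ys = [], []
    for s_f in combinations(range(N_F), n_F):
        set_f = set(s_f)
        d_h = sum(1 for a, b in within_pairs if a in set_f and b in set_f)
        for s_s in combinations(range(N_S), n_S):
            set_s = set(s_s)
            d_fs = sum(1 for a, b in across_pairs if a in set_f and b in set_s)
            xs.append(d_h)
            ys.append(d_fs)
    k = len(xs)
    mean_xy = Fraction(sum(x * y for x, y in zip(xs, ys)), k)
    return mean_xy - Fraction(sum(xs), k) * Fraction(sum(ys), k)


if __name__ == "__main__":
    args = (5, 3, 3, 2, 1, 1)
    print(cov_dh_dfs_formula(*args), cov_dh_dfs_enumerated(*args))
```

Both functions return $-4/75$ for this example. The negative sign reflects that selecting both signatures of one elector leaves fewer sample slots for any other elector's signature.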
Chapter 4
Summary
Initiative petitions that are filed with Election Offices to propose additions or
modifications to statutes or the state constitution can contain duplicates of valid
signatures. Statistical sampling is used by several states either to reduce verification costs
or by necessity because of the constraint on the time permitted for verification.
It is then of interest to estimate the number of distinct valid signatures in an initiative
petition.
In Chapter 2, we addressed the problem of estimating the number of distinct valid
signatures based on the verification of a random sample of signatures. Several linear
estimators and a non-linear estimator for the number of distinct valid signatures were
compared with regard to their bias and RMSE for four completely verified Washington
State petitions. Comparison results showed linear estimators based on the expansion
factor for single duplication performed much better than the unbiased estimator and the
non-linear estimator based on the jackknife technique for all situations considered. For sampling fractions of 10% or less, the probability of observing multiple duplicates of valid signatures is very small. As a result, none of the other estimators considered showed more than a small improvement in RMSE over the Goodman-type estimator $\hat{M}_2$. To reduce the bias of the linear estimators under multiple duplication, we considered a bias adjustment factor that requires prior information about duplicate replication, such as data from similar completely verified petitions. The improvement in RMSE of the bias-adjusted estimators for the four Washington State petitions was negligible when compared to their non-adjusted counterparts.
In Chapter 3, we addressed the problem of evaluating the decision rule adopted by Oregon for its state initiative and referendum petitions by determining an approximation for the probability of correct decision ($P_{CD}$). The estimator $\hat{M}$ used in the Oregon decision rule is the estimator denoted as $\hat{M}_d$ in Chapter 2. This estimator was adopted because the duplication component, $\hat{D}$, simply depends on the number of duplicates of valid signatures in the sample ($d$), and because it performed similarly to the Goodman-type estimator $\hat{M}_2$. For a single submission of signatures we calculated the probability of correct decision for both single and multiple duplication of valid signatures for several sampling fractions and three petition sizes. The $P_{CD}$ was approximated using a multivariate normal distribution model for the joint sampling distribution of the estimators involved in the multiple stages. For two submissions, only single duplication of valid signatures was evaluated.
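As a small sketch of how such an estimate is computed from a verified sample, the function below uses the forms given in Section 3.8.1, $\hat{M} = N - \hat{U} - \hat{D}$ with $\hat{U} = N\bar{u}$ and $\hat{D} = \frac{N(N-1)}{n(n-1)}\sum_{i \ge 2}(i-1) f_i$. Treating these as the definition of $\hat{M}_d$ is an assumption here, and the function name, argument names, and sample counts are illustrative.

```python
def estimate_distinct_valid(N, n, invalid_in_sample, f):
    """Estimate the number of distinct valid signatures in a petition.

    N: signatures submitted; n: signatures sampled;
    invalid_in_sample: invalid signatures found in the sample;
    f: dict mapping i -> number of electors with i valid signatures
       in the sample (i = 1 means no duplicate was observed).
    """
    U_hat = N * invalid_in_sample / n                  # expanded invalid count
    d = sum((i - 1) * count for i, count in f.items() if i >= 2)
    D_hat = N * (N - 1) / (n * (n - 1)) * d            # expanded duplicate count
    return N - U_hat - D_hat


if __name__ == "__main__":
    # Hypothetical sample: 100 of 1,000 signatures, 10 invalid, one duplicate pair.
    print(estimate_distinct_valid(1000, 100, 10, {1: 88, 2: 1}))
```

The duplicate expansion factor $N(N-1)/(n(n-1))$ is roughly the square of the factor $N/n$ applied to invalid signatures, so a single duplicate pair found in a small sample has a large effect on the estimate.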
In the case of a single submission with single duplication of valid signatures, the use of two sampling stages in the decision rule only favors the petitioner when the true duplication rate reaches or exceeds the assumed upper limit for the duplication rate used in the first sampling stage. For multiple duplication the decision rule becomes unfavorable to the petitioner. In the case of two submissions with single duplication, a small second submission size tends to favor the petitioner, and for large sizes ($N_S \ge 5{,}000$) the $P_{CD}$ is approximately constant.
The results of this dissertation were helpful to the Oregon Elections Division in their
decision to change the certification rule to include the estimation of the number of
duplicates of valid signatures as described in Sections 3.4.1 and 3.5.1. Previously, a
duplication rate of 2% was assumed for all petitions.
Bibliography
Bunge, J. and Fitzpatrick, M. (1993), Estimating the Number of Species: A Review,
Journal of the American Statistical Association, 88, 364-373.
Haas, P. J. and Stokes, L. (1998), Estimating the Number of Classes in a Finite
Population, Journal of the American Statistical Association, 93, 1475-1487.
Houser, J. (1985), Validating Initiative and Referendum Petition Signatures, Research
Monograph, Legislative Research, S420 State Capitol, Salem, Oregon.
GAUSS (1992), The Gauss Version 3.0, Aptech Systems, Inc. Maple Valley, Washington.
Goodman, L. A. (1949), On the Estimation of the Number of Classes in a Population,
Annals of Mathematical Statistics, 20, 572-579.
Oregon Secretary of State, Elections Division: 1999 Oregon Administrative Rules,
Chapter 165.