Performance Optimization of a Split-Value Voting System Charles Z. Liu

Performance Optimization of a Split-Value Voting System
by
Charles Z. Liu
Submitted to the Department of Electrical Engineering and Computer Science
in partial fulfillment of the requirements for the degree of
Master of Engineering in Electrical Engineering and Computer Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2015
c Massachusetts Institute of Technology 2015. All rights reserved.
○
Author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Department of Electrical Engineering and Computer Science
May 20, 2015
Certified by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Prof. Ronald L. Rivest
Thesis Supervisor
Accepted by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Prof. Albert R. Meyer
Chairman, Masters of Engineering Thesis Committee
2
Performance Optimization of a Split-Value Voting System
by
Charles Z. Liu
Submitted to the Department of Electrical Engineering and Computer Science
on May 20, 2015, in partial fulfillment of the
requirements for the degree of
Master of Engineering in Electrical Engineering and Computer Science
Abstract
As digital technology becomes more powerful and commonplace, the benefits of using computers to conduct elections become more apparent. In today’s elections dominated by paper
ballots, we cannot be certain that the election result is correct. Many parts of the world are
plagued by corrupt election officials and rampant bribery.
In this thesis, we review the cryptographic techniques available for designing a secure
election system and introduce a system designed around a verifiable mixnet using split-value
commitments. The main work done in this thesis is a series of performance optimizations to
the existing prototype, greatly improving the real-world viability of the system. Finally, we
suggest further work that can be done to improve performance, fault tolerance, and security.
The code that accompanies this thesis may be found at https://github.com/ron-rivest/
split-value-voting.
Thesis Supervisor: Prof. Ronald L. Rivest
3
4
Acknowledgments
Professor Ron Rivest introduced me to the fields of computer security and cryptography
and sparked my interest when I took 6.857 Computer and Network Security in the spring of
2013. His knowledge, encouragement, and patience helped guide and encourage me through
every step of this thesis. I’m also extremely grateful that, through working with Professor
Rivest, I’ve been able to delve into the field of electronic voting, a field that I barely knew
anything about a year ago.
Professor Rivest and Professor Michael Rabin designed the split-value voting system
that this thesis is based on. Without the numerous meetings that we had where both
of them spent time meticulously and patiently explaining every detail of the system, this
thesis would not have been possible. Marco Pedroso completed an implementation of the
distributed version of this system in the fall of 2014, and this thesis builds upon his work.
In my time at MIT, I have been fortunate to have taken some great classes taught by
some great professors who have inspired me to pursue a better understanding of various
aspects of computer science. I’ve also had the honor to teach and TA incredible students,
who made me learn things in ways I never would have otherwise.
To my friends, at MIT, in Boston, and from back home, thanks for being with me every
step of the way and pushing me to achieve things I didn’t know were possible. Years later,
having forgotten all of the psets and readings, I’ll still remember all the experiences and
memories.
And last, but certainly not least, to my family, for giving me all the resources that I’ve
ever needed to get the education that I have today. Whether it’s the math drills fifteen
years ago or last week’s call to make sure I wasn’t staying up too late writing this thesis
(oops...), I wouldn’t be at MIT without you all. Thanks for your infinite support, wisdom,
advice, and encouragement.
5
6
Contents
1 Introduction
13
1.1
A Brief History of Voting Systems . . . . . . . . . . . . . . . . . . . . . . . . 14
1.2
The Rise of Electronic Voting . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.3
Desirable Properties in Electronic Voting Systems . . . . . . . . . . . . . . . . 16
1.4
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2 Electronic Voting Systems and Technologies
2.1
2.2
2.3
2.4
19
Design of End-to-End Voting Systems . . . . . . . . . . . . . . . . . . . . . . 19
2.1.1
A Secure Bulletin Board . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.1.2
Cast-as-Intended Behavior . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.1.3
Recorded-as-Cast Behavior . . . . . . . . . . . . . . . . . . . . . . . . 21
2.1.4
Tallied-as-Recorded Behavior . . . . . . . . . . . . . . . . . . . . . . . 22
2.1.5
Threshold Encryption . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Homomorphic Encryption Schemes . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2.1
Exponential El Gamal . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2.2
Existing Systems Using Homomorphic Encryption . . . . . . . . . . . 24
Mixnets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3.1
Decryption and Reencryption Mixnets . . . . . . . . . . . . . . . . . . 25
2.3.2
Verifiable Mixnets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3.3
Early Mixnets: Individual Verifiability . . . . . . . . . . . . . . . . . . 26
2.3.4
Sako and Kilian: Universal Verifiability . . . . . . . . . . . . . . . . . 27
2.3.5
Jakobsson, Juels, and Rivest: Randomized Partial Checking . . . . . . 28
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
7
3 End-to-End Verifiable Voting Using Split-Value Representations
3.1
3.2
3.3
The Split-Value Voting System Design . . . . . . . . . . . . . . . . . . . . . . 31
3.1.1
Definitions
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.1.2
The Verifiable Mixnet . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Implementation Details
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.2.1
Server Processes and Communication . . . . . . . . . . . . . . . . . . . 34
3.2.2
The Implementation of Randomness . . . . . . . . . . . . . . . . . . . 35
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4 Performance Optimization of the Split-Value System
4.1
31
37
Performance of Cryptographic Primitives . . . . . . . . . . . . . . . . . . . . . 38
4.1.1
Hashing and Encryption Functions . . . . . . . . . . . . . . . . . . . . 38
4.1.2
Pseudorandom Number Generators . . . . . . . . . . . . . . . . . . . . 39
4.2
Performance of JSON Serializers . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.3
Optimization of the Prototype . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.3.1
The Python Profiler . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.3.2
Message Passing Over Sockets . . . . . . . . . . . . . . . . . . . . . . . 42
4.3.3
Byte/Integer Utility Functions . . . . . . . . . . . . . . . . . . . . . . 43
4.3.4
Introducing Parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.3.5
JSON Serialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.3.6
Hashing the Secure Bulletin Board . . . . . . . . . . . . . . . . . . . . 46
4.4
Performance Scalability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.5
Comparison with Other Systems . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.6
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5 Future Work
51
5.1
Performance Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.2
Fault Tolerance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.3
Secure Channels and Key Exchange . . . . . . . . . . . . . . . . . . . . . . . . 53
6 Conclusion
55
8
List of Figures
3-1 The 3 × 3 grid of mixservers and the communication among them . . . . . . . 33
4-1 CProfile output for a single call to a mixserver handler . . . . . . . . . . . . . 42
9
10
List of Tables
4.1
Performance measurements of cryptographic primitives . . . . . . . . . . . . . 38
4.2
Performance measurements of pseudorandom number generators
4.3
Performance measurements of JSON serializers . . . . . . . . . . . . . . . . . 40
4.4
Election runtime at each stage of optimization . . . . . . . . . . . . . . . . . . 41
4.5
Election runtime for varying numbers of voters . . . . . . . . . . . . . . . . . 47
4.6
Election runtime for varying numbers of repetitions . . . . . . . . . . . . . . . 48
11
. . . . . . . 39
12
Chapter 1
Introduction
As a cornerstone of democratic governments, elections are the subject of intense scrutiny.
Voters expect that the systems and procedures put in place are easy to follow, efficient, and
inexpensive. However, history tells us that in practice, this is not always be the case.
The 2000 Presidential Election in the United States is perhaps the most well-known
failure of a voting system in the current generation. The series of recounts in Florida delayed
the nationwide election result by several weeks and opened cases in the Florida Supreme
Court and the United States Supreme Court. Even in the aftermath of the election, studies
were inconclusive as to whether county-specific recounts or a statewide recount would have
swung the results in Al Gore’s favor. The problem was exacerbated by the media’s premature
declaration of the election outcome, George W. Bush’s narrow margin of victory (around 500
votes), and controversy surrounding the “butterfly ballots” that were used in Palm Beach
for the first time [40].
Studies conducted after the 2000 Election in Florida reveal many issues that appear
to be quite subtle in the larger picture of a Presidential election: the public’s familiarity
with a new ballot layout and instructions, the timing of media releases and exit polls, and
even the degree to which paper ballots are cleanly marked or punctured by the voter. In
the aftermath of the chaos, Congress passed the Help America Vote Act, which helped
states upgrade their election infrastructure to prevent similar issues four years later. Many
states turned to electronic voting systems. Unfortunately, these systems have been criticized
for their issues surrounding accessibility, relability, and security [36]. Furthermore, many
of these proprietary systems have reached or are approaching the ends of their lifetimes,
13
prompting another wave of research and development on secure, verifiable, efficient, and
reliable electronic voting systems.
Elections are not only thwarted by the failures of the mechanisms used in them. Reports
of manipulated or lost ballots, coerced voters, and corrupt election officials are no surprise
in certain parts of the world even today. In decades past, voters in Italian elections often
had the option of selecting ranked preferences among a long list of candidates. The clever
“Italian attack” [24] reportedly used by the Mafia involved coercing a voter to vote for
the coercer’s first choice candidate, followed by a sequence of low-probability candidates in
some predetermined order. With a long list of candidates, the chance that another ballot
contains the exact same sequence in the same slots is very low; thus, if the votes are publicly
broadcast following the election, the sequence becomes a signature for the coerced voter that
the coercer can check, even if all other identifiable information is stripped.
Clearly, there is much more to be done in improving election systems. In this chapter,
we briefly survey the history of voting systems, including electronic voting systems, and
conclude with a discussion of desirable properties in electronic voting systems.
1.1
A Brief History of Voting Systems
The first recorded elections occurred in the sixth century BC in ancient Greece, where adult
males were charged with the responsibility to vote at meetings of the assembly. In the
beginning, voting was typically done by raising hands in public spaces. This developed into
the use of colored stones, where voters would select one of two colored stones and place it
into a central jar. The first “secret ballots” were used in ancient Greece and Rome, where
records exist of contraptions that allowed a voter to hide the stone as it was deposited into
the jar. By the end of the nineteenth century, most Western countries, including the United
States, Australia, and France, demanded the use of a secret ballot at elections [39], and
voter privacy was a non-negotiable property of official elections.
As populations grew and election officials looked for ways to make elections more efficient, mechanized voting machines were developed. Lever machines were introduced in
New York during the 1892 Election. Punch card systems were introduced in 1965 and were
also widely used until they drew sharp criticism during the 2000 Election in Florida. Lever
machines, punch cards, and other mechanical and electrical systems such as optical scan14
ners provide different tradeoffs in cost, convenience, efficiency, usability, accessibility, and
security. However, in the past decades, growth in both the power and popularity of digital
technology has enabled a shift towards electronic voting systems.
1.2
The Rise of Electronic Voting
In the aftermath of the 2000 Election, the Help America Vote Act instituted a series of
requirements on the states which aimed to replace punch card and lever machines, establish
standards on voter privacy and accessibility for voters with disabilities, and improve the usability of voting systems. The number of direct-recording electronic (DRE) voting machines
grew in response. DREs have a number of advantages over their mechanical counterparts.
∙ They enable rapid, accurate tallying of votes.
∙ They may include numerous accessibility functions, notably those for the visually
impaired.
∙ They may include many languages.
∙ They may catch errors such as undervoting and overvoting before the vote is cast,
preventing invalid ballots from being discarded.
However, DREs are often complex systems containing proprietary hardware and software,
making it difficult to verify their correctness and security properties; indeed, studies [36, 11,
17] have shown the existence of numerous security vulnerabilities in many commercially
available electronic voting systems. Because of the lack of a paper trail, any errors in the
system are unrecoverable.
The voter-verified paper audit trail (VVPAT) system was proposed by Mercuri [23] in
1992 as a solution to the lack of verifiability in DREs. In the simplest VVPAT system, a DRE
machine is augmented with a printer. In addition to submitting the vote electronically, the
machine prints out a human-readable copy of the voter’s selections. The voter then confirms
that the paper record is indeed correct before depositing it into a ballot box. In case of a
disputed election, recount, or malfunctioning machines, the paper records may be consulted.
VVPAT systems may be extended so that the voters may keep a printed receipt; however,
care must be taken to ensure that the printed receipt does not reveal the voter’s vote to
anyone else.
15
Today, VVPAT systems are the most common type of electronic voting system; 27 states
in the United States require a paper audit trail by law and an additional 18 states use them
in their elections.
1.3
Desirable Properties in Electronic Voting Systems
The two main propeties desired in an election are privacy and verifiability.
Privacy concerns the ability of an adversary to gain information on a voter’s choices in
an election. A lack of privacy results in the possibility of coercion or bribery as seen in the
earlier example of the “Italian attack”. Note that the adversary is not limited to an outside
third party; privacy is broken if a corrupt election official may associate any vote with the
corresponding voter. As voting systems have become more complex and electronic voting
systems have become more popular, the notion of privacy has evolved.
The simplest form of privacy is ballot privacy, which stipulates that the contents of the
ballot must remain secret. With typical paper ballots such as those used today in the United
States, ballot privacy guarantees that there can be no association made between a voter and
her ballot, thereby providing privacy to the voter. However, with electronic voting systems,
a stronger definition of privacy is desired. Receipt freeness or incoercibility, first introduced
by Benaloh and Tuinstra [5], concerns the inability of the voter to later prove how she voted.
Equivalently, it is impossible to effectively coerce a voter by forcing her to reveal her vote
following the election. The concept of receipt freeness grew in importance as electronic
voting systems often leave a receipt that the voter may bring out of the polling site.
In countries such as Switzerland and Estonia, Internet voting has become commonplace
[37]. In Estonia, for example, voters may scan their government issued ID cards to authenticate themselves on a voting website. Many more states and countries have vote-by-mail,
where ballots may be mailed ahead of the election date. A stronger notion of coercion resistance stemming from the work of Juels, Catalano, and Jakobsson [20] identified the need to
prevent adversaries from jeopardizing voter privacy in these cases, for example, by stealing
or forcing a voter to reveal her authentication credentials.
Verifiability concerns whether individual voters or the public as a whole may verify
that the election outcome faithfully represents voters’ selections. In particular, individual
verifiability refers to the ability of a voter to verify that her vote was correctly recorded in
16
the election. Universal verifiability refers to the ability of any member of the public to verify
that the declared election outcome matches the set of votes that were cast.
Although today’s paper ballots provide good privacy, they do so at immense cost to verifiability. Despite thorough documentation of best practices concerning how ballots should
be handled, who should touch them, how they should be sealed, etc. [26], it remains practically impossible for a voter to know for certain that her ballot was not lost in transit.
Since ballots are not released to the public, voters must trust that the outcome declared by
election officials represents the accurate tallying of all ballots. As seen in the 2000 Presidential Election, good verifiability is not a strength in the systems used in the majority of
the United States.
For electronic voting systems, however, we may achieve verifiability while preserving
privacy by using cryptography where appropriate. For example, a vote may be encrypted
before being cast; the encrypted votes may then be added homomorphically to yield an
encryption of the final election outcome, enabling the election officials to compute the result
of an election without revealing any individual votes. Threshold encryption may strengthen
this. For example, by requiring that 3 out of 5 election officials be present to obtain the
decryption key, we reduce the chance of a malicious election official revealing individual
votes.
By introducing a secure, publicly viewable bulletin board, we may achieve verifiability
at every stage from when the vote is cast until the election result is revealed. Such a system
is end-to-end verifiable.
Definition 1.3.1. An end-to-end verifiable voting system is one that satisfies the following
properties.
∙ Cast-as-intended. It may be verified that the encryption or other representation of the
vote is indeed a representation of the proper vote. Methods to ensure cast-as-intended
behavior include Chaum’s work [8] or the challenge/response method proposed by Neff
[25] and Benaloh [4].
∙ Recorded-as-cast. It may be verified by the voter that her vote is properly taken into
account. For example, voters may be given identifiers, and a public bulletin board
may contain a listing of voter identifiers and corresponding encrypted votes. We must
ensure that this step does not reveal the plaintext vote to an adversary.
17
∙ Tallied-as-recorded. It may be verified by anyone that the final election result is correct
given the individual votes. For example, a list of encrypted votes could be homomorphically added, and then the election officials could provide a zero-knowledge proof
that the election result corresponds to a decryption of the tallied votes without revealing the decryption key.
1.4
Conclusion
In this chapter, we presented an overview of voting systems: their history, implementations,
and downfalls. We discussed the various notions of privacy and verifiability and introduced
the concept of an end-to-end verifiable voting system. In the remainder of this thesis, we
survey the relevant technologies used in end-to-end verifiable voting systems in Chapter 2,
introduce the split-value system and its implementation in Chapter 3, present performance
optimizations to the system in Chapter 4, discuss opportunities for future work in Chapter
5, and conclude in Chapter 6.
18
Chapter 2
Electronic Voting Systems and
Technologies
We now turn to an overview of existing electronic voting systems and technologies, focusing
on systems that provide end-to-end verifiability. First, we discuss common design patterns
found in end-to-end verifiable voting systems. Then, we present an overview of homomorphic
encryption based designs and mixnets, two methods for achieving privacy with end-to-end
verifiability.
2.1
Design of End-to-End Voting Systems
Recall from section 1.3 that an end-to-end verifiable voting system is defined by three characteristics: verifiably cast-as-intended, verifiably recorded-as-cast, and verifiably tallied-asrecorded. The first two conditions must be verified by the voter herself; our desire for
incoercibility requires that no third-party may verify these properties on behalf of the voter.
The last condition may be verified by anyone in the public, including someone who did not
participate in the election. By verifying all three conditions, any voter may verify that her
vote was counted properly and that the declared election outcome is honest.
A typical election consists of two phases.
1. The voting phase. In the voting phase, a voter indicates her choices for a given election
by casting a ballot. The voter’s ballot is typically encrypted, and the system must
provide a proof to the voter that the encryption of the ballot faithfully represents her
19
choices. For the purposes of this discussion, we will assume that the electronic ballot
is accompanied by a paper trail and the voter is given a receipt that she may take out
of the polling site. We will also assume that privacy is guaranteed inside the voting
booth; even with the cooperation of the voter, no one else may observe the voter’s
actions inside the voting booth.
2. The tallying phase. In the tallying phase, the election officials tally the set of encrypted ballots to produce the election result. We require that they do so without
compromising the privacy of any individual voter; otherwise, a corrupt election official
can act as a coercer. We also require that the election officials provide a proof that
the announced election outcome is honest.
In this section, we examine designs that are used to ensure each of these conditions of
an end-to-end verifiable voting system is met.
2.1.1
A Secure Bulletin Board
Central to the design of end-to-end verifiable voting systems is a publicly viewable secure
bulletin board. There may be additional restrictions imposed on the actions that may be
taken on it; for example, the bulletin board may be append-only.
In general, encrypted votes may be posted to the secure bulletin board along with identifiable information about a voter as long as the ciphertext provides no information about
the underlying plaintext to anyone, including the voter herself. This means, for example,
that the voter cannot know the randomization values used in the encryption; otherwise, a
voter could prove her vote to a coercer.
The secure bulletin board also contains a proof of the election outcome, thereby allowing
anyone to verify it. This can be accomplished by listing the plaintext votes after removing
the associations between the plaintexts and voter IDs or by posting a zero-knowledge proof
of correctness for the decryption of the tallied votes.
2.1.2
Cast-as-Intended Behavior
To ensure cast-as-intended behavior, we use the “cast or challenge” protocol as described
in [25, 4]. After a voter has made her selections, the system encrypts them using some
20
randomness and prints a receipt containing a commitment to the ciphertext (e.g. a cryptographic hash of the ciphertext). The system may additionally print a paper copy of the
ballot containing more information which the voter must deposit into the ballot box before
leaving. The voter then has the option to either cast the ballot or challenge the system.
∙ The voter chooses to cast her ballot. She deposits the printout into the ballot box.
∙ The voter chooses to challenge the system. In this case, the system reveals the ciphertext and the randomization values used to produce it. The voter can verify that
encrypting her vote with the given randomization produces the desired ciphertext and
that the commitment to the ciphertext (e.g. hash value) on the printout is correct.
Should the voter choose this option, the ballot is spoiled. She must ask the system for
a new randomized encryption of her selection, with which she has the option to cast
or challenge again.
The system must output a commitment to some ciphertext before it knows whether the
voter will choose to cast or challenge, and a skeptical voter may increase the probability of
catching a malicious system by challenging many times. Due to our privacy assumptions
about the polling site, no one else may observe the voter while she issues a challenge to the
system. However, a ballot that the voter challenges must be spoiled because the challenge
process reveals how the ciphertext is constructed; a voter could use this information to prove
how she voted after leaving the polling site if the encrypted votes are publicly posted.
2.1.3
Recorded-as-Cast Behavior
To ensure recorded-as-cast behavior, we provide the voter with a way to check that her
vote is accurately recorded on the secure bulletin board. Using her paper receipt, the voter
can find her voter ID on the secure bulletin board and verify that the hash value of the
posted ciphertext corresponds to the hash value on her receipt. Combined with the cast or
challenge protocol of the previous section, the voter can verify that her selection has been
correctly recorded on the secure bulletin board. However, she cannot prove to anyone else
how she voted with just the ciphertext and corresponding hash value.
21
2.1.4
Tallied-as-Recorded Behavior
To ensure tallied-as-recorded behavior, the system provides a proof that the encrypted votes
recorded on the secure bulletin board correspond to the election outcome. In general, there
are two ways to accomplish this: either the ciphertexts are combined to produce the election
tally using homormorphic encryption, or the ciphertexts are shuffled and then decrypted so
they cannot be traced back to the voters that produced them. Homomorphic encryption
allows us to tally the votes without having to decrypt any individual vote, thereby maintaining voter privacy. Mixnets shuffle the input list of votes so that the output list cannot be
traced back to the inputs; the output list may then be decrypted and the plaintexts posted
on the secure bulletin board so that anyone may verify the election tally. Sections 2.2 and
2.3 describe each scheme in greater detail.
2.1.5
Threshold Encryption
In general, it is advisable to use threshold encryption, in which a minimum threshold 𝑘 out
of a possible 𝑛 trustees must agree before decryption can occur. This mitigates the risk that
some subset of election officials is corrupt and prevents them from decrypting individual
votes.
One way to share some secret value (such as a secret key) among many parties is to use
the method developed by Shamir [35]. To share a secret among 𝑛 trustees with a minimum
threshold 𝑘, we use a polynomial 𝑃 of degree 𝑘 − 1 over a finite field. The secret value is
set as 𝑃 (0); each trustee is given a point (𝑥, 𝑦) for 𝑥 ̸= 0. Because 𝑘 points are required to
reconstruct a polynomial of degree 𝑘 − 1, any 𝑘 trustees will be able to recover the secret
key, but fewer trustees will not be able to.
With some encryption schemes, threshold encryption may be enabled even without the
use of Shamir secret sharing; for example, [28, 16] provides a way to do this with the El
Gamal public-key encryption scheme.
2.2
Homomorphic Encryption Schemes
In a homomorphic encryption scheme, a set of operations may be performed on ciphertexts
to produce the encrypted result of a (possibly different) set of operations on the underlying
plaintexts without the need for decryption. The usefulness of homomorphic encryption is
22
immediately apparent in a cryptographic voting system: ciphertexts corresponding to the
selections of individual voters may be tallied to yield an encryption of the election outcome
without requiring that the individual votes be decrypted.
2.2.1
Exponential El Gamal
The exponential El Gamal public-key encryption [15] is one example of a cryptosystem that
supports homomorphic encryption.
Definition 2.2.1. (Exponential El Gamal)
∙ Key Generation. Given a group 𝐺 with order 𝑞 and generator 𝑔, which are public
𝑅
parameters, choose the secret key 𝑘 ←
− {1, . . . , 𝑞 − 1} and the public key 𝑦 = 𝑔 𝑘 .
𝑅
∙ Encryption. For each plaintext message 𝑚, select 𝑟 ←
− {1, . . . , 𝑞 − 1} and construct
the ciphertext (𝑐1 , 𝑐2 ) = (𝑔 𝑟 , 𝑔 𝑚 𝑦 𝑟 ).
∙ Decryption. Given a ciphertext (𝑐1 , 𝑐2 ), calculate
)︁
(︁
)︁
(︁
log𝑔 (𝑐𝑘1 )−1 𝑐2 = log𝑔 𝑔 −𝑘𝑟 𝑔 𝑚 𝑔 𝑘𝑟 = 𝑚
∙ Homomorphic Addition of Ciphertexts. Given two ciphertexts 𝐶1 = (𝑔 𝑟1 , 𝑔 𝑚1 𝑦 𝑟1 )
and 𝐶2 = (𝑔 𝑟2 , 𝑔 𝑚2 𝑦 𝑟2 ), componentwise multiplication yields an encryption of the sum
of their underlying plaintexts with randomness 𝑟1 + 𝑟2 .
𝐶1 × 𝐶2 = (𝑔 𝑟1 , 𝑔 𝑚1 𝑦 𝑟1 ) × (𝑔 𝑟2 , 𝑔 𝑚2 𝑦 𝑟2 )
= (𝑔 𝑟1 +𝑟2 , 𝑔 𝑚1 +𝑚2 𝑦 𝑟1 +𝑟2 )
Tallying and Proof of Decryption
After the votes have been tallied via homomorphic addition, the resulting ciphertext is
decrypted to yield the sum of all the votes. This decrypted sum is then posted to the secure
bulletin board and should be consistent with the declared election outcome. Finally, the
system needs to prove that the posted decryption is a proper decryption of the ciphertext
without revealing the secret key; otherwise, a voter could use the revealed secret key to
prove to a coercer how she voted.
23
In exponential El Gamal, given public parameter 𝑔 and public key 𝑦, we want to show
that the revealed plaintext 𝑚 is the proper decryption of the ciphertext (𝑐1 , 𝑐2 ). Recall that
a proper encryption of the plaintext 𝑚 results in ciphertext (𝑔 𝑟 , 𝑔 𝑚 𝑦 𝑟 ). Thus, a proof of
correct decryption is equivalent to proving that log𝑔 (𝑐1 ) = log𝑦 (𝑐2 /𝑔 𝑚 ), which can be done
using the scheme described by Chaum and Pedersen [10].
2.2.2
Existing Systems Using Homomorphic Encryption
Many existing end-to-end verifiable voting systems use homomorphic encryption because
its implementation is rather straightforward and the cryptographic primitives used are wellsupported. Despite starting with mixnets, the Helios system [2], which runs in a web browser
and is intended for online elections where the threat of coercion is low (e.g. school or club
elections), now uses exponential El Gamal. Design proposals for the STAR-Vote system in
Austin County, TX [3] use very similar techniques as those discussed here, including the castor-challenge protocol, encrypted votes posted on a publicly-viewable secure bulletin board,
and homomorphic tallying with exponential El Gamal. In these systems, the weaknesses of
using homomorphic tallying, including the computational cost of modular exponentiation
used in El Gamal and the difficulty of supporting write-in candidates, are offset by the
simpplicity and clarity of the design and implementation.
2.3
Mixnets
Another way of verifying that the election result accurately reflects the set of encrypted
votes is to use a mixnet. First described by Chaum [7] as a method to ensure anonymous
communication, mixnets secretly permute the list of inputs so that they cannot be associated
with the list of outputs. A mixnet may involve many mixservers that are each a part of the
overall chain; each mixserver secretly permutes the output of the previous mixserver, and
sends its output to the next mixserver to be permuted.
In the context of elections, the input to the mixnet is the list of encrypted votes, and the
output is a list of the same votes, encrypted differently and in some different order. We can
decrypt the ciphertexts that are output from the mixnet and post the resulting plaintexts
to the secure bulletin board along with proofs of correct decryption. This allows anyone
to check that the plaintext votes produce the declared election outcome while making it
24
impossible to associate each any plaintext vote with a voter.
The encryption of the votes must change between the input and output; we cannot
simply shuffle the encrypted values. Otherwise an adversary could easily compare the input
and output lists, find identical ciphertexts, and associate them with the voter IDs which are
posted alongside the input ciphertexts on the secure bulletin board.
2.3.1
Decryption and Reencryption Mixnets
There are two general types of mixnets which accomplish our goals: decryption mixnets and
reencryption mixnets.
∙ In a decryption mixnet, a message is encrypted with the public key of each mixserver in
the reverse order of traversal. In a system with 𝑛 mixservers, the input to the mixnet
for message 𝑚 is Enc𝐾𝑛 (Enc𝐾𝑛−1 (. . . (Enc𝐾1 (𝑚)))), where Enc𝐾𝑖 denotes encryption
using the public key of the 𝑖th mixserver. At each mixserver, one layer of encryption
is removed and the list of messages is shuffled before being sent to the next mixserver.
The mixnet outputs a list of decrypted messages whose ordering cannot be traced back
to the input ordering. Decryption mixnets are commonly used to anonymize routing in
designs known as “onion routing”, where each message is encrypted repeatedly like the
layers of an onion, and each mixserver “peels off” a layer until the plaintext message
is reached.
∙ In a reencryption mixnet, a message is encrypted once and input into the mixnet.
At each mixserver, messages are reencrypted using a public-key encryption scheme
that supports reencryption and then shuffled. Exponential El Gamal, as discussed
in section 2.2.1, supports reencryption by homomorphically combining the message
to be reencrypted with an encryption of the “zero message”. This similarly achieves
our goals: the underlying plaintexts do not change through the mixnet, but their
ciphertext representations and ordering are changed in a way that makes the outputs
untraceable to the inputs.
2.3.2
Verifiable Mixnets
A malicious mixserver might ignore the inputs given to it and send whatever list of encrypted
votes it desires to the next mixserver in the chain. If the malicious mixserver were the last
25
mixserver in the chain, it could ignore all of the ballots, decide the election outcome, and
construct an output list that matches that outcome.
Thus, in addition to preserving anonymity, we require that our mixnets are verifiable. In
the early mixnets, individuals who sent messages through the mixnet could verify that their
messages were correctly reproduced at the output end, but an outside observer could not
verify the correct operation of the mixnet. Later, the concept of universal verifiability was
created, leading to the requirement that mixnets provide proofs of correct operation that
any observer can verify. In this section, we present a brief survey of mixnet designs that
lead to the mixnet used in the split-value voting system. This survey is influenced by the
work done by Adida [1], which contains a more thorough analysis.
2.3.3
Early Mixnets: Individual Verifiability
The first mixnet proposed by Chaum [7] was a decryption mixnet. The sender prepares an
input by encrypting her message with the public keys of each mixserver in reverse order of
traversal, padding each ciphertext with some randomness before encrypting with the next
public key. Each mixserver decrypts one layer of the input, reorders the messages, and
passes the messages onto the next mixserver. The sender, knowing the randomness values
that were used, can verify that her message was output correctly.
One drawback to Chaum’s design is that the input size is proportional to the number of
mixservers, since each layer of encryption requires padding with randomness. The mixnet
proposed by Park et al. [27] improves upon this by using partial decryption and reencryption
using El Gamal. Denoting the secret key of mixserver 𝑖 to be 𝑘𝑖 and the corresponding public
key 𝑦𝑖 = 𝑔 𝑘𝑖 , the input to the mixnet is the message encrypted using the product of the
public keys of each mixserver.
𝑦=
∏︁
𝑦𝑖 = 𝑔
∑︀
𝑖
𝑘𝑖
𝑖
At mixserver 𝑚𝑖 , given ciphertext 𝑐 = (𝑐1 , 𝑐2 ) = (𝑔 𝑟 , 𝑚 · 𝑦 𝑟 ) (note that this does not
require the homomorphic variant of El Gamal where the second ciphertext component is
𝑔 𝑚 𝑦 𝑟 ), the message is partially decrypted by removing the contribution of that mixserver’s
public key from the joint public key.
𝑖
PartialDec𝑖 (𝑐) = (𝑐1 , 𝑐2 · 𝑐−𝑘
1 )
26
The resulting ciphertext is then reencrypted using standard El Gamal reencryption with
additional randomness 𝑟′ .
′
′
′
′
ReEnc(𝑐) = (𝑐1 · 𝑔 𝑟 , 𝑐2 · 𝑦 𝑟 ) = (𝑔 𝑟+𝑟 , 𝑚 · 𝑦 𝑟+𝑟 )
Vulnerabilities were discovered in both early mixnet designs; Pfitzmann [31, 30] showed
that related inputs produced related outputs in the mixnet; by choosing a malicious input
that is related to another non-malicious input, an attacker could trace both inputs through
the mixnet and discover the identity of the sender of the non-malicious message.
2.3.4
Sako and Kilian: Universal Verifiability
Definition 2.3.1. A universally verifiable mixnet is one that provides a proof of correct
operation that any observer, including one that did not provide an input to the mixnet, may
verify.
Using a universally verifiable mixnet in an electronic voting system enables any member
of the public to verify that the encrypted votes posted on the secure bulletin board were
honestly mixed prior to tallying, an important component of an end-to-end verifiable voting
system.
The first universally verifiable mixnet was proposed by Sako and Kilian in [33] by borrowing the design of the partial decryption and reencryption mixnet of [27] and adding a
proof of correct partial decryption and a proof of correct reencryption and shuffling which
is constructed by each mixserver.
Proof of correct partial decryption
We prove that partial decryption was done correctly at each mixserver by constructing a
proof of equality of discrete logarithms. Given mixserver 𝑚𝑖 with public key 𝑦𝑖 = 𝑔 𝑘𝑖 ,
ciphertext 𝑐 = (𝑐1 , 𝑐2 ) and PartialDec(𝑐) = (𝑐′1 , 𝑐′2 ), we compute 𝑐2 /𝑐′2 and construct a proof
that
log𝑔 (𝑦𝑖 ) = log𝑐1 (𝑐2 /𝑐′2 )
using a scheme such as that described in [10].
27
Proof of correct reencryption and shuffling
To prove that it shuffled and reencrypted messages correctly, each mixserver constructs a
second shuffle and reencryption. With 50% probability, the verifier asks the mixserver to
prove that the second shuffle and reencryption was done correctly by revealing the permutation and randomization values used. The other half of the time, the verifier asks the
mixserver to construct and reveal a permutation and set of randomization values to go between the primary and secondary shuffles. Thus, a dishonest mixserver will be caught with
50% probability. This process may be made non-interactive via the Fiat-Shamir heuristic
[12].
2.3.5
Jakobsson, Juels, and Rivest: Randomized Partial Checking
In the randomized partial checking (RPC) design proposed by Jakobsson, Juels, and Rivest
in [19], each mixserver publishes a commitment to the permutation and randomization values
used (e.g. by using a hash function). Then, the verifier constructs a challenge by choosing
half of the inputs, for which the mixserver must reveal the permutation and randomization
values. The verifier checks that the output of the mixserver is consistent with the mixserver’s
response to the challenge and that the permutation and randomization values are consistent
with the published commitments. Because each input to a mixserver has probability 1/2
of being chosen, the probability that a single mixserver can act dishonestly about 𝑛 votes
diminishes as 2−𝑛 .
Recent work by Khazaei and Wikström [22] demonstrated practical attacks against decryption and homomorphic mixnets with RPC that break privacy and correctness. These
attacks have been found (and later fixed) in systems such as Scantegrity [9]. Khazaei and
Wikström recommend that RPC is not used with homomorphic mixnets and used with
decryption mixnets only after more rigorous proofs of security are developed.
2.4
Conclusion
In this section, we surveyed the many technologies that have been developed for end-to-end
verifiable electronic voting systems. We presented the methods used for ensuring verifiable
cast-as-intended, recorded-as-cast, and tallied-as-recorded behaviors, including the “cast or
challenge” protocol, homomorphic tallying with exponential El Gamal, and universally verifi28
able mixnets. In the next chapter, we present the split-value representation and commitment
scheme and show how it may be used in conjunction with a mixnet to achieve end-to-end
verifiablity.
29
30
Chapter 3
End-to-End Verifiable Voting Using
Split-Value Representations
In this section, we present the split-value system enabling end-to-end verifiable voting as
described by Rabin and Rivest [32] as well as an overview of the implementation as given in
[29]. We will focus on the main ideas and components of the design and implementation of
the system, and leave the explanation of the details to the respective papers.
3.1
3.1.1
The Split-Value Voting System Design
Definitions
Given an election, choose the representation modulo 𝑀 so that any voter’s selection, including possible write-in candidates, may be represented by an integer modulo 𝑀 .
Definition 3.1.1. The split-value representation of 𝑥 modulo 𝑀 is the pair SV(𝑥) = (𝑢, 𝑣)
such that 𝑥 = 𝑢 + 𝑣 mod 𝑀 .
Note that for a given 𝑥, knowing the value of 𝑢 or 𝑣, but not both, reveals no information
about 𝑥. This key property allows us to construct zero-knowledge proofs of equality for two
values using their split-value representations without revealing the actual values themselves.
Definition 3.1.2. A commitment to a value 𝑥 with randomness 𝑟 is the value 𝑐 = Com(𝑥, 𝑟)
such that 𝑐 may be “opened” to reveal 𝑥 and 𝑟. We desire that the Com function is both
hiding and binding. That is, given 𝑐 = Com(𝑥, 𝑟), no information is gained about 𝑥, and it
is infeasible to produce another (𝑥′ , 𝑟′ ) such that 𝑐 = Com(𝑥′ , 𝑟′ ).
31
The HMAC function, with the randomness 𝑟 as the key, is used as the commitment
function in our implementation.
Definition 3.1.3. Given SV(𝑥) = (𝑢, 𝑣) and randomness 𝑟 and 𝑠, the split-value commitment of 𝑥 is ComSV(𝑥) = (Com(𝑢, 𝑟), Com(𝑣, 𝑠)).
Using split-value commitments, we may construct probabilistic proofs that two values
are equal without revealing the value itself. Suppose that a prover wishes to prove that
𝑥 = 𝑥′ . Given their split value commitments
ComSV(𝑥) = (Com(𝑢, 𝑟), Com(𝑣, 𝑠))
ComSV(𝑥′ ) = (Com(𝑢′ , 𝑟′ ), Com(𝑣 ′ , 𝑠′ ))
the prover claims that there exists a value 𝑡 such that 𝑡 = 𝑢 − 𝑢′ and 𝑡 = 𝑣 ′ − 𝑣. This
can be true if and only if 𝑥 = 𝑥′ . The prover reveals the value of 𝑡 to the examiner. With
probability 1/2, the examiner asks for the opening of the “left” commitments, in which case
the prover reveals the values 𝑢, 𝑟, 𝑢′ , and 𝑟′ . The verifier checks that the values Com(𝑢, 𝑟)
and Com(𝑢′ , 𝑟′ ) are correct and that 𝑡 = 𝑢 − 𝑢′ . With probability 1/2, the examiner asks
for the “right” commitments, in which case the prover reveals the other components of the
split-value commitment. Thus, if 𝑥 ̸= 𝑥′ , a dishonest prover can win with probability at
most 1/2.
3.1.2
The Verifiable Mixnet
The main component of the split-value system is a verifiable mixnet composed of a 3 × 3
grid of mixservers, as shown in Figure 3-1. These servers are responsible for receiving votes,
obfuscating and permuting them, and producing an output list that cannot be traced back
to the input list. Denote the mixserver in row 𝑖 and column 𝑗 as 𝑃𝑖,𝑗 .
When the voter’s vote 𝑤 is cast on her device (e.g. a tablet), it is broken into three
components 𝑥, 𝑦, and 𝑧 such that 𝑤 = 𝑥 + 𝑦 + 𝑧 mod 𝑀 . The device then creates split-value
representations of each component. For each component, the split-value representation and
its commitment are sent over a secure channel to the first column mixserver of the row
corresponding to the component. For example, 𝑥 and ComSV(𝑥) are sent to 𝑃1,1 , 𝑦 and
ComSV(𝑦) are sent to 𝑃2,1 , and so on. Note that because each mixserver only receives one
32
3 × 3 server array
n voters
P1,1
P1,2
P1,3
P2,1
P2,2
P2,3
P3,1
P3,2
P3,3
2m lists/server
n
Figure 3-1: The mixnet consists of a 3 × 3 grid of mixservers. Votes are split into three
components and each component is sent to one of the leftmost servers. Each column of
mixservers agree upon a shuffling and obfuscation; the shuffled and obfuscated values are
then sent to the next column. The process is repeated 2𝑚 times.
out of three components of the vote, no single server may reconstruct any individual vote
(and break voter privacy) without leaked information from at least two other mixservers.
Definition 3.1.4. A triplet 𝑆 ′ = (𝑥′ , 𝑦 ′ , 𝑧 ′ ) is an obfuscation of 𝑆 = (𝑥, 𝑦, 𝑧) if 𝑥′ + 𝑦 ′ + 𝑧 ′ =
𝑥 + 𝑦 + 𝑧 mod 𝑀 .
The first column of servers agree upon a triplet of values (𝑝, 𝑞, 𝑟) such that 𝑝 + 𝑞 + 𝑟 =
0 mod 𝑀 for each vote. The first row then computes 𝑥′ = 𝑥 + 𝑝 mod 𝑀 , the second row
computes 𝑦 ′ = 𝑦 + 𝑞 mod 𝑀 , and so on. The triple
(𝑥′ , 𝑦 ′ , 𝑧 ′ ) is then an obfuscation of
1
(𝑥, 𝑦, 𝑧). The column of servers then agree upon a random permutation 𝜋 : {1, . . . , 𝑛} →
{1, . . . , 𝑛}. Each server transmits the permuted obfuscated values to the next column in the
same row: 𝑃1,1 transmits 𝑥′𝜋(1) , . . . , 𝑥′𝜋(𝑛) to 𝑃1,2 , 𝑃2,1 transmits the permuted 𝑦 ′ values to
𝑃2,2 , and so on.
The second and third columns of mixservers repeat this process, and the results of the
final obfuscation and permutation are posted to the secure bulletin board. Given a security
parameter 𝑚, this process of obfuscation and permutation across each column of the mixnet
is performed 2𝑚 times. In our current implementation, 2𝑚 = 24.
33
To verify the election result, we divide the 2𝑚 repetitions into two groups of 𝑚. The
first 𝑚 repetitions are used to show that the mixnet does not alter the votes. To do so,
we reveal the permutations used at each column. For each split-value commitment on the
input side of the mixnet, we follow the revealed permutations to arrive at its corresponding
split-value commitment on the output side and construct a proof of equality as described
in the previous section. These permutations and proofs of equality are posted to the secure
bulletin board for anyone to examine.
The other 𝑚 repetitions are used to prove that the election outcome is correct. All of
the commitments to the votes are opened, revealing the votes themselves. These votes are
posted to the secure bulletin board along with the election outcome for anyone to examine.
3.2
Implementation Details
The original design of the system included a functioning preliminary prototype written
using Python 3.4. In this prototype, mixservers were simulated using Python objects. A
simulation of a complete election, including vote generation, mixing, proof construction, and
verification, was contained within a single Python process. Python was chosen for its ease
of programming, debugging, profiling, and installation. Care was taken to ensure that the
number of included libraries was kept to a minimum, so that the code is easy to run and
the results are easily replicated.
3.2.1
Server Processes and Communication
Subsequent work [29] significantly decoupled the system. Each of the mixservers, secure
bulletin board, and vote generation simulator each operates within its own process. An
interface was built on top of Python TCP socket servers to facilitate communication, and
Python socket handlers were used to handle incoming requests. All data was serialized with
standard JSON libraries. The implementation also introduced a controller server, whose
responsibilities include the following.
∙ Serve as the first point of contact for all other servers. All other servers start knowing
the IP address and port of the controller.
∙ Keep track of and broadcast network information, including all other servers, their
addresses, and roles, to the network.
34
∙ Assign roles to each server, and assign their positions in the mixnet.
∙ Synchronize the various phases of the election simulation, including vote production,
mixing, proof construction, tallying, and verification.
Because all communication occurs via standard TCP sockets, the implementation may
be used on any group of computers running on a local area network. The implementation
has been tested on single-core and multicore machines as well as multiple virtual machines
running on a virtualized network.
3.2.2
The Implementation of Randomness
One particular detail that deserves special attention is the implementation of randomness.
Randomness is used throughout the system: in constructing split-value representations, in
constructing commitments, in choosing which component of a split-value representation to
reveal during a verification, and so on. Fast, cryptographically secure randomness is essential
to the correct functioning of the system.
The utility library written for this implementation includes a function that generates
pseudorandom bytes. Each server maintains a list of randomness sources. When a randomness source is created, its name is used as the initial seed. When a sequence of pseudorandom
bytes is desired, the seed is hashed twice using the SHA-256 hash function. The first hash
determines the updated seed, which is stored in the place of the original seed. The second
hash is of the original seed with a “tweak” string appended. With a good hash function,
these two hashes will not appear to have originated from the same seed.
During the proof construction phase, we require a good source of randomness to choose
the 𝑚 repetitions that will be used to verify the mixnet and the other 𝑚 repetitions that
will be used to verify the election result. There are two options for this.
∙ An external source of randomness, such as dice rolling at a public ceremony after voting
has ended. The benefit to this method is that we obtain “true” randomness that no
adversary can predict beforehand. However, it is possible that election officials and
the public are uncomfortable with introducing what appears to be games of chance
into the election process.
∙ A Fiat-Shamir style hash [12] of the secure bulletin board. With overwhelming probability, a Fiat-Shamir style hash of the secure bulletin board will provide randomness
35
that is good enough to overcome an adversary. There is the small but nonzero chance
that an adversary could manipulate the election result by providing both a faulty proof
that supports the manipulated outcome and an alternate version of the secure bulletin
board that, when hashed, produces randomness that verifies her faulty proof. To compensate for this, we set the number of repetitions (𝑚) to be higher so that the chance
of producing an alternate secure bulletin board that passes all of the verifications
decreases exponentially.
In the current implementation, we use the Fiat-Shamir method. However, this choice
incurs a penalty in performance, as our setting for 𝑚 could be reduced dramatically by
using an external source of randomness, resulting in fewer iterations through the mixnet
and a shorter proof. In the next chapter, we will see that the use of an external source of
randomness may provide enough of a performance boost to meet certain goals that election
officials desire.
3.3
Conclusion
In this section, we introduced the concepts and design behind the split-value verifiable voting
system, including the idea of a split-value commitment. We discussed the ways that votes
are obfuscated and permuted through the mixnet. Proofs are constructed to verify both the
mixnet and the election outcome without associating voters with their votes. Finally, we
examined details of the prototype implementation, including the communication interface
between servers and the implementation of randomness used in our simulations.
36
Chapter 4
Performance Optimization of the
Split-Value System
After verifying the correctness of the implementation, we turn to performance. The performance of an electronic voting system is an important consideration when selecting among
various options. The system must scale to handle hundreds of thousands or even millions
of voters per district and perhaps multiple simultaneous races. Furthermore, the media and
the public demand fast, accurate reporting of election results. We would like to be able to
provide the election results along with a verified proof within hours of the closing of polling
sites.
One proposed advantage of the split-value system is its efficiency. Existing comparable
systems such as STAR-Vote [3] use relatively time-consuming public key encryption operations such as modular exponentiation, while the split-value system uses much simpler and
faster modular addition operations. Compared to other systems, the split-value system pays
a higher cost due to the need for 2𝑚 repetitions. Furthermore, the cost of inter-process and
inter-machine communication and data serialization become non-negligible.
The implementation presented in [29], referred to as the reference implementation here,
contained numerous opportunities for optimization. In this section, we begin with a survey
of the performance of cryptographic primitives and data serializers used heavily in the implementation. Then we present the main optimizations that were successfully applied to the
reference implementation and conclude with an analysis of the scalability of the system.
All timings reported in this section, unless otherwise noted, are performed on a 16 core
37
Amazon Web Services (AWS) machine using two Intel E5-2666 v3 processors operating at
2.9 GHz and 32 GB RAM. The default settings for simulating an election consist of a single
race with two candidates and a third write-in candidate up to sixteen characters, 1000 voters,
and 2𝑚 = 24.
4.1
4.1.1
Performance of Cryptographic Primitives
Hashing and Encryption Functions
Function
Python os.urandom
Python hashlib.md5
Python hashlib.sha256
NaCl SHA-256
Cryptography SHA-256
Python HMAC with hashlib.sha256
Cryptography AES/CBC
XSalsa20
Runtime (s)
36.95
5.71
7.64
33.47
150.27
51.33
377.80
33.94
Table 4.1: Runtimes, in seconds, of ten million calls to various cryptographic functions in
Python built-in and third-party libraries.
Table 4.1 shows the performance of various cryptographic primitives, including hash functions (MD5, SHA-256), a block cipher (AES in CBC mode), and a stream cipher (XSalsa20),
by running each function ten million times. The built-in Python hash functions and XSalsa20
perform efficiently as expected, and we choose SHA-256 as the hash function used in the
implementation as it is both fast and secure.
It is surprising that the Python NaCl and Cryptography third-party libraries, both of
which are very popular, significantly underperform compared to the built-in functions. The
inefficiency of AES is particularly surprising given the Intel AES New Instructions Set, which
gives hardware support to all standard modes of operation and key lengths. The expanded
instruction set gives very good performance; encryption with 256-bit AES-CBC takes 5.65
CPU cycles per byte (256-bit AES-CBC is the most expensive operation) [18]. The CPUs
used in this benchmark did indeed support the expanded instruction set, but it is likely that
the third-party libraries did not use them.
Unfortunately, the original design estimated the performance of AES to be more than
two magnitudes faster than its Python implementation [32]. Even using the significantly
38
faster HMAC with SHA-256, commitments are more than forty times slower than estimated.
It is likely that C code written using the expanded hardware instructions will offer significant
performance benefits. For our Python implementation, we use HMAC with SHA-256 as the
commitment function.
4.1.2
Pseudorandom Number Generators
Function
Python os.urandom
sv.get_random_from_source
Runtime (s)
36.95
46.87
Table 4.2: Runtimes, in seconds, of ten million calls to os.urandom and the SHA-256-based
pseudorandom number generator.
As discussed in section 3.2.2, randomness is used throughout the system. The method
used in the implementation gives a good pseudorandom generator with three calls to the
SHA-256 hash function (once to generate the new seed, and twice to compute the tweaked
hash which becomes the output). However, as seen in Table 4.2, there is a non-negligible
overhead likely due to the repeated calls to the digest function and the byte-encoding
functions used to incorporate the tweaking string.
When benchmarking the two functions side by side, it appears that os.urandom, which
provides “a string of n random bytes suitable for cryptographic use” [13] is a faster option.
However, in the context of the overall system, the improvement is minimal. Furthermore,
os.urandom relies on the operating system’s implementation of the /dev/urandom entropy
pool, which is not always guaranteed to have enough entropy to produce good randomness.
For these reasons, we stick to the method of repeatedly hashing the seed string to generate
pseudorandomness.
4.2
Performance of JSON Serializers
The JSON format is used to serialize all data sent over the wire and posted to the secure
bulletin board; it was chosen because it is compact, very commonly used, and well-supported
by all platforms.
Table 4.3 shows the performance of three common JSON serialization libraries for Python,
tested on a moderately large (164.6 MB) JSON file. The simplejson and ujson libraries
39
Function
Python json.loads
simplejson.loads
ujson.loads
Python json.dumps
simplejson.dumps
ujson.dumps
Runtime (s)
33.60
32.34
26.79
31.08
38.91
7.76
Table 4.3: Runtimes, in seconds, of five calls to serializing (dumps) and deserializing (loads)
functions of three different Python JSON libraries on a 164.6 MB JSON file.
have C extensions that can be compiled for performance boosts; the extensions were included
in our tests.
The ujson library significantly outperforms the other two libraries, but we could not use
it because it does not support integers longer than 64 bits. From the results of our benchmarks, we expect similar runtimes using either of the other two libraries. However, other
tests [21] show that simplejson holds a significant advantage on calls to loads, especially
with lists of dictionaries. Ultimately, using simplejson yielded significant speedups over
the native Python JSON library in our prototype.
4.3
Optimization of the Prototype
To effectively optimize the reference implementation, we begin by collecting timing information on each phase of the simulation. We delve further into the performance of each
function call by using the Python profiler, which tells us how much time is spent in each
function, including and excluding the time spent in functions called from it. From there, we
generally prioritize optimizing the most time-consuming functions as well as examining the
most frequently called functions to determine if any unnecessary work is being done.
We focus on the timings of the mix, proof, and verification phases as they are the most
time-consuming. In a real election, the vote generation would occur throughout the day
rather than all at once, so it is not a bottleneck in the system. The setup occurs before the
election, and the tallying is very fast compared to the mix, proof, and verification phases.
Table 4.4 shows the major optimizations that were performed on the reference implementations and the resulting runtimes. The performance gains for each optimization are
reported relative to the row immediately preceding it in the table.
In summary, the most effective optimizations involved leveraging the available parallelism
40
Optimization
Reference implementation
Socket operations
Remove get_size in SBB operations
Byte/Integer conversion
Parallelization
Use simplejson library
Eliminate deepcopy
Precompute SBB hashes
Chain SBB hashes
Mix (s)
51.41
30.47
20.96
18.30
6.21
5.48
5.92
5.41
5.36
Proof (s)
142.62
99.10
84.56
80.73
59.27
17.79
17.34
5.63
4.96
Verify (s)
24.00
22.02
18.82
17.93
9.95
7.08
5.80
5.76
5.84
Total (s)
225.44
154.99
129.74
122.29
80.77
35.72
34.38
22.12
21.47
Table 4.4: Runtimes, in seconds, of the major phases of a simulated election at each stage
of the optimization process. All timings reflect the average of five simulations of an election
consisting of one race, two candiates plus one write-in, and 1000 voters. The Total Time
column includes the tallying time in addition to the runtime for the mix, proof, and verify
phases, giving the system’s total expected runtime after an election has closed.
in the multiple mixservers and ensuring that commonly used subroutines were as fast as
possible by optimizing by hand, using a built-in Python function, or selecting a third-party
library optimized for speed. Overall, we observe a 90% reduction in runtime due to our
optimizations detailed in this section.
In this section, we begin by introducing the Python profiler, CPython, and then describe
each successful optimization applied and report its resulting speedup.
4.3.1
The Python Profiler
The Python profiler, CProfile, allows us to see the amount of time spent in each function.
We sort the CProfile output by the cumulative time per call, which, for every function call,
gives the amount of time spent in that function and all functions called from it.
Figure 4-1 gives one example of a part of a CProfile output when the profiler was run on
the mixserver handler for the first phase of proof generation. The functions handle_prove,
compute_output_commitments, ask_sbb_to_post, and json_client are all high-level wrapper functions that call many functions within them. We notice that most of the profiler output is dominated by JSON encoding (which is used when sending data to the secure bulletin
board) and get_random_from_source, which in turn calls secure_hash and bytes2hex.
Note that the function calls to dumps, which handles the JSON encoding, and
get_random_from_source, which generates the randomness for the split-value commitments,
together consume 13.982 of the total 18.982 seconds in this mixserver handler. This suggests
41
37877535 function calls (27315331 primitive calls ) in 18.982 seconds
ncalls cumtime percall filename : lineno ( function )
1 18.982 18.982 split - value - voting - dev / ServerMix . py :144(
handle_prove )
1 10.178 10.178 split - value - voting - dev / sv_prover . py :19(
compute_output_commitments )
1
8.804
8.804 split - value - voting - dev / ServerGeneric . py :30(
ask_sbb_to_post )
1
8.804
8.804 split - value - voting - dev / ServerController . py :223(
json_client )
3
7.890
2.630 split - value - voting - dev / sv . py :790( dumps )
3
7.890
2.630 python3 .4/ json / __init__ . py :182( dumps )
3
7.862
2.621 python3 .4/ json / encoder . py :175( encode )
1920695
7.468
0.000 python3 .4/ json / encoder . py :404( _iterencode )
12482899
6.827
0.000 python3 .4/ json / encoder . py :325( _iterencode_dict )
1920645
6.125
0.000 python3 .4/ json / encoder . py :269( _iterencode_list )
144000
6.092
0.000 split - value - voting - dev / sv . py :195(
g e t_ ran dom _f rom _s our ce )
336000
5.812
0.000 split - value - voting - dev / sv . py :47( secure_hash )
144000
4.077
0.000 split - value - voting - dev / sv . py :86( bytes2hex )
Figure 4-1: CProfile output for a single call to the mixserver handler for the first phase of
the proof production phase, sorted by the amount of time spent in each function. We
note that JSON-related functions, get_random_from_source, and bytes2hex are timeconsuming function calls.
that, in the prototype code before any optimizations, our message passing and pseudorandom
number generation helper routines are taking up much more time than the cryptographic
computation; the HMAC function used for the split-value commitments do not even appear
in the top 13 lines of the profiler output. There is good reason to believe that the cryptography portion of the system’s design is relatively efficient as predicted, but other parts of
the implementation are responsible for the majority of the runtime.
4.3.2
Message Passing Over Sockets
As explained in the previous chapter, communication between machines happens via TCP
servers communicating over Python’s standard socket implementation, which provides an
interface for sending and receiving data [14].
∙ socket.recv(bufsize). Receive data from the socket up to bufsize bytes, and return
the data received.
∙ socket.send(string). Send data through the socket, and return the number of bytes
sent. socket.send does not guarantee that the all data is sent; the caller must check
42
the return value and retry sending the remaining data if necessary.
∙ socket.sendall(string). Send data through the socket, and return None on success.
Unlike socket.send, socket.sendall guarantees that all data is sent if the function
returns successfully.
Due to constraints in the operating system, the value for bufsize used in the socket.recv
function is usually 4096 or 8192, corresponding to the maximum length that may be sent
via a single call to socket.send. Note that there is not a corresponding socket.recvall
function. If a longer message is desired, we must repeatedly call socket.recv until the
entire message is received. The reference implementation handled this situation by sleeping
for a short amount of time after each call to socket.recv before attempting to receive more
data from the socket.
Because socket.recv blocks if the data is not ready yet (up to some timeout), we do not
need to sleep to wait for more data; rather, we just need to know how much data to expect
and continuously call socket.recv until that length is reached. We define the following
higher-level interface for communication over sockets:
∙ send(socket, message). Prepend message with its length and send the resulting
bytes over socket.
∙ recv(socket). Receive and return all data from the socket by stripping out the
length from the first chunk of the message and repeatedly calling socket.recv until
the length is met or an error is encountered. We assume that the length of a message
will never exceed the limit of a 32-bit signed integer.
This optimization yielded a speedup of 20.9 seconds (40.7% reduction in runtime) in the
mix phase and 43.5 seconds (30.5% reduction in runtime) in the proof phase.
4.3.3
Byte/Integer Utility Functions
Our profiler output revealed that a significant amount of time was spent on the reference
implementation’s utility functions for converting between Python bytes, integers, and hex
strings. The conversions are necessary because different components of the system expect
values to be in different forms; for example, the library used for cryptographic hash functions
43
returns a byte array, while our secure bulletin board, to which the results of computing these
cryptographic hashes are posted, accepts string inputs.
Specifically, for the sv.bytes2hex implementation, which returns the hex string representation of a byte array, we achieved a 15× speedup in the function’s runtime by simply
replacing the byte-by-byte conversion with the built-in binascii.hexlify function.
For the sv.int2bytes function, which converts any integer into a bytearray representation, we use Python 3’s new bit_length attribute, which supports signed integers up to
64 bits long, along with a 64-bit mask and shift. This allows us to process the input eight
bytes at a time instead of one, yielding a 1.5× speedup in the function’s runtime compared
to the reference implementation.
Overall, these optimizations produced a 5.75% reduction in runtime, mostly in the proof
and verify phases.
4.3.4
Introducing Parallelism
No performance analysis is complete without an analysis of the potential for parallelism.
Multicore CPUs are now the norm; even a typical laptop that we would expect an election
official to use is likely to have at least two CPU cores. Furthermore, in our system, the
computation required to mix the votes and construct a proof of the election outcome is
distributed across the nine mixservers. We show here that significant parallelism can be
leveraged in each of the mix, proof, and verification phases of the system.
Parallelism in the Mix and Proof Phases
In the reference implementation of the mix and proof phases, the controller server issued
remote procedure calls (RPCs) to each mixserver sequentially. Each RPC would block on
the controller until the corresponding mixserver’s RPC response arrived at the controller;
only then would the next RPC be issued. Because the RPCs are independent within each
part of the proof and verify phases, they may be issued asynchronously and in parallel.
We use Python’s implementation of processes in the multiprocessing library to accomplish this parallelization. For each RPC, one process is forked from the pricess running the
controller server. The forked processes are responsible for issuing the RPCs, waiting for
responses, and returning to the parent process. Even with a single-core controller server, a
process that is waiting for an RPC response may yield the CPU to allow another process to
44
issue an RPC.
Processes are used instead of threads because Python’s Global Interpreter Lock limits
parallelism between the various threads spawned from a single process. We confirmed that
using Python threads instead of processes yielded no noticeable speedup over the reference
implementation. The built-in multiprocessing library does not support process pools as it
does thread pools, so we must fork a new process for each RPC instead of relying on existing
workers. Fortunately, forking a new process is a constant-time cost relative to scaling the
number of voters and our tests show that it is negligible.
Introducing parallelism produced a 12.1 second speedup (66.1% reduction in runtime)
in the mix phase and 21.5 second speedup (26.6% reduction in runtime) in the proof phase.
Parallelism in the Verification Phase
Unlike the mix and proof phases which involve many distinct machines, the verification
phase occurs on a single machine. However, all parts of the verification process depend
only on the contents of the secure bulletin board; we expect that any member of the public
may take the contents of the secure bulletin board and verify the election outcome herself.
Although no part of the verification phase depends on any previous part because each part
may independently read the secure bulletin board, in practice it is easier to let earlier phases
populate a shared data structure for later phases to use. This prevents the need to deserialize
the secure bulletin board, which may be a large file and thus a costly process, at the cost of
some parallelism.
In the reference implementation, later parts of the verification phase used data that
earlier parts had already verified. This makes verification simpler because at no point does
the verifier need to worry about inconsistent or unverified data propagating through it.
However, this scheme resulted in a complex data dependency graph which made the verifier
difficult to parallelize; a process running one part of the verification could be started only
when all of its dependencies were completed.
The requirement that data consumed in later parts must itself be verified is stricter than
necessary. If any step of the verifier fails, we declare the election to be invalid, so even if bad
data propagates through the verifier, at some point that data will itself be checked and an
error raised. This check might not happen until after the bad data has been used in other
parts of the verifier, but the problem will be caught nonetheless.
45
Relaxing our requirements on the verifier and introducing parallelism resulted in a 8
second speedup (44.5% reduction in runtime) in the verification phase.
4.3.5
JSON Serialization
The results presented in [21] suggest that simplejson often outperforms Python’s native
JSON library, especially when deserializing lists of dictionaries. By simply switching the
serializer library and eliminating indentation and formatting from JSON strings, a 41.5
second speedup (70% reduction in runtime) in the proof phase and 2.9 second speedup
(28.8% reduction in runtime) in the verify phase were achieved. The overall runtime speedup
for the simulation was 45 seconds (55.8% reduction in runtime).
4.3.6
Hashing the Secure Bulletin Board
Hashes of the secure bulletin board are used in generating the cut-and-choose and left-right
challenges during the construction of the proof. The hash values are recomputed in the
verification phase to ensure that the values used in the proof are actually correct hashes of
the secure bulletin board.
Because the verification process itself makes modifications to the secure bulletin board
data structure, causing the hash values to differ, the reference implementation makes a
deep copy of the data structure before proceeding with verification. By moving the hashing
verification to be before the verification of the rest of the data structure, a 1.3 second speedup
(18.3% reduction in runtime) was obtained in the verification.
In the reference implementation, the mixservers ask the server storing the secure bulletin
board for its hash value for the aforementioned reasons. Although the mixservers operate
in parallel, the secure bulletin board runs on a single process and requests for the hash
value are serialized. Therefore, a mixserver that issues its request later must wait for the
secure bulletin board to service all earlier requests. In our optimization, the controller server
requests the hash value and sends it to each mixserver as part of the body of the relevant
requests. Each mixserver then has all the necessary information at the beginning of each
phase and wastes no time waiting for the single-threaded secure bulletin board to service
other requests. This optimization resulted in a 11.7 second speedup (67.5% reduction in
runtime) in the proof phase.
Each request for the hash value of the secure bulletin board requires that the contents
46
of data structures held in memory be serialized before they can be hashed. As the secure
bulletin board grows, this serialization becomes expensive. We can reduce this cost by
computing the updated hash value every time anything is appended to the bulletin board.
We calculate the new hash value by concatenating the data to be appended with the previous
hash value. Because the previous hash value is fixed in size, the cost to compute the hash
does not grow as the size of the secure bulletin board. While this results in a relatively small
performance gain in the proof phase (11.9% reduction in runtime), it prevents performance
from diminishing as the size of the secure bulletin board increases.
4.4
Performance Scalability
In the previous section, we simulated elections with 1000 voters because it provided useful
data for optimization without taking too long to run simulations. A real election, however,
may have hundreds of thousands or even millions of voters. We are interested in how our
system scales with the number of voters.
Number of Voters
100
500
1000
5000
10000
Total Time (s)
2.48
10.93
21.47
142.71
271.88
Time Per Voter (s)
0.0248
0.0219
0.0215
0.0285
0.0212
Table 4.5: Runtimes, in seconds, of the major phases of a simulated election (including mix,
proof, tally, and verify) for varying numbers of voters. All timings reflect the average of five
simulations. We observe that the runtime scales linearly as the number of voters.
Table 4.5 shows the performance of the system with 100, 500, 1000, 5000, and 10,000
voters and the corresponding costs per voter. We see that the performance of the system
scales roughly linearly as the number of voters, which bodes well for scaling the system to
production. In the current prototype and the testing machine, connections begin to time
out for 100,000 voters. This is likely an issue with the pool of TCP connections becoming
saturated since the simulation is running on a single operating system. However, this remains
to be tested rigorously.
Table 4.6 shows the performance of the system with the number of repetitions (2𝑚)
set to 4, 8, 16, and 24. Although the runtime grows slightly sublinearly as the number
of repetitions, reducing the number of repetitions is one way of significantly reducing the
47
Number of Repetitions (2𝑚)
4
8
16
24
Total Time (s)
4.01
7.84
15.17
21.47
Time Per Repetition (s)
1.0029
0.9801
0.9478
0.8947
Table 4.6: Runtimes, in seconds, of the major phases of a simulated election (including mix,
proof, tally, and verify) for varying numbers of repetitions (value of 2𝑚). All timings reflect
the average of five simulations.
system’s runtime.
4.5
Comparison with Other Systems
Unfortunately, there exists very little data concerning the performance of end-to-end verifiable electronic voting schemes in general and mixnets in particular. In the original design
of the Helios system by Adida, proof generation and verification of a 500 voter election
took many hours [2]. However, Adida notes the potential for using parallelism to achieve
speedups. Furthermore, we must keep in mind that computers have become much more powerful since the original design of Helios in 2008: as a rough metric, the cost per GFLOPS
(one billion floating-point operations per second) has decreased more than 600× in the last
seven years [38], while browser implementations of Javascript have continued to improve.
The original design of Scantegrity [9] claims that the “necessary Switchboard audit data”
could be produced and verified in a matter of minutes for an election with 32 races and
200,000 ballots. However, it is unclear if this is simply a proof of an honest Switchboard
in the election setup phase or if this time includes a proof and verification of the overall
election’s outcome.
Mixnet designs such as those proposed in [6] achieve improvements in the asymptotic
amount of work done in constructing a proof; however, it is unclear what the performance
of a prototype implementation would be.
4.6
Conclusion
In this section, we examined the performance of the split-value system implementation.
Benchmarking functions available for cryptographic primitives led to the usage of the SHA256 hash function and the HMAC commitment scheme in our implementation. We used the
48
Python profiler to identify areas of the implementation that could yield large performance
gains. By leveraging parallelism, reducing the wait time on sockets and requests to the
secure bulletin board, and using a more efficient JSON serializer, our optimized system runs
in less than one-tenth the time of the reference implementation.
With the optimized implementation, an election of one million voters will take between
seven and eight hours to mix, prove, tally, and verify. This is a bit more than twice as long
as the time generally allotted between the closing of polling sites and the release of election
results in the evening news. We hope that further optimization may yield this factor of two
to three improvement in the runtime.
Using properly hardware-optimized AES encryption as a commitment scheme has the
potential to reduce runtime significantly; according to Intel’s measurements, the estimate
of 8 million commitments per second given in [32] can be achieved even with conservative
estimates.
In the current implementation, considerable time is spent on repeating the mixing and
proof construction for 2𝑚 = 24 repetitions. As mentioned in section 3.2.2, this repetition
is necessary due to the reduced security in using a Fiat-Shamir hash of the secure bulletin
board instead of true randomness. The process may be sped up at least twofold if election
officials adopted a true source of randomness during the construction of the proof.
Overall, it appears that under ideal situations (access to hardware functions, minimal
cost in serialization and communication), the split-value system delivers on its promise
to provide security without the heavy computation used in most existing cryptographic
solutions. However, its implementation requires careful optimization and selection of thirdparty code.
49
50
Chapter 5
Future Work
The work of this thesis, building upon [29], establishes a baseline for the speed that can be
expected of a system built using the split-value design. Additional work may build upon the
optimizations described in this thesis to provide further performance gains or experiment
with using other languages and platforms to compare them to our Python implementation.
Work is also being done to improve the design by supporting additional fault tolerance.
Finally, a real-life deployed version of this system must also include mechanisms for secure
channels and key exchange as part of the election setup.
5.1
Performance Considerations
While the work in this thesis produced a 10× speedup over the reference prototype, there
is good potential for even more performance gains. For example, the results in section 4.1.1
suggest that our commitment scheme may become significantly faster by writing code with
native hardware support.
Given the current implementation, improvements to performance may be guided by the
following goals.
∙ Construct a more accurate model for where runtime is going. Studying the profiler
output for each mixserver is a good first step; even more informative would be a way
to visualize the amount of time spent by each mixserver in each parallelized mix or
proof phase. Another metric of interest is the duration of time that each server spends
waiting for other servers and the remote procedure calls that cause such waiting to
occur.
51
∙ Obtain accurate numbers on the amount of time spent in crypto-related operations,
versus data serialization and message passing. Throughout most of the optimization
process, the actual “number-crunching” of the crypto-related operations consumed far
less time than the overhead of data serialization and message passing. After numerous
optimizations, the two components appear relatively even in runtime. However, experiments that attempt to more accurately measure this (for example, by replacing a
crypto function with a fast, constant time statement like return 0) were inconclusive.
Accurate results here will provide valuable insight on the maximum benefit that can
be derived from optimizing each component of the system.
∙ Compare the performance of the split-value system to that of comparable systems. Detailed performance numbers are unfortunately not available in most of the electronic
voting and mixnet literature. However, it may be informative to profile existing implementations of systems such as [34, 2] and implement mixnet schemes designed for
efficiency such as [6] for comparison.
Python is often criticized for being a slow language; statically-checked, strongly-typed
languages compiled directly into machine code may perform much better and take advantage
of extensions such as CilkPlus for C/C++. On the other hand, the potential for performance
gains by using a language like C++ needs to be weighed against the increases in code
complexity and difficulty in reading and debugging. Python’s C bindings may also be used
for specific functions that could benefit from great optimization while leaving the majority
of the codebase intact.
5.2
Fault Tolerance
In a production environment, servers (which may be election officials’ laptops) may crash,
power may go out, and an election official might even forget to bring her laptop. Data may
be corrupted in transit or on disk. A server may be ejected from the mixnet if it is discovered
to be dishonest. A robust, fault-tolerant system is needed to ensure that the election can
still run smoothly despite these failures. Possible improvements to fault tolerance include
the following.
∙ Replicate and persist the secure bulletin board to disk. The secure bulletin board ul52
timately contains all of the information generated by the mixservers for the mix and
proof phases of the election. This information should be replicated and persisted to
disk so that no single server failure causes this information to be lost.
∙ Remove central points of failure, such as the controller. Because the controller dictates
the progression of the entire election and all other servers take instructions from it, a
failure of the controller will cause the entire system to stop. The controller may be
designed as a replicated state machine, or it may be removed altogether in a system
where the mixservers collectively communicate among each other and advance the
election.
∙ Ensure that no data is corrupted during transmission. This may be accomplished
via message authentication codes or digital signatures, which also provide security
guarantees as described in the next section.
Work by Danna Kelmer at NYU begins to address fault tolerance in the system.
5.3
Secure Channels and Key Exchange
Ultimately, the system must run on multiple machines connected to some network. Despite
precautions to ensure the integrity of the network (e.g. private network, wired connections only with wireless adapters off), cryptographically secure channels are a necessary
and relatively simple way to ensure end-to-end encryption and authentication. The various
public-key and symmetric encryption and authentication functions should be compared for
their security and performance characteristics, and the process of key exchange should be
designed to be done when the election is setup the day before.
53
54
Chapter 6
Conclusion
In this thesis, we present a new design for an end-to-end verifiable voting system by introducing the concept of a split-value representation and a split-value commitment. We also
introduce a verifiable mixnet design, which when used with split-value commitments, preserves the anonymity of voters. Compared to other designs which achieve the same result,
we expect that using relatively inexpensive split-value commitments in the mixnet rather
than other, more time-consuming operations such as repeated public-key encryption will
yield high performance while maintaining verifiability.
As with any candidate system for real-world application, the split-value voting system
must exhibit not only correctness, but also good performance, scalability, and fault tolerance.
Based on the performance optimizations and scalability analysis described in this thesis, the
potential for additional performance gains, and current work addressing fault tolerance, it
is our hope that the split-value voting system may evolve from a theoretically interesting
design to a viable implementation used in public elections over the coming years.
55
56
Bibliography
[1] B. Adida. Advances in Cryptographic Voting Systems. PhD dissertation, Massachusetts
Institute of Technology, Department of Electrical Engineering and Computer Science,
August 2006.
[2] B. Adida. Helios: Web-based Open-audit Voting. In Proceedings of the 17th Conference
on Security Symposium, 2008.
[3] S. Bell et al. STAR-Vote: A Secure, Transparent, Auditable, and Reliable Voting
System. In USENIX Journal of Election Technology and Systems, volume 1, 2013.
[4] J. Benaloh. Simple Verifiable Elections. In Proc. 2006 USENIX/Accurate Electronic
Voting Technology Workshop, 2006.
[5] J. Benaloh and D. Tuinstra. Receipt-free secret-ballot elections. In Proc. 26th ACM
Symposium on Theory of Computing, 1994.
[6] M. Chase, M. Kohlweiss, A. Lysyanskaya, and S. Meiklejohn. Verifiable Elections That
Scale for Free. In K. Kurosawa and G. Hanaoka, editors, Lecture Notes in Computer
Science, volume 7778. Springer, 2013.
[7] D. Chaum. Untraceable electronic mail, return addresses, and digital pseudonyms.
Commun. ACM, 24(2), 1981.
[8] D. Chaum. Secret-ballot receipts. In Proc. 25th IEEE Symposium on Security and
Privacy, 2004.
[9] D. Chaum et al. Scantegrity: End-to-end voter-verifiable optical-scan voting. Security
& Privacy, IEEE, 6(3), 2008.
[10] D. Chaum and T. Pedersen. Wallet Databases with Observers. In Proceedings on
Advances in Cryptology - Crypto ’92, 1992.
[11] A. Feldman, J. Halderman, and E. Felten. Security Analysis of the Diebold AccuVoteTS Voting Machine. In Proc. 2007 USENIX/Accurate Electronic Voting Technology
Workshop, 2007.
[12] A. Fiat and A. Shamir. How to Prove Yourself: Practical Solutions to Identification
and Signature Problems. In Proceedings on Advances in Cryptology - Crypto ’86, 1987.
[13] The Python Software Foundation. Miscellaneous operating system interfaces. https:
//docs.python.org/3.4/library/os.html. [Online; accessed 18-May-2015].
57
[14] The Python Software Foundation. socket - Low-level networking interface. https:
//docs.python.org/3.4/library/socket.html. [Online; accessed 18-May-2015].
[15] T. El Gamal. A Public Key Cryptosystem and a Signature Scheme Based on Discrete
Logarithms. IEEE Trans. Inform. Theory, 31, 1985.
[16] R. Gennaro, S. Jarecki, H. Krawczyk, and T. Rabin. Secure distributed key generation
for discrete-log based cryptosystems. In J. Stern, editor, Lecture Notes in Computer
Science, volume 1592. Springer, 1999.
[17] Rop Gonggrijp et al. Nedap/Groenendaal ES3B voting computer: a security analysis,
2006.
R
[18] S. Gueron.
Intel○
Advanced Encryption Standard (AES) New Instructions Set. https://software.intel.com/sites/default/files/article/165683/
aes-wp-2012-09-22-v01.pdf, 2012. [Online; accessed 18-May-2015].
[19] M. Jakobsson, A. Juels, and R. Rivest. Making mix nets robust for electronic voting
by randomized partial checking. In D. Boneh, editor, USENIX Security Symposium.
USENIX, 2002.
[20] A. Juels, D. Catalano, and M. Jakobsson. Coercion-resistant electronic elections. In
Proc. ACM Workshop on Privacy in the Electronic Society, 2005.
[21] Jyotiska. JSON vs simpleJSON vs ultraJSON. http://blog.dataweave.in/post/
87589606893/json-vs-simplejson-vs-ultrajson. [Online; accessed 20-May-2015].
[22] S. Khazaei and D. Wikström. Randomized Partial Checking Revisited. In E. Dawson,
editor, Lecture Notes in Computer Science, volume 7779. Springer, 2013.
[23] R. Mercuri. Voting-machine risks. Commun. ACM, 35(11), 1992.
[24] L. Naish. Partial Disclosure of Votes in STV Elections. Voting Matters, 30, 2013.
[25] C. Andrew Neff. Practical high certainty intent verification for encrypted votes. Votehere, Inc. whitepaper, 2004.
[26] Ace Electoral Knowledge Network.
Results Management Systems.
aceproject.org/ace-en/pdf/vc/view. [Online; accessed 18-May-2015].
http://
[27] C. Park, K. Itoh, and K. Kurosawa. Efficient anonymous channel and all/nothing
election scheme. In T. Helleseth, editor, Lecture Notes in Computer Science, volume
765. Springer, 1994.
[28] T. Pedersen. A threshold cryptosystem without a trusted party (extended abstract).
In D. Davies, editor, Lecture Notes in Computer Science, volume 547. Springer, 1991.
[29] M. Pedroso. The Implementation of a Split-Value Verifiable Voting System. Master’s
thesis, Massachusetts Institute of Technology, Department of Electrical Engineering
and Computer Science, 2015.
[30] B. Pfitzmann. Breaking efficient anonymous channel. In A. De Santis, editor, Lecture
Notes in Computer Science, volume 950. Springer, 1995.
58
[31] B. Pfitzmann and A. Pfitzmann. How to break the direct rsa-implementation of mixes.
In J. Quisquater and J. Vandewalle, editors, Lecture Notes in Computer Science, volume
434. Springer, 1990.
[32] M. Rabin and R. Rivest. Efficient End to End Verifiable Electronic Voting Employing
Split Value Representations. In Proc. EVOTE 2014, 2014.
[33] K. Sako and J. Kilian. Receipt-free mix-type voting scheme - a practical solution to
the implementation of a voting booth. In L. Guillou and J. Quisquater, editors, Lecture
Notes in Computer Science, volume 921. Springer, 1995.
[34] D. Sandler, K. Derr, and D. Wallach. VoteBox: A tamper-evident, verifiable electronic
voting system. In Proceedings of the 17th USENIX Security Symposium, 2008.
[35] A. Shamir. How to share a secret. Commun. ACM, 22(11), 1979.
[36] Wikipedia. DRE voting machine. http://en.wikipedia.org/wiki/DRE_voting_
machine. [Online; accessed 18-May-2015].
[37] Wikipedia. Electronic voting. http://en.wikipedia.org/wiki/Electronic_voting.
[Online; accessed 18-May-2015].
[38] Wikipedia. FLOPS. http://en.wikipedia.org/wiki/FLOPS#Cost_of_computing.
[Online; accessed 18-May-2015].
[39] Wikipedia. Secret ballot. http://en.wikipedia.org/wiki/Secret_ballot. [Online;
accessed 18-May-2015].
[40] Wikipedia. United States presidential election in Florida, 2000. http://en.wikipedia.
org/wiki/United_States_presidential_election_in_Florida,_2000. [Online; accessed 18-May-2015].
59