Performance Optimization of a Split-Value Voting System by Charles Z. Liu Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY June 2015 c Massachusetts Institute of Technology 2015. All rights reserved. ○ Author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Department of Electrical Engineering and Computer Science May 20, 2015 Certified by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Prof. Ronald L. Rivest Thesis Supervisor Accepted by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Prof. Albert R. Meyer Chairman, Masters of Engineering Thesis Committee 2 Performance Optimization of a Split-Value Voting System by Charles Z. Liu Submitted to the Department of Electrical Engineering and Computer Science on May 20, 2015, in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science Abstract As digital technology becomes more powerful and commonplace, the benefits of using computers to conduct elections become more apparent. In today’s elections dominated by paper ballots, we cannot be certain that the election result is correct. Many parts of the world are plagued by corrupt election officials and rampant bribery. In this thesis, we review the cryptographic techniques available for designing a secure election system and introduce a system designed around a verifiable mixnet using split-value commitments. The main work done in this thesis is a series of performance optimizations to the existing prototype, greatly improving the real-world viability of the system. Finally, we suggest further work that can be done to improve performance, fault tolerance, and security. The code that accompanies this thesis may be found at https://github.com/ron-rivest/ split-value-voting. Thesis Supervisor: Prof. Ronald L. Rivest 3 4 Acknowledgments Professor Ron Rivest introduced me to the fields of computer security and cryptography and sparked my interest when I took 6.857 Computer and Network Security in the spring of 2013. His knowledge, encouragement, and patience helped guide and encourage me through every step of this thesis. I’m also extremely grateful that, through working with Professor Rivest, I’ve been able to delve into the field of electronic voting, a field that I barely knew anything about a year ago. Professor Rivest and Professor Michael Rabin designed the split-value voting system that this thesis is based on. Without the numerous meetings that we had where both of them spent time meticulously and patiently explaining every detail of the system, this thesis would not have been possible. Marco Pedroso completed an implementation of the distributed version of this system in the fall of 2014, and this thesis builds upon his work. In my time at MIT, I have been fortunate to have taken some great classes taught by some great professors who have inspired me to pursue a better understanding of various aspects of computer science. I’ve also had the honor to teach and TA incredible students, who made me learn things in ways I never would have otherwise. To my friends, at MIT, in Boston, and from back home, thanks for being with me every step of the way and pushing me to achieve things I didn’t know were possible. Years later, having forgotten all of the psets and readings, I’ll still remember all the experiences and memories. And last, but certainly not least, to my family, for giving me all the resources that I’ve ever needed to get the education that I have today. Whether it’s the math drills fifteen years ago or last week’s call to make sure I wasn’t staying up too late writing this thesis (oops...), I wouldn’t be at MIT without you all. Thanks for your infinite support, wisdom, advice, and encouragement. 5 6 Contents 1 Introduction 13 1.1 A Brief History of Voting Systems . . . . . . . . . . . . . . . . . . . . . . . . 14 1.2 The Rise of Electronic Voting . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 1.3 Desirable Properties in Electronic Voting Systems . . . . . . . . . . . . . . . . 16 1.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2 Electronic Voting Systems and Technologies 2.1 2.2 2.3 2.4 19 Design of End-to-End Voting Systems . . . . . . . . . . . . . . . . . . . . . . 19 2.1.1 A Secure Bulletin Board . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.1.2 Cast-as-Intended Behavior . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.1.3 Recorded-as-Cast Behavior . . . . . . . . . . . . . . . . . . . . . . . . 21 2.1.4 Tallied-as-Recorded Behavior . . . . . . . . . . . . . . . . . . . . . . . 22 2.1.5 Threshold Encryption . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 Homomorphic Encryption Schemes . . . . . . . . . . . . . . . . . . . . . . . . 22 2.2.1 Exponential El Gamal . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.2.2 Existing Systems Using Homomorphic Encryption . . . . . . . . . . . 24 Mixnets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.3.1 Decryption and Reencryption Mixnets . . . . . . . . . . . . . . . . . . 25 2.3.2 Verifiable Mixnets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 2.3.3 Early Mixnets: Individual Verifiability . . . . . . . . . . . . . . . . . . 26 2.3.4 Sako and Kilian: Universal Verifiability . . . . . . . . . . . . . . . . . 27 2.3.5 Jakobsson, Juels, and Rivest: Randomized Partial Checking . . . . . . 28 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 7 3 End-to-End Verifiable Voting Using Split-Value Representations 3.1 3.2 3.3 The Split-Value Voting System Design . . . . . . . . . . . . . . . . . . . . . . 31 3.1.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 3.1.2 The Verifiable Mixnet . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 3.2.1 Server Processes and Communication . . . . . . . . . . . . . . . . . . . 34 3.2.2 The Implementation of Randomness . . . . . . . . . . . . . . . . . . . 35 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 4 Performance Optimization of the Split-Value System 4.1 31 37 Performance of Cryptographic Primitives . . . . . . . . . . . . . . . . . . . . . 38 4.1.1 Hashing and Encryption Functions . . . . . . . . . . . . . . . . . . . . 38 4.1.2 Pseudorandom Number Generators . . . . . . . . . . . . . . . . . . . . 39 4.2 Performance of JSON Serializers . . . . . . . . . . . . . . . . . . . . . . . . . 39 4.3 Optimization of the Prototype . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 4.3.1 The Python Profiler . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 4.3.2 Message Passing Over Sockets . . . . . . . . . . . . . . . . . . . . . . . 42 4.3.3 Byte/Integer Utility Functions . . . . . . . . . . . . . . . . . . . . . . 43 4.3.4 Introducing Parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . 44 4.3.5 JSON Serialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 4.3.6 Hashing the Secure Bulletin Board . . . . . . . . . . . . . . . . . . . . 46 4.4 Performance Scalability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 4.5 Comparison with Other Systems . . . . . . . . . . . . . . . . . . . . . . . . . 48 4.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 5 Future Work 51 5.1 Performance Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 5.2 Fault Tolerance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 5.3 Secure Channels and Key Exchange . . . . . . . . . . . . . . . . . . . . . . . . 53 6 Conclusion 55 8 List of Figures 3-1 The 3 × 3 grid of mixservers and the communication among them . . . . . . . 33 4-1 CProfile output for a single call to a mixserver handler . . . . . . . . . . . . . 42 9 10 List of Tables 4.1 Performance measurements of cryptographic primitives . . . . . . . . . . . . . 38 4.2 Performance measurements of pseudorandom number generators 4.3 Performance measurements of JSON serializers . . . . . . . . . . . . . . . . . 40 4.4 Election runtime at each stage of optimization . . . . . . . . . . . . . . . . . . 41 4.5 Election runtime for varying numbers of voters . . . . . . . . . . . . . . . . . 47 4.6 Election runtime for varying numbers of repetitions . . . . . . . . . . . . . . . 48 11 . . . . . . . 39 12 Chapter 1 Introduction As a cornerstone of democratic governments, elections are the subject of intense scrutiny. Voters expect that the systems and procedures put in place are easy to follow, efficient, and inexpensive. However, history tells us that in practice, this is not always be the case. The 2000 Presidential Election in the United States is perhaps the most well-known failure of a voting system in the current generation. The series of recounts in Florida delayed the nationwide election result by several weeks and opened cases in the Florida Supreme Court and the United States Supreme Court. Even in the aftermath of the election, studies were inconclusive as to whether county-specific recounts or a statewide recount would have swung the results in Al Gore’s favor. The problem was exacerbated by the media’s premature declaration of the election outcome, George W. Bush’s narrow margin of victory (around 500 votes), and controversy surrounding the “butterfly ballots” that were used in Palm Beach for the first time [40]. Studies conducted after the 2000 Election in Florida reveal many issues that appear to be quite subtle in the larger picture of a Presidential election: the public’s familiarity with a new ballot layout and instructions, the timing of media releases and exit polls, and even the degree to which paper ballots are cleanly marked or punctured by the voter. In the aftermath of the chaos, Congress passed the Help America Vote Act, which helped states upgrade their election infrastructure to prevent similar issues four years later. Many states turned to electronic voting systems. Unfortunately, these systems have been criticized for their issues surrounding accessibility, relability, and security [36]. Furthermore, many of these proprietary systems have reached or are approaching the ends of their lifetimes, 13 prompting another wave of research and development on secure, verifiable, efficient, and reliable electronic voting systems. Elections are not only thwarted by the failures of the mechanisms used in them. Reports of manipulated or lost ballots, coerced voters, and corrupt election officials are no surprise in certain parts of the world even today. In decades past, voters in Italian elections often had the option of selecting ranked preferences among a long list of candidates. The clever “Italian attack” [24] reportedly used by the Mafia involved coercing a voter to vote for the coercer’s first choice candidate, followed by a sequence of low-probability candidates in some predetermined order. With a long list of candidates, the chance that another ballot contains the exact same sequence in the same slots is very low; thus, if the votes are publicly broadcast following the election, the sequence becomes a signature for the coerced voter that the coercer can check, even if all other identifiable information is stripped. Clearly, there is much more to be done in improving election systems. In this chapter, we briefly survey the history of voting systems, including electronic voting systems, and conclude with a discussion of desirable properties in electronic voting systems. 1.1 A Brief History of Voting Systems The first recorded elections occurred in the sixth century BC in ancient Greece, where adult males were charged with the responsibility to vote at meetings of the assembly. In the beginning, voting was typically done by raising hands in public spaces. This developed into the use of colored stones, where voters would select one of two colored stones and place it into a central jar. The first “secret ballots” were used in ancient Greece and Rome, where records exist of contraptions that allowed a voter to hide the stone as it was deposited into the jar. By the end of the nineteenth century, most Western countries, including the United States, Australia, and France, demanded the use of a secret ballot at elections [39], and voter privacy was a non-negotiable property of official elections. As populations grew and election officials looked for ways to make elections more efficient, mechanized voting machines were developed. Lever machines were introduced in New York during the 1892 Election. Punch card systems were introduced in 1965 and were also widely used until they drew sharp criticism during the 2000 Election in Florida. Lever machines, punch cards, and other mechanical and electrical systems such as optical scan14 ners provide different tradeoffs in cost, convenience, efficiency, usability, accessibility, and security. However, in the past decades, growth in both the power and popularity of digital technology has enabled a shift towards electronic voting systems. 1.2 The Rise of Electronic Voting In the aftermath of the 2000 Election, the Help America Vote Act instituted a series of requirements on the states which aimed to replace punch card and lever machines, establish standards on voter privacy and accessibility for voters with disabilities, and improve the usability of voting systems. The number of direct-recording electronic (DRE) voting machines grew in response. DREs have a number of advantages over their mechanical counterparts. ∙ They enable rapid, accurate tallying of votes. ∙ They may include numerous accessibility functions, notably those for the visually impaired. ∙ They may include many languages. ∙ They may catch errors such as undervoting and overvoting before the vote is cast, preventing invalid ballots from being discarded. However, DREs are often complex systems containing proprietary hardware and software, making it difficult to verify their correctness and security properties; indeed, studies [36, 11, 17] have shown the existence of numerous security vulnerabilities in many commercially available electronic voting systems. Because of the lack of a paper trail, any errors in the system are unrecoverable. The voter-verified paper audit trail (VVPAT) system was proposed by Mercuri [23] in 1992 as a solution to the lack of verifiability in DREs. In the simplest VVPAT system, a DRE machine is augmented with a printer. In addition to submitting the vote electronically, the machine prints out a human-readable copy of the voter’s selections. The voter then confirms that the paper record is indeed correct before depositing it into a ballot box. In case of a disputed election, recount, or malfunctioning machines, the paper records may be consulted. VVPAT systems may be extended so that the voters may keep a printed receipt; however, care must be taken to ensure that the printed receipt does not reveal the voter’s vote to anyone else. 15 Today, VVPAT systems are the most common type of electronic voting system; 27 states in the United States require a paper audit trail by law and an additional 18 states use them in their elections. 1.3 Desirable Properties in Electronic Voting Systems The two main propeties desired in an election are privacy and verifiability. Privacy concerns the ability of an adversary to gain information on a voter’s choices in an election. A lack of privacy results in the possibility of coercion or bribery as seen in the earlier example of the “Italian attack”. Note that the adversary is not limited to an outside third party; privacy is broken if a corrupt election official may associate any vote with the corresponding voter. As voting systems have become more complex and electronic voting systems have become more popular, the notion of privacy has evolved. The simplest form of privacy is ballot privacy, which stipulates that the contents of the ballot must remain secret. With typical paper ballots such as those used today in the United States, ballot privacy guarantees that there can be no association made between a voter and her ballot, thereby providing privacy to the voter. However, with electronic voting systems, a stronger definition of privacy is desired. Receipt freeness or incoercibility, first introduced by Benaloh and Tuinstra [5], concerns the inability of the voter to later prove how she voted. Equivalently, it is impossible to effectively coerce a voter by forcing her to reveal her vote following the election. The concept of receipt freeness grew in importance as electronic voting systems often leave a receipt that the voter may bring out of the polling site. In countries such as Switzerland and Estonia, Internet voting has become commonplace [37]. In Estonia, for example, voters may scan their government issued ID cards to authenticate themselves on a voting website. Many more states and countries have vote-by-mail, where ballots may be mailed ahead of the election date. A stronger notion of coercion resistance stemming from the work of Juels, Catalano, and Jakobsson [20] identified the need to prevent adversaries from jeopardizing voter privacy in these cases, for example, by stealing or forcing a voter to reveal her authentication credentials. Verifiability concerns whether individual voters or the public as a whole may verify that the election outcome faithfully represents voters’ selections. In particular, individual verifiability refers to the ability of a voter to verify that her vote was correctly recorded in 16 the election. Universal verifiability refers to the ability of any member of the public to verify that the declared election outcome matches the set of votes that were cast. Although today’s paper ballots provide good privacy, they do so at immense cost to verifiability. Despite thorough documentation of best practices concerning how ballots should be handled, who should touch them, how they should be sealed, etc. [26], it remains practically impossible for a voter to know for certain that her ballot was not lost in transit. Since ballots are not released to the public, voters must trust that the outcome declared by election officials represents the accurate tallying of all ballots. As seen in the 2000 Presidential Election, good verifiability is not a strength in the systems used in the majority of the United States. For electronic voting systems, however, we may achieve verifiability while preserving privacy by using cryptography where appropriate. For example, a vote may be encrypted before being cast; the encrypted votes may then be added homomorphically to yield an encryption of the final election outcome, enabling the election officials to compute the result of an election without revealing any individual votes. Threshold encryption may strengthen this. For example, by requiring that 3 out of 5 election officials be present to obtain the decryption key, we reduce the chance of a malicious election official revealing individual votes. By introducing a secure, publicly viewable bulletin board, we may achieve verifiability at every stage from when the vote is cast until the election result is revealed. Such a system is end-to-end verifiable. Definition 1.3.1. An end-to-end verifiable voting system is one that satisfies the following properties. ∙ Cast-as-intended. It may be verified that the encryption or other representation of the vote is indeed a representation of the proper vote. Methods to ensure cast-as-intended behavior include Chaum’s work [8] or the challenge/response method proposed by Neff [25] and Benaloh [4]. ∙ Recorded-as-cast. It may be verified by the voter that her vote is properly taken into account. For example, voters may be given identifiers, and a public bulletin board may contain a listing of voter identifiers and corresponding encrypted votes. We must ensure that this step does not reveal the plaintext vote to an adversary. 17 ∙ Tallied-as-recorded. It may be verified by anyone that the final election result is correct given the individual votes. For example, a list of encrypted votes could be homomorphically added, and then the election officials could provide a zero-knowledge proof that the election result corresponds to a decryption of the tallied votes without revealing the decryption key. 1.4 Conclusion In this chapter, we presented an overview of voting systems: their history, implementations, and downfalls. We discussed the various notions of privacy and verifiability and introduced the concept of an end-to-end verifiable voting system. In the remainder of this thesis, we survey the relevant technologies used in end-to-end verifiable voting systems in Chapter 2, introduce the split-value system and its implementation in Chapter 3, present performance optimizations to the system in Chapter 4, discuss opportunities for future work in Chapter 5, and conclude in Chapter 6. 18 Chapter 2 Electronic Voting Systems and Technologies We now turn to an overview of existing electronic voting systems and technologies, focusing on systems that provide end-to-end verifiability. First, we discuss common design patterns found in end-to-end verifiable voting systems. Then, we present an overview of homomorphic encryption based designs and mixnets, two methods for achieving privacy with end-to-end verifiability. 2.1 Design of End-to-End Voting Systems Recall from section 1.3 that an end-to-end verifiable voting system is defined by three characteristics: verifiably cast-as-intended, verifiably recorded-as-cast, and verifiably tallied-asrecorded. The first two conditions must be verified by the voter herself; our desire for incoercibility requires that no third-party may verify these properties on behalf of the voter. The last condition may be verified by anyone in the public, including someone who did not participate in the election. By verifying all three conditions, any voter may verify that her vote was counted properly and that the declared election outcome is honest. A typical election consists of two phases. 1. The voting phase. In the voting phase, a voter indicates her choices for a given election by casting a ballot. The voter’s ballot is typically encrypted, and the system must provide a proof to the voter that the encryption of the ballot faithfully represents her 19 choices. For the purposes of this discussion, we will assume that the electronic ballot is accompanied by a paper trail and the voter is given a receipt that she may take out of the polling site. We will also assume that privacy is guaranteed inside the voting booth; even with the cooperation of the voter, no one else may observe the voter’s actions inside the voting booth. 2. The tallying phase. In the tallying phase, the election officials tally the set of encrypted ballots to produce the election result. We require that they do so without compromising the privacy of any individual voter; otherwise, a corrupt election official can act as a coercer. We also require that the election officials provide a proof that the announced election outcome is honest. In this section, we examine designs that are used to ensure each of these conditions of an end-to-end verifiable voting system is met. 2.1.1 A Secure Bulletin Board Central to the design of end-to-end verifiable voting systems is a publicly viewable secure bulletin board. There may be additional restrictions imposed on the actions that may be taken on it; for example, the bulletin board may be append-only. In general, encrypted votes may be posted to the secure bulletin board along with identifiable information about a voter as long as the ciphertext provides no information about the underlying plaintext to anyone, including the voter herself. This means, for example, that the voter cannot know the randomization values used in the encryption; otherwise, a voter could prove her vote to a coercer. The secure bulletin board also contains a proof of the election outcome, thereby allowing anyone to verify it. This can be accomplished by listing the plaintext votes after removing the associations between the plaintexts and voter IDs or by posting a zero-knowledge proof of correctness for the decryption of the tallied votes. 2.1.2 Cast-as-Intended Behavior To ensure cast-as-intended behavior, we use the “cast or challenge” protocol as described in [25, 4]. After a voter has made her selections, the system encrypts them using some 20 randomness and prints a receipt containing a commitment to the ciphertext (e.g. a cryptographic hash of the ciphertext). The system may additionally print a paper copy of the ballot containing more information which the voter must deposit into the ballot box before leaving. The voter then has the option to either cast the ballot or challenge the system. ∙ The voter chooses to cast her ballot. She deposits the printout into the ballot box. ∙ The voter chooses to challenge the system. In this case, the system reveals the ciphertext and the randomization values used to produce it. The voter can verify that encrypting her vote with the given randomization produces the desired ciphertext and that the commitment to the ciphertext (e.g. hash value) on the printout is correct. Should the voter choose this option, the ballot is spoiled. She must ask the system for a new randomized encryption of her selection, with which she has the option to cast or challenge again. The system must output a commitment to some ciphertext before it knows whether the voter will choose to cast or challenge, and a skeptical voter may increase the probability of catching a malicious system by challenging many times. Due to our privacy assumptions about the polling site, no one else may observe the voter while she issues a challenge to the system. However, a ballot that the voter challenges must be spoiled because the challenge process reveals how the ciphertext is constructed; a voter could use this information to prove how she voted after leaving the polling site if the encrypted votes are publicly posted. 2.1.3 Recorded-as-Cast Behavior To ensure recorded-as-cast behavior, we provide the voter with a way to check that her vote is accurately recorded on the secure bulletin board. Using her paper receipt, the voter can find her voter ID on the secure bulletin board and verify that the hash value of the posted ciphertext corresponds to the hash value on her receipt. Combined with the cast or challenge protocol of the previous section, the voter can verify that her selection has been correctly recorded on the secure bulletin board. However, she cannot prove to anyone else how she voted with just the ciphertext and corresponding hash value. 21 2.1.4 Tallied-as-Recorded Behavior To ensure tallied-as-recorded behavior, the system provides a proof that the encrypted votes recorded on the secure bulletin board correspond to the election outcome. In general, there are two ways to accomplish this: either the ciphertexts are combined to produce the election tally using homormorphic encryption, or the ciphertexts are shuffled and then decrypted so they cannot be traced back to the voters that produced them. Homomorphic encryption allows us to tally the votes without having to decrypt any individual vote, thereby maintaining voter privacy. Mixnets shuffle the input list of votes so that the output list cannot be traced back to the inputs; the output list may then be decrypted and the plaintexts posted on the secure bulletin board so that anyone may verify the election tally. Sections 2.2 and 2.3 describe each scheme in greater detail. 2.1.5 Threshold Encryption In general, it is advisable to use threshold encryption, in which a minimum threshold 𝑘 out of a possible 𝑛 trustees must agree before decryption can occur. This mitigates the risk that some subset of election officials is corrupt and prevents them from decrypting individual votes. One way to share some secret value (such as a secret key) among many parties is to use the method developed by Shamir [35]. To share a secret among 𝑛 trustees with a minimum threshold 𝑘, we use a polynomial 𝑃 of degree 𝑘 − 1 over a finite field. The secret value is set as 𝑃 (0); each trustee is given a point (𝑥, 𝑦) for 𝑥 ̸= 0. Because 𝑘 points are required to reconstruct a polynomial of degree 𝑘 − 1, any 𝑘 trustees will be able to recover the secret key, but fewer trustees will not be able to. With some encryption schemes, threshold encryption may be enabled even without the use of Shamir secret sharing; for example, [28, 16] provides a way to do this with the El Gamal public-key encryption scheme. 2.2 Homomorphic Encryption Schemes In a homomorphic encryption scheme, a set of operations may be performed on ciphertexts to produce the encrypted result of a (possibly different) set of operations on the underlying plaintexts without the need for decryption. The usefulness of homomorphic encryption is 22 immediately apparent in a cryptographic voting system: ciphertexts corresponding to the selections of individual voters may be tallied to yield an encryption of the election outcome without requiring that the individual votes be decrypted. 2.2.1 Exponential El Gamal The exponential El Gamal public-key encryption [15] is one example of a cryptosystem that supports homomorphic encryption. Definition 2.2.1. (Exponential El Gamal) ∙ Key Generation. Given a group 𝐺 with order 𝑞 and generator 𝑔, which are public 𝑅 parameters, choose the secret key 𝑘 ← − {1, . . . , 𝑞 − 1} and the public key 𝑦 = 𝑔 𝑘 . 𝑅 ∙ Encryption. For each plaintext message 𝑚, select 𝑟 ← − {1, . . . , 𝑞 − 1} and construct the ciphertext (𝑐1 , 𝑐2 ) = (𝑔 𝑟 , 𝑔 𝑚 𝑦 𝑟 ). ∙ Decryption. Given a ciphertext (𝑐1 , 𝑐2 ), calculate )︁ (︁ )︁ (︁ log𝑔 (𝑐𝑘1 )−1 𝑐2 = log𝑔 𝑔 −𝑘𝑟 𝑔 𝑚 𝑔 𝑘𝑟 = 𝑚 ∙ Homomorphic Addition of Ciphertexts. Given two ciphertexts 𝐶1 = (𝑔 𝑟1 , 𝑔 𝑚1 𝑦 𝑟1 ) and 𝐶2 = (𝑔 𝑟2 , 𝑔 𝑚2 𝑦 𝑟2 ), componentwise multiplication yields an encryption of the sum of their underlying plaintexts with randomness 𝑟1 + 𝑟2 . 𝐶1 × 𝐶2 = (𝑔 𝑟1 , 𝑔 𝑚1 𝑦 𝑟1 ) × (𝑔 𝑟2 , 𝑔 𝑚2 𝑦 𝑟2 ) = (𝑔 𝑟1 +𝑟2 , 𝑔 𝑚1 +𝑚2 𝑦 𝑟1 +𝑟2 ) Tallying and Proof of Decryption After the votes have been tallied via homomorphic addition, the resulting ciphertext is decrypted to yield the sum of all the votes. This decrypted sum is then posted to the secure bulletin board and should be consistent with the declared election outcome. Finally, the system needs to prove that the posted decryption is a proper decryption of the ciphertext without revealing the secret key; otherwise, a voter could use the revealed secret key to prove to a coercer how she voted. 23 In exponential El Gamal, given public parameter 𝑔 and public key 𝑦, we want to show that the revealed plaintext 𝑚 is the proper decryption of the ciphertext (𝑐1 , 𝑐2 ). Recall that a proper encryption of the plaintext 𝑚 results in ciphertext (𝑔 𝑟 , 𝑔 𝑚 𝑦 𝑟 ). Thus, a proof of correct decryption is equivalent to proving that log𝑔 (𝑐1 ) = log𝑦 (𝑐2 /𝑔 𝑚 ), which can be done using the scheme described by Chaum and Pedersen [10]. 2.2.2 Existing Systems Using Homomorphic Encryption Many existing end-to-end verifiable voting systems use homomorphic encryption because its implementation is rather straightforward and the cryptographic primitives used are wellsupported. Despite starting with mixnets, the Helios system [2], which runs in a web browser and is intended for online elections where the threat of coercion is low (e.g. school or club elections), now uses exponential El Gamal. Design proposals for the STAR-Vote system in Austin County, TX [3] use very similar techniques as those discussed here, including the castor-challenge protocol, encrypted votes posted on a publicly-viewable secure bulletin board, and homomorphic tallying with exponential El Gamal. In these systems, the weaknesses of using homomorphic tallying, including the computational cost of modular exponentiation used in El Gamal and the difficulty of supporting write-in candidates, are offset by the simpplicity and clarity of the design and implementation. 2.3 Mixnets Another way of verifying that the election result accurately reflects the set of encrypted votes is to use a mixnet. First described by Chaum [7] as a method to ensure anonymous communication, mixnets secretly permute the list of inputs so that they cannot be associated with the list of outputs. A mixnet may involve many mixservers that are each a part of the overall chain; each mixserver secretly permutes the output of the previous mixserver, and sends its output to the next mixserver to be permuted. In the context of elections, the input to the mixnet is the list of encrypted votes, and the output is a list of the same votes, encrypted differently and in some different order. We can decrypt the ciphertexts that are output from the mixnet and post the resulting plaintexts to the secure bulletin board along with proofs of correct decryption. This allows anyone to check that the plaintext votes produce the declared election outcome while making it 24 impossible to associate each any plaintext vote with a voter. The encryption of the votes must change between the input and output; we cannot simply shuffle the encrypted values. Otherwise an adversary could easily compare the input and output lists, find identical ciphertexts, and associate them with the voter IDs which are posted alongside the input ciphertexts on the secure bulletin board. 2.3.1 Decryption and Reencryption Mixnets There are two general types of mixnets which accomplish our goals: decryption mixnets and reencryption mixnets. ∙ In a decryption mixnet, a message is encrypted with the public key of each mixserver in the reverse order of traversal. In a system with 𝑛 mixservers, the input to the mixnet for message 𝑚 is Enc𝐾𝑛 (Enc𝐾𝑛−1 (. . . (Enc𝐾1 (𝑚)))), where Enc𝐾𝑖 denotes encryption using the public key of the 𝑖th mixserver. At each mixserver, one layer of encryption is removed and the list of messages is shuffled before being sent to the next mixserver. The mixnet outputs a list of decrypted messages whose ordering cannot be traced back to the input ordering. Decryption mixnets are commonly used to anonymize routing in designs known as “onion routing”, where each message is encrypted repeatedly like the layers of an onion, and each mixserver “peels off” a layer until the plaintext message is reached. ∙ In a reencryption mixnet, a message is encrypted once and input into the mixnet. At each mixserver, messages are reencrypted using a public-key encryption scheme that supports reencryption and then shuffled. Exponential El Gamal, as discussed in section 2.2.1, supports reencryption by homomorphically combining the message to be reencrypted with an encryption of the “zero message”. This similarly achieves our goals: the underlying plaintexts do not change through the mixnet, but their ciphertext representations and ordering are changed in a way that makes the outputs untraceable to the inputs. 2.3.2 Verifiable Mixnets A malicious mixserver might ignore the inputs given to it and send whatever list of encrypted votes it desires to the next mixserver in the chain. If the malicious mixserver were the last 25 mixserver in the chain, it could ignore all of the ballots, decide the election outcome, and construct an output list that matches that outcome. Thus, in addition to preserving anonymity, we require that our mixnets are verifiable. In the early mixnets, individuals who sent messages through the mixnet could verify that their messages were correctly reproduced at the output end, but an outside observer could not verify the correct operation of the mixnet. Later, the concept of universal verifiability was created, leading to the requirement that mixnets provide proofs of correct operation that any observer can verify. In this section, we present a brief survey of mixnet designs that lead to the mixnet used in the split-value voting system. This survey is influenced by the work done by Adida [1], which contains a more thorough analysis. 2.3.3 Early Mixnets: Individual Verifiability The first mixnet proposed by Chaum [7] was a decryption mixnet. The sender prepares an input by encrypting her message with the public keys of each mixserver in reverse order of traversal, padding each ciphertext with some randomness before encrypting with the next public key. Each mixserver decrypts one layer of the input, reorders the messages, and passes the messages onto the next mixserver. The sender, knowing the randomness values that were used, can verify that her message was output correctly. One drawback to Chaum’s design is that the input size is proportional to the number of mixservers, since each layer of encryption requires padding with randomness. The mixnet proposed by Park et al. [27] improves upon this by using partial decryption and reencryption using El Gamal. Denoting the secret key of mixserver 𝑖 to be 𝑘𝑖 and the corresponding public key 𝑦𝑖 = 𝑔 𝑘𝑖 , the input to the mixnet is the message encrypted using the product of the public keys of each mixserver. 𝑦= ∏︁ 𝑦𝑖 = 𝑔 ∑︀ 𝑖 𝑘𝑖 𝑖 At mixserver 𝑚𝑖 , given ciphertext 𝑐 = (𝑐1 , 𝑐2 ) = (𝑔 𝑟 , 𝑚 · 𝑦 𝑟 ) (note that this does not require the homomorphic variant of El Gamal where the second ciphertext component is 𝑔 𝑚 𝑦 𝑟 ), the message is partially decrypted by removing the contribution of that mixserver’s public key from the joint public key. 𝑖 PartialDec𝑖 (𝑐) = (𝑐1 , 𝑐2 · 𝑐−𝑘 1 ) 26 The resulting ciphertext is then reencrypted using standard El Gamal reencryption with additional randomness 𝑟′ . ′ ′ ′ ′ ReEnc(𝑐) = (𝑐1 · 𝑔 𝑟 , 𝑐2 · 𝑦 𝑟 ) = (𝑔 𝑟+𝑟 , 𝑚 · 𝑦 𝑟+𝑟 ) Vulnerabilities were discovered in both early mixnet designs; Pfitzmann [31, 30] showed that related inputs produced related outputs in the mixnet; by choosing a malicious input that is related to another non-malicious input, an attacker could trace both inputs through the mixnet and discover the identity of the sender of the non-malicious message. 2.3.4 Sako and Kilian: Universal Verifiability Definition 2.3.1. A universally verifiable mixnet is one that provides a proof of correct operation that any observer, including one that did not provide an input to the mixnet, may verify. Using a universally verifiable mixnet in an electronic voting system enables any member of the public to verify that the encrypted votes posted on the secure bulletin board were honestly mixed prior to tallying, an important component of an end-to-end verifiable voting system. The first universally verifiable mixnet was proposed by Sako and Kilian in [33] by borrowing the design of the partial decryption and reencryption mixnet of [27] and adding a proof of correct partial decryption and a proof of correct reencryption and shuffling which is constructed by each mixserver. Proof of correct partial decryption We prove that partial decryption was done correctly at each mixserver by constructing a proof of equality of discrete logarithms. Given mixserver 𝑚𝑖 with public key 𝑦𝑖 = 𝑔 𝑘𝑖 , ciphertext 𝑐 = (𝑐1 , 𝑐2 ) and PartialDec(𝑐) = (𝑐′1 , 𝑐′2 ), we compute 𝑐2 /𝑐′2 and construct a proof that log𝑔 (𝑦𝑖 ) = log𝑐1 (𝑐2 /𝑐′2 ) using a scheme such as that described in [10]. 27 Proof of correct reencryption and shuffling To prove that it shuffled and reencrypted messages correctly, each mixserver constructs a second shuffle and reencryption. With 50% probability, the verifier asks the mixserver to prove that the second shuffle and reencryption was done correctly by revealing the permutation and randomization values used. The other half of the time, the verifier asks the mixserver to construct and reveal a permutation and set of randomization values to go between the primary and secondary shuffles. Thus, a dishonest mixserver will be caught with 50% probability. This process may be made non-interactive via the Fiat-Shamir heuristic [12]. 2.3.5 Jakobsson, Juels, and Rivest: Randomized Partial Checking In the randomized partial checking (RPC) design proposed by Jakobsson, Juels, and Rivest in [19], each mixserver publishes a commitment to the permutation and randomization values used (e.g. by using a hash function). Then, the verifier constructs a challenge by choosing half of the inputs, for which the mixserver must reveal the permutation and randomization values. The verifier checks that the output of the mixserver is consistent with the mixserver’s response to the challenge and that the permutation and randomization values are consistent with the published commitments. Because each input to a mixserver has probability 1/2 of being chosen, the probability that a single mixserver can act dishonestly about 𝑛 votes diminishes as 2−𝑛 . Recent work by Khazaei and Wikström [22] demonstrated practical attacks against decryption and homomorphic mixnets with RPC that break privacy and correctness. These attacks have been found (and later fixed) in systems such as Scantegrity [9]. Khazaei and Wikström recommend that RPC is not used with homomorphic mixnets and used with decryption mixnets only after more rigorous proofs of security are developed. 2.4 Conclusion In this section, we surveyed the many technologies that have been developed for end-to-end verifiable electronic voting systems. We presented the methods used for ensuring verifiable cast-as-intended, recorded-as-cast, and tallied-as-recorded behaviors, including the “cast or challenge” protocol, homomorphic tallying with exponential El Gamal, and universally verifi28 able mixnets. In the next chapter, we present the split-value representation and commitment scheme and show how it may be used in conjunction with a mixnet to achieve end-to-end verifiablity. 29 30 Chapter 3 End-to-End Verifiable Voting Using Split-Value Representations In this section, we present the split-value system enabling end-to-end verifiable voting as described by Rabin and Rivest [32] as well as an overview of the implementation as given in [29]. We will focus on the main ideas and components of the design and implementation of the system, and leave the explanation of the details to the respective papers. 3.1 3.1.1 The Split-Value Voting System Design Definitions Given an election, choose the representation modulo 𝑀 so that any voter’s selection, including possible write-in candidates, may be represented by an integer modulo 𝑀 . Definition 3.1.1. The split-value representation of 𝑥 modulo 𝑀 is the pair SV(𝑥) = (𝑢, 𝑣) such that 𝑥 = 𝑢 + 𝑣 mod 𝑀 . Note that for a given 𝑥, knowing the value of 𝑢 or 𝑣, but not both, reveals no information about 𝑥. This key property allows us to construct zero-knowledge proofs of equality for two values using their split-value representations without revealing the actual values themselves. Definition 3.1.2. A commitment to a value 𝑥 with randomness 𝑟 is the value 𝑐 = Com(𝑥, 𝑟) such that 𝑐 may be “opened” to reveal 𝑥 and 𝑟. We desire that the Com function is both hiding and binding. That is, given 𝑐 = Com(𝑥, 𝑟), no information is gained about 𝑥, and it is infeasible to produce another (𝑥′ , 𝑟′ ) such that 𝑐 = Com(𝑥′ , 𝑟′ ). 31 The HMAC function, with the randomness 𝑟 as the key, is used as the commitment function in our implementation. Definition 3.1.3. Given SV(𝑥) = (𝑢, 𝑣) and randomness 𝑟 and 𝑠, the split-value commitment of 𝑥 is ComSV(𝑥) = (Com(𝑢, 𝑟), Com(𝑣, 𝑠)). Using split-value commitments, we may construct probabilistic proofs that two values are equal without revealing the value itself. Suppose that a prover wishes to prove that 𝑥 = 𝑥′ . Given their split value commitments ComSV(𝑥) = (Com(𝑢, 𝑟), Com(𝑣, 𝑠)) ComSV(𝑥′ ) = (Com(𝑢′ , 𝑟′ ), Com(𝑣 ′ , 𝑠′ )) the prover claims that there exists a value 𝑡 such that 𝑡 = 𝑢 − 𝑢′ and 𝑡 = 𝑣 ′ − 𝑣. This can be true if and only if 𝑥 = 𝑥′ . The prover reveals the value of 𝑡 to the examiner. With probability 1/2, the examiner asks for the opening of the “left” commitments, in which case the prover reveals the values 𝑢, 𝑟, 𝑢′ , and 𝑟′ . The verifier checks that the values Com(𝑢, 𝑟) and Com(𝑢′ , 𝑟′ ) are correct and that 𝑡 = 𝑢 − 𝑢′ . With probability 1/2, the examiner asks for the “right” commitments, in which case the prover reveals the other components of the split-value commitment. Thus, if 𝑥 ̸= 𝑥′ , a dishonest prover can win with probability at most 1/2. 3.1.2 The Verifiable Mixnet The main component of the split-value system is a verifiable mixnet composed of a 3 × 3 grid of mixservers, as shown in Figure 3-1. These servers are responsible for receiving votes, obfuscating and permuting them, and producing an output list that cannot be traced back to the input list. Denote the mixserver in row 𝑖 and column 𝑗 as 𝑃𝑖,𝑗 . When the voter’s vote 𝑤 is cast on her device (e.g. a tablet), it is broken into three components 𝑥, 𝑦, and 𝑧 such that 𝑤 = 𝑥 + 𝑦 + 𝑧 mod 𝑀 . The device then creates split-value representations of each component. For each component, the split-value representation and its commitment are sent over a secure channel to the first column mixserver of the row corresponding to the component. For example, 𝑥 and ComSV(𝑥) are sent to 𝑃1,1 , 𝑦 and ComSV(𝑦) are sent to 𝑃2,1 , and so on. Note that because each mixserver only receives one 32 3 × 3 server array n voters P1,1 P1,2 P1,3 P2,1 P2,2 P2,3 P3,1 P3,2 P3,3 2m lists/server n Figure 3-1: The mixnet consists of a 3 × 3 grid of mixservers. Votes are split into three components and each component is sent to one of the leftmost servers. Each column of mixservers agree upon a shuffling and obfuscation; the shuffled and obfuscated values are then sent to the next column. The process is repeated 2𝑚 times. out of three components of the vote, no single server may reconstruct any individual vote (and break voter privacy) without leaked information from at least two other mixservers. Definition 3.1.4. A triplet 𝑆 ′ = (𝑥′ , 𝑦 ′ , 𝑧 ′ ) is an obfuscation of 𝑆 = (𝑥, 𝑦, 𝑧) if 𝑥′ + 𝑦 ′ + 𝑧 ′ = 𝑥 + 𝑦 + 𝑧 mod 𝑀 . The first column of servers agree upon a triplet of values (𝑝, 𝑞, 𝑟) such that 𝑝 + 𝑞 + 𝑟 = 0 mod 𝑀 for each vote. The first row then computes 𝑥′ = 𝑥 + 𝑝 mod 𝑀 , the second row computes 𝑦 ′ = 𝑦 + 𝑞 mod 𝑀 , and so on. The triple (𝑥′ , 𝑦 ′ , 𝑧 ′ ) is then an obfuscation of 1 (𝑥, 𝑦, 𝑧). The column of servers then agree upon a random permutation 𝜋 : {1, . . . , 𝑛} → {1, . . . , 𝑛}. Each server transmits the permuted obfuscated values to the next column in the same row: 𝑃1,1 transmits 𝑥′𝜋(1) , . . . , 𝑥′𝜋(𝑛) to 𝑃1,2 , 𝑃2,1 transmits the permuted 𝑦 ′ values to 𝑃2,2 , and so on. The second and third columns of mixservers repeat this process, and the results of the final obfuscation and permutation are posted to the secure bulletin board. Given a security parameter 𝑚, this process of obfuscation and permutation across each column of the mixnet is performed 2𝑚 times. In our current implementation, 2𝑚 = 24. 33 To verify the election result, we divide the 2𝑚 repetitions into two groups of 𝑚. The first 𝑚 repetitions are used to show that the mixnet does not alter the votes. To do so, we reveal the permutations used at each column. For each split-value commitment on the input side of the mixnet, we follow the revealed permutations to arrive at its corresponding split-value commitment on the output side and construct a proof of equality as described in the previous section. These permutations and proofs of equality are posted to the secure bulletin board for anyone to examine. The other 𝑚 repetitions are used to prove that the election outcome is correct. All of the commitments to the votes are opened, revealing the votes themselves. These votes are posted to the secure bulletin board along with the election outcome for anyone to examine. 3.2 Implementation Details The original design of the system included a functioning preliminary prototype written using Python 3.4. In this prototype, mixservers were simulated using Python objects. A simulation of a complete election, including vote generation, mixing, proof construction, and verification, was contained within a single Python process. Python was chosen for its ease of programming, debugging, profiling, and installation. Care was taken to ensure that the number of included libraries was kept to a minimum, so that the code is easy to run and the results are easily replicated. 3.2.1 Server Processes and Communication Subsequent work [29] significantly decoupled the system. Each of the mixservers, secure bulletin board, and vote generation simulator each operates within its own process. An interface was built on top of Python TCP socket servers to facilitate communication, and Python socket handlers were used to handle incoming requests. All data was serialized with standard JSON libraries. The implementation also introduced a controller server, whose responsibilities include the following. ∙ Serve as the first point of contact for all other servers. All other servers start knowing the IP address and port of the controller. ∙ Keep track of and broadcast network information, including all other servers, their addresses, and roles, to the network. 34 ∙ Assign roles to each server, and assign their positions in the mixnet. ∙ Synchronize the various phases of the election simulation, including vote production, mixing, proof construction, tallying, and verification. Because all communication occurs via standard TCP sockets, the implementation may be used on any group of computers running on a local area network. The implementation has been tested on single-core and multicore machines as well as multiple virtual machines running on a virtualized network. 3.2.2 The Implementation of Randomness One particular detail that deserves special attention is the implementation of randomness. Randomness is used throughout the system: in constructing split-value representations, in constructing commitments, in choosing which component of a split-value representation to reveal during a verification, and so on. Fast, cryptographically secure randomness is essential to the correct functioning of the system. The utility library written for this implementation includes a function that generates pseudorandom bytes. Each server maintains a list of randomness sources. When a randomness source is created, its name is used as the initial seed. When a sequence of pseudorandom bytes is desired, the seed is hashed twice using the SHA-256 hash function. The first hash determines the updated seed, which is stored in the place of the original seed. The second hash is of the original seed with a “tweak” string appended. With a good hash function, these two hashes will not appear to have originated from the same seed. During the proof construction phase, we require a good source of randomness to choose the 𝑚 repetitions that will be used to verify the mixnet and the other 𝑚 repetitions that will be used to verify the election result. There are two options for this. ∙ An external source of randomness, such as dice rolling at a public ceremony after voting has ended. The benefit to this method is that we obtain “true” randomness that no adversary can predict beforehand. However, it is possible that election officials and the public are uncomfortable with introducing what appears to be games of chance into the election process. ∙ A Fiat-Shamir style hash [12] of the secure bulletin board. With overwhelming probability, a Fiat-Shamir style hash of the secure bulletin board will provide randomness 35 that is good enough to overcome an adversary. There is the small but nonzero chance that an adversary could manipulate the election result by providing both a faulty proof that supports the manipulated outcome and an alternate version of the secure bulletin board that, when hashed, produces randomness that verifies her faulty proof. To compensate for this, we set the number of repetitions (𝑚) to be higher so that the chance of producing an alternate secure bulletin board that passes all of the verifications decreases exponentially. In the current implementation, we use the Fiat-Shamir method. However, this choice incurs a penalty in performance, as our setting for 𝑚 could be reduced dramatically by using an external source of randomness, resulting in fewer iterations through the mixnet and a shorter proof. In the next chapter, we will see that the use of an external source of randomness may provide enough of a performance boost to meet certain goals that election officials desire. 3.3 Conclusion In this section, we introduced the concepts and design behind the split-value verifiable voting system, including the idea of a split-value commitment. We discussed the ways that votes are obfuscated and permuted through the mixnet. Proofs are constructed to verify both the mixnet and the election outcome without associating voters with their votes. Finally, we examined details of the prototype implementation, including the communication interface between servers and the implementation of randomness used in our simulations. 36 Chapter 4 Performance Optimization of the Split-Value System After verifying the correctness of the implementation, we turn to performance. The performance of an electronic voting system is an important consideration when selecting among various options. The system must scale to handle hundreds of thousands or even millions of voters per district and perhaps multiple simultaneous races. Furthermore, the media and the public demand fast, accurate reporting of election results. We would like to be able to provide the election results along with a verified proof within hours of the closing of polling sites. One proposed advantage of the split-value system is its efficiency. Existing comparable systems such as STAR-Vote [3] use relatively time-consuming public key encryption operations such as modular exponentiation, while the split-value system uses much simpler and faster modular addition operations. Compared to other systems, the split-value system pays a higher cost due to the need for 2𝑚 repetitions. Furthermore, the cost of inter-process and inter-machine communication and data serialization become non-negligible. The implementation presented in [29], referred to as the reference implementation here, contained numerous opportunities for optimization. In this section, we begin with a survey of the performance of cryptographic primitives and data serializers used heavily in the implementation. Then we present the main optimizations that were successfully applied to the reference implementation and conclude with an analysis of the scalability of the system. All timings reported in this section, unless otherwise noted, are performed on a 16 core 37 Amazon Web Services (AWS) machine using two Intel E5-2666 v3 processors operating at 2.9 GHz and 32 GB RAM. The default settings for simulating an election consist of a single race with two candidates and a third write-in candidate up to sixteen characters, 1000 voters, and 2𝑚 = 24. 4.1 4.1.1 Performance of Cryptographic Primitives Hashing and Encryption Functions Function Python os.urandom Python hashlib.md5 Python hashlib.sha256 NaCl SHA-256 Cryptography SHA-256 Python HMAC with hashlib.sha256 Cryptography AES/CBC XSalsa20 Runtime (s) 36.95 5.71 7.64 33.47 150.27 51.33 377.80 33.94 Table 4.1: Runtimes, in seconds, of ten million calls to various cryptographic functions in Python built-in and third-party libraries. Table 4.1 shows the performance of various cryptographic primitives, including hash functions (MD5, SHA-256), a block cipher (AES in CBC mode), and a stream cipher (XSalsa20), by running each function ten million times. The built-in Python hash functions and XSalsa20 perform efficiently as expected, and we choose SHA-256 as the hash function used in the implementation as it is both fast and secure. It is surprising that the Python NaCl and Cryptography third-party libraries, both of which are very popular, significantly underperform compared to the built-in functions. The inefficiency of AES is particularly surprising given the Intel AES New Instructions Set, which gives hardware support to all standard modes of operation and key lengths. The expanded instruction set gives very good performance; encryption with 256-bit AES-CBC takes 5.65 CPU cycles per byte (256-bit AES-CBC is the most expensive operation) [18]. The CPUs used in this benchmark did indeed support the expanded instruction set, but it is likely that the third-party libraries did not use them. Unfortunately, the original design estimated the performance of AES to be more than two magnitudes faster than its Python implementation [32]. Even using the significantly 38 faster HMAC with SHA-256, commitments are more than forty times slower than estimated. It is likely that C code written using the expanded hardware instructions will offer significant performance benefits. For our Python implementation, we use HMAC with SHA-256 as the commitment function. 4.1.2 Pseudorandom Number Generators Function Python os.urandom sv.get_random_from_source Runtime (s) 36.95 46.87 Table 4.2: Runtimes, in seconds, of ten million calls to os.urandom and the SHA-256-based pseudorandom number generator. As discussed in section 3.2.2, randomness is used throughout the system. The method used in the implementation gives a good pseudorandom generator with three calls to the SHA-256 hash function (once to generate the new seed, and twice to compute the tweaked hash which becomes the output). However, as seen in Table 4.2, there is a non-negligible overhead likely due to the repeated calls to the digest function and the byte-encoding functions used to incorporate the tweaking string. When benchmarking the two functions side by side, it appears that os.urandom, which provides “a string of n random bytes suitable for cryptographic use” [13] is a faster option. However, in the context of the overall system, the improvement is minimal. Furthermore, os.urandom relies on the operating system’s implementation of the /dev/urandom entropy pool, which is not always guaranteed to have enough entropy to produce good randomness. For these reasons, we stick to the method of repeatedly hashing the seed string to generate pseudorandomness. 4.2 Performance of JSON Serializers The JSON format is used to serialize all data sent over the wire and posted to the secure bulletin board; it was chosen because it is compact, very commonly used, and well-supported by all platforms. Table 4.3 shows the performance of three common JSON serialization libraries for Python, tested on a moderately large (164.6 MB) JSON file. The simplejson and ujson libraries 39 Function Python json.loads simplejson.loads ujson.loads Python json.dumps simplejson.dumps ujson.dumps Runtime (s) 33.60 32.34 26.79 31.08 38.91 7.76 Table 4.3: Runtimes, in seconds, of five calls to serializing (dumps) and deserializing (loads) functions of three different Python JSON libraries on a 164.6 MB JSON file. have C extensions that can be compiled for performance boosts; the extensions were included in our tests. The ujson library significantly outperforms the other two libraries, but we could not use it because it does not support integers longer than 64 bits. From the results of our benchmarks, we expect similar runtimes using either of the other two libraries. However, other tests [21] show that simplejson holds a significant advantage on calls to loads, especially with lists of dictionaries. Ultimately, using simplejson yielded significant speedups over the native Python JSON library in our prototype. 4.3 Optimization of the Prototype To effectively optimize the reference implementation, we begin by collecting timing information on each phase of the simulation. We delve further into the performance of each function call by using the Python profiler, which tells us how much time is spent in each function, including and excluding the time spent in functions called from it. From there, we generally prioritize optimizing the most time-consuming functions as well as examining the most frequently called functions to determine if any unnecessary work is being done. We focus on the timings of the mix, proof, and verification phases as they are the most time-consuming. In a real election, the vote generation would occur throughout the day rather than all at once, so it is not a bottleneck in the system. The setup occurs before the election, and the tallying is very fast compared to the mix, proof, and verification phases. Table 4.4 shows the major optimizations that were performed on the reference implementations and the resulting runtimes. The performance gains for each optimization are reported relative to the row immediately preceding it in the table. In summary, the most effective optimizations involved leveraging the available parallelism 40 Optimization Reference implementation Socket operations Remove get_size in SBB operations Byte/Integer conversion Parallelization Use simplejson library Eliminate deepcopy Precompute SBB hashes Chain SBB hashes Mix (s) 51.41 30.47 20.96 18.30 6.21 5.48 5.92 5.41 5.36 Proof (s) 142.62 99.10 84.56 80.73 59.27 17.79 17.34 5.63 4.96 Verify (s) 24.00 22.02 18.82 17.93 9.95 7.08 5.80 5.76 5.84 Total (s) 225.44 154.99 129.74 122.29 80.77 35.72 34.38 22.12 21.47 Table 4.4: Runtimes, in seconds, of the major phases of a simulated election at each stage of the optimization process. All timings reflect the average of five simulations of an election consisting of one race, two candiates plus one write-in, and 1000 voters. The Total Time column includes the tallying time in addition to the runtime for the mix, proof, and verify phases, giving the system’s total expected runtime after an election has closed. in the multiple mixservers and ensuring that commonly used subroutines were as fast as possible by optimizing by hand, using a built-in Python function, or selecting a third-party library optimized for speed. Overall, we observe a 90% reduction in runtime due to our optimizations detailed in this section. In this section, we begin by introducing the Python profiler, CPython, and then describe each successful optimization applied and report its resulting speedup. 4.3.1 The Python Profiler The Python profiler, CProfile, allows us to see the amount of time spent in each function. We sort the CProfile output by the cumulative time per call, which, for every function call, gives the amount of time spent in that function and all functions called from it. Figure 4-1 gives one example of a part of a CProfile output when the profiler was run on the mixserver handler for the first phase of proof generation. The functions handle_prove, compute_output_commitments, ask_sbb_to_post, and json_client are all high-level wrapper functions that call many functions within them. We notice that most of the profiler output is dominated by JSON encoding (which is used when sending data to the secure bulletin board) and get_random_from_source, which in turn calls secure_hash and bytes2hex. Note that the function calls to dumps, which handles the JSON encoding, and get_random_from_source, which generates the randomness for the split-value commitments, together consume 13.982 of the total 18.982 seconds in this mixserver handler. This suggests 41 37877535 function calls (27315331 primitive calls ) in 18.982 seconds ncalls cumtime percall filename : lineno ( function ) 1 18.982 18.982 split - value - voting - dev / ServerMix . py :144( handle_prove ) 1 10.178 10.178 split - value - voting - dev / sv_prover . py :19( compute_output_commitments ) 1 8.804 8.804 split - value - voting - dev / ServerGeneric . py :30( ask_sbb_to_post ) 1 8.804 8.804 split - value - voting - dev / ServerController . py :223( json_client ) 3 7.890 2.630 split - value - voting - dev / sv . py :790( dumps ) 3 7.890 2.630 python3 .4/ json / __init__ . py :182( dumps ) 3 7.862 2.621 python3 .4/ json / encoder . py :175( encode ) 1920695 7.468 0.000 python3 .4/ json / encoder . py :404( _iterencode ) 12482899 6.827 0.000 python3 .4/ json / encoder . py :325( _iterencode_dict ) 1920645 6.125 0.000 python3 .4/ json / encoder . py :269( _iterencode_list ) 144000 6.092 0.000 split - value - voting - dev / sv . py :195( g e t_ ran dom _f rom _s our ce ) 336000 5.812 0.000 split - value - voting - dev / sv . py :47( secure_hash ) 144000 4.077 0.000 split - value - voting - dev / sv . py :86( bytes2hex ) Figure 4-1: CProfile output for a single call to the mixserver handler for the first phase of the proof production phase, sorted by the amount of time spent in each function. We note that JSON-related functions, get_random_from_source, and bytes2hex are timeconsuming function calls. that, in the prototype code before any optimizations, our message passing and pseudorandom number generation helper routines are taking up much more time than the cryptographic computation; the HMAC function used for the split-value commitments do not even appear in the top 13 lines of the profiler output. There is good reason to believe that the cryptography portion of the system’s design is relatively efficient as predicted, but other parts of the implementation are responsible for the majority of the runtime. 4.3.2 Message Passing Over Sockets As explained in the previous chapter, communication between machines happens via TCP servers communicating over Python’s standard socket implementation, which provides an interface for sending and receiving data [14]. ∙ socket.recv(bufsize). Receive data from the socket up to bufsize bytes, and return the data received. ∙ socket.send(string). Send data through the socket, and return the number of bytes sent. socket.send does not guarantee that the all data is sent; the caller must check 42 the return value and retry sending the remaining data if necessary. ∙ socket.sendall(string). Send data through the socket, and return None on success. Unlike socket.send, socket.sendall guarantees that all data is sent if the function returns successfully. Due to constraints in the operating system, the value for bufsize used in the socket.recv function is usually 4096 or 8192, corresponding to the maximum length that may be sent via a single call to socket.send. Note that there is not a corresponding socket.recvall function. If a longer message is desired, we must repeatedly call socket.recv until the entire message is received. The reference implementation handled this situation by sleeping for a short amount of time after each call to socket.recv before attempting to receive more data from the socket. Because socket.recv blocks if the data is not ready yet (up to some timeout), we do not need to sleep to wait for more data; rather, we just need to know how much data to expect and continuously call socket.recv until that length is reached. We define the following higher-level interface for communication over sockets: ∙ send(socket, message). Prepend message with its length and send the resulting bytes over socket. ∙ recv(socket). Receive and return all data from the socket by stripping out the length from the first chunk of the message and repeatedly calling socket.recv until the length is met or an error is encountered. We assume that the length of a message will never exceed the limit of a 32-bit signed integer. This optimization yielded a speedup of 20.9 seconds (40.7% reduction in runtime) in the mix phase and 43.5 seconds (30.5% reduction in runtime) in the proof phase. 4.3.3 Byte/Integer Utility Functions Our profiler output revealed that a significant amount of time was spent on the reference implementation’s utility functions for converting between Python bytes, integers, and hex strings. The conversions are necessary because different components of the system expect values to be in different forms; for example, the library used for cryptographic hash functions 43 returns a byte array, while our secure bulletin board, to which the results of computing these cryptographic hashes are posted, accepts string inputs. Specifically, for the sv.bytes2hex implementation, which returns the hex string representation of a byte array, we achieved a 15× speedup in the function’s runtime by simply replacing the byte-by-byte conversion with the built-in binascii.hexlify function. For the sv.int2bytes function, which converts any integer into a bytearray representation, we use Python 3’s new bit_length attribute, which supports signed integers up to 64 bits long, along with a 64-bit mask and shift. This allows us to process the input eight bytes at a time instead of one, yielding a 1.5× speedup in the function’s runtime compared to the reference implementation. Overall, these optimizations produced a 5.75% reduction in runtime, mostly in the proof and verify phases. 4.3.4 Introducing Parallelism No performance analysis is complete without an analysis of the potential for parallelism. Multicore CPUs are now the norm; even a typical laptop that we would expect an election official to use is likely to have at least two CPU cores. Furthermore, in our system, the computation required to mix the votes and construct a proof of the election outcome is distributed across the nine mixservers. We show here that significant parallelism can be leveraged in each of the mix, proof, and verification phases of the system. Parallelism in the Mix and Proof Phases In the reference implementation of the mix and proof phases, the controller server issued remote procedure calls (RPCs) to each mixserver sequentially. Each RPC would block on the controller until the corresponding mixserver’s RPC response arrived at the controller; only then would the next RPC be issued. Because the RPCs are independent within each part of the proof and verify phases, they may be issued asynchronously and in parallel. We use Python’s implementation of processes in the multiprocessing library to accomplish this parallelization. For each RPC, one process is forked from the pricess running the controller server. The forked processes are responsible for issuing the RPCs, waiting for responses, and returning to the parent process. Even with a single-core controller server, a process that is waiting for an RPC response may yield the CPU to allow another process to 44 issue an RPC. Processes are used instead of threads because Python’s Global Interpreter Lock limits parallelism between the various threads spawned from a single process. We confirmed that using Python threads instead of processes yielded no noticeable speedup over the reference implementation. The built-in multiprocessing library does not support process pools as it does thread pools, so we must fork a new process for each RPC instead of relying on existing workers. Fortunately, forking a new process is a constant-time cost relative to scaling the number of voters and our tests show that it is negligible. Introducing parallelism produced a 12.1 second speedup (66.1% reduction in runtime) in the mix phase and 21.5 second speedup (26.6% reduction in runtime) in the proof phase. Parallelism in the Verification Phase Unlike the mix and proof phases which involve many distinct machines, the verification phase occurs on a single machine. However, all parts of the verification process depend only on the contents of the secure bulletin board; we expect that any member of the public may take the contents of the secure bulletin board and verify the election outcome herself. Although no part of the verification phase depends on any previous part because each part may independently read the secure bulletin board, in practice it is easier to let earlier phases populate a shared data structure for later phases to use. This prevents the need to deserialize the secure bulletin board, which may be a large file and thus a costly process, at the cost of some parallelism. In the reference implementation, later parts of the verification phase used data that earlier parts had already verified. This makes verification simpler because at no point does the verifier need to worry about inconsistent or unverified data propagating through it. However, this scheme resulted in a complex data dependency graph which made the verifier difficult to parallelize; a process running one part of the verification could be started only when all of its dependencies were completed. The requirement that data consumed in later parts must itself be verified is stricter than necessary. If any step of the verifier fails, we declare the election to be invalid, so even if bad data propagates through the verifier, at some point that data will itself be checked and an error raised. This check might not happen until after the bad data has been used in other parts of the verifier, but the problem will be caught nonetheless. 45 Relaxing our requirements on the verifier and introducing parallelism resulted in a 8 second speedup (44.5% reduction in runtime) in the verification phase. 4.3.5 JSON Serialization The results presented in [21] suggest that simplejson often outperforms Python’s native JSON library, especially when deserializing lists of dictionaries. By simply switching the serializer library and eliminating indentation and formatting from JSON strings, a 41.5 second speedup (70% reduction in runtime) in the proof phase and 2.9 second speedup (28.8% reduction in runtime) in the verify phase were achieved. The overall runtime speedup for the simulation was 45 seconds (55.8% reduction in runtime). 4.3.6 Hashing the Secure Bulletin Board Hashes of the secure bulletin board are used in generating the cut-and-choose and left-right challenges during the construction of the proof. The hash values are recomputed in the verification phase to ensure that the values used in the proof are actually correct hashes of the secure bulletin board. Because the verification process itself makes modifications to the secure bulletin board data structure, causing the hash values to differ, the reference implementation makes a deep copy of the data structure before proceeding with verification. By moving the hashing verification to be before the verification of the rest of the data structure, a 1.3 second speedup (18.3% reduction in runtime) was obtained in the verification. In the reference implementation, the mixservers ask the server storing the secure bulletin board for its hash value for the aforementioned reasons. Although the mixservers operate in parallel, the secure bulletin board runs on a single process and requests for the hash value are serialized. Therefore, a mixserver that issues its request later must wait for the secure bulletin board to service all earlier requests. In our optimization, the controller server requests the hash value and sends it to each mixserver as part of the body of the relevant requests. Each mixserver then has all the necessary information at the beginning of each phase and wastes no time waiting for the single-threaded secure bulletin board to service other requests. This optimization resulted in a 11.7 second speedup (67.5% reduction in runtime) in the proof phase. Each request for the hash value of the secure bulletin board requires that the contents 46 of data structures held in memory be serialized before they can be hashed. As the secure bulletin board grows, this serialization becomes expensive. We can reduce this cost by computing the updated hash value every time anything is appended to the bulletin board. We calculate the new hash value by concatenating the data to be appended with the previous hash value. Because the previous hash value is fixed in size, the cost to compute the hash does not grow as the size of the secure bulletin board. While this results in a relatively small performance gain in the proof phase (11.9% reduction in runtime), it prevents performance from diminishing as the size of the secure bulletin board increases. 4.4 Performance Scalability In the previous section, we simulated elections with 1000 voters because it provided useful data for optimization without taking too long to run simulations. A real election, however, may have hundreds of thousands or even millions of voters. We are interested in how our system scales with the number of voters. Number of Voters 100 500 1000 5000 10000 Total Time (s) 2.48 10.93 21.47 142.71 271.88 Time Per Voter (s) 0.0248 0.0219 0.0215 0.0285 0.0212 Table 4.5: Runtimes, in seconds, of the major phases of a simulated election (including mix, proof, tally, and verify) for varying numbers of voters. All timings reflect the average of five simulations. We observe that the runtime scales linearly as the number of voters. Table 4.5 shows the performance of the system with 100, 500, 1000, 5000, and 10,000 voters and the corresponding costs per voter. We see that the performance of the system scales roughly linearly as the number of voters, which bodes well for scaling the system to production. In the current prototype and the testing machine, connections begin to time out for 100,000 voters. This is likely an issue with the pool of TCP connections becoming saturated since the simulation is running on a single operating system. However, this remains to be tested rigorously. Table 4.6 shows the performance of the system with the number of repetitions (2𝑚) set to 4, 8, 16, and 24. Although the runtime grows slightly sublinearly as the number of repetitions, reducing the number of repetitions is one way of significantly reducing the 47 Number of Repetitions (2𝑚) 4 8 16 24 Total Time (s) 4.01 7.84 15.17 21.47 Time Per Repetition (s) 1.0029 0.9801 0.9478 0.8947 Table 4.6: Runtimes, in seconds, of the major phases of a simulated election (including mix, proof, tally, and verify) for varying numbers of repetitions (value of 2𝑚). All timings reflect the average of five simulations. system’s runtime. 4.5 Comparison with Other Systems Unfortunately, there exists very little data concerning the performance of end-to-end verifiable electronic voting schemes in general and mixnets in particular. In the original design of the Helios system by Adida, proof generation and verification of a 500 voter election took many hours [2]. However, Adida notes the potential for using parallelism to achieve speedups. Furthermore, we must keep in mind that computers have become much more powerful since the original design of Helios in 2008: as a rough metric, the cost per GFLOPS (one billion floating-point operations per second) has decreased more than 600× in the last seven years [38], while browser implementations of Javascript have continued to improve. The original design of Scantegrity [9] claims that the “necessary Switchboard audit data” could be produced and verified in a matter of minutes for an election with 32 races and 200,000 ballots. However, it is unclear if this is simply a proof of an honest Switchboard in the election setup phase or if this time includes a proof and verification of the overall election’s outcome. Mixnet designs such as those proposed in [6] achieve improvements in the asymptotic amount of work done in constructing a proof; however, it is unclear what the performance of a prototype implementation would be. 4.6 Conclusion In this section, we examined the performance of the split-value system implementation. Benchmarking functions available for cryptographic primitives led to the usage of the SHA256 hash function and the HMAC commitment scheme in our implementation. We used the 48 Python profiler to identify areas of the implementation that could yield large performance gains. By leveraging parallelism, reducing the wait time on sockets and requests to the secure bulletin board, and using a more efficient JSON serializer, our optimized system runs in less than one-tenth the time of the reference implementation. With the optimized implementation, an election of one million voters will take between seven and eight hours to mix, prove, tally, and verify. This is a bit more than twice as long as the time generally allotted between the closing of polling sites and the release of election results in the evening news. We hope that further optimization may yield this factor of two to three improvement in the runtime. Using properly hardware-optimized AES encryption as a commitment scheme has the potential to reduce runtime significantly; according to Intel’s measurements, the estimate of 8 million commitments per second given in [32] can be achieved even with conservative estimates. In the current implementation, considerable time is spent on repeating the mixing and proof construction for 2𝑚 = 24 repetitions. As mentioned in section 3.2.2, this repetition is necessary due to the reduced security in using a Fiat-Shamir hash of the secure bulletin board instead of true randomness. The process may be sped up at least twofold if election officials adopted a true source of randomness during the construction of the proof. Overall, it appears that under ideal situations (access to hardware functions, minimal cost in serialization and communication), the split-value system delivers on its promise to provide security without the heavy computation used in most existing cryptographic solutions. However, its implementation requires careful optimization and selection of thirdparty code. 49 50 Chapter 5 Future Work The work of this thesis, building upon [29], establishes a baseline for the speed that can be expected of a system built using the split-value design. Additional work may build upon the optimizations described in this thesis to provide further performance gains or experiment with using other languages and platforms to compare them to our Python implementation. Work is also being done to improve the design by supporting additional fault tolerance. Finally, a real-life deployed version of this system must also include mechanisms for secure channels and key exchange as part of the election setup. 5.1 Performance Considerations While the work in this thesis produced a 10× speedup over the reference prototype, there is good potential for even more performance gains. For example, the results in section 4.1.1 suggest that our commitment scheme may become significantly faster by writing code with native hardware support. Given the current implementation, improvements to performance may be guided by the following goals. ∙ Construct a more accurate model for where runtime is going. Studying the profiler output for each mixserver is a good first step; even more informative would be a way to visualize the amount of time spent by each mixserver in each parallelized mix or proof phase. Another metric of interest is the duration of time that each server spends waiting for other servers and the remote procedure calls that cause such waiting to occur. 51 ∙ Obtain accurate numbers on the amount of time spent in crypto-related operations, versus data serialization and message passing. Throughout most of the optimization process, the actual “number-crunching” of the crypto-related operations consumed far less time than the overhead of data serialization and message passing. After numerous optimizations, the two components appear relatively even in runtime. However, experiments that attempt to more accurately measure this (for example, by replacing a crypto function with a fast, constant time statement like return 0) were inconclusive. Accurate results here will provide valuable insight on the maximum benefit that can be derived from optimizing each component of the system. ∙ Compare the performance of the split-value system to that of comparable systems. Detailed performance numbers are unfortunately not available in most of the electronic voting and mixnet literature. However, it may be informative to profile existing implementations of systems such as [34, 2] and implement mixnet schemes designed for efficiency such as [6] for comparison. Python is often criticized for being a slow language; statically-checked, strongly-typed languages compiled directly into machine code may perform much better and take advantage of extensions such as CilkPlus for C/C++. On the other hand, the potential for performance gains by using a language like C++ needs to be weighed against the increases in code complexity and difficulty in reading and debugging. Python’s C bindings may also be used for specific functions that could benefit from great optimization while leaving the majority of the codebase intact. 5.2 Fault Tolerance In a production environment, servers (which may be election officials’ laptops) may crash, power may go out, and an election official might even forget to bring her laptop. Data may be corrupted in transit or on disk. A server may be ejected from the mixnet if it is discovered to be dishonest. A robust, fault-tolerant system is needed to ensure that the election can still run smoothly despite these failures. Possible improvements to fault tolerance include the following. ∙ Replicate and persist the secure bulletin board to disk. The secure bulletin board ul52 timately contains all of the information generated by the mixservers for the mix and proof phases of the election. This information should be replicated and persisted to disk so that no single server failure causes this information to be lost. ∙ Remove central points of failure, such as the controller. Because the controller dictates the progression of the entire election and all other servers take instructions from it, a failure of the controller will cause the entire system to stop. The controller may be designed as a replicated state machine, or it may be removed altogether in a system where the mixservers collectively communicate among each other and advance the election. ∙ Ensure that no data is corrupted during transmission. This may be accomplished via message authentication codes or digital signatures, which also provide security guarantees as described in the next section. Work by Danna Kelmer at NYU begins to address fault tolerance in the system. 5.3 Secure Channels and Key Exchange Ultimately, the system must run on multiple machines connected to some network. Despite precautions to ensure the integrity of the network (e.g. private network, wired connections only with wireless adapters off), cryptographically secure channels are a necessary and relatively simple way to ensure end-to-end encryption and authentication. The various public-key and symmetric encryption and authentication functions should be compared for their security and performance characteristics, and the process of key exchange should be designed to be done when the election is setup the day before. 53 54 Chapter 6 Conclusion In this thesis, we present a new design for an end-to-end verifiable voting system by introducing the concept of a split-value representation and a split-value commitment. We also introduce a verifiable mixnet design, which when used with split-value commitments, preserves the anonymity of voters. Compared to other designs which achieve the same result, we expect that using relatively inexpensive split-value commitments in the mixnet rather than other, more time-consuming operations such as repeated public-key encryption will yield high performance while maintaining verifiability. As with any candidate system for real-world application, the split-value voting system must exhibit not only correctness, but also good performance, scalability, and fault tolerance. Based on the performance optimizations and scalability analysis described in this thesis, the potential for additional performance gains, and current work addressing fault tolerance, it is our hope that the split-value voting system may evolve from a theoretically interesting design to a viable implementation used in public elections over the coming years. 55 56 Bibliography [1] B. Adida. Advances in Cryptographic Voting Systems. PhD dissertation, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, August 2006. [2] B. Adida. Helios: Web-based Open-audit Voting. In Proceedings of the 17th Conference on Security Symposium, 2008. [3] S. Bell et al. STAR-Vote: A Secure, Transparent, Auditable, and Reliable Voting System. In USENIX Journal of Election Technology and Systems, volume 1, 2013. [4] J. Benaloh. Simple Verifiable Elections. In Proc. 2006 USENIX/Accurate Electronic Voting Technology Workshop, 2006. [5] J. Benaloh and D. Tuinstra. Receipt-free secret-ballot elections. In Proc. 26th ACM Symposium on Theory of Computing, 1994. [6] M. Chase, M. Kohlweiss, A. Lysyanskaya, and S. Meiklejohn. Verifiable Elections That Scale for Free. In K. Kurosawa and G. Hanaoka, editors, Lecture Notes in Computer Science, volume 7778. Springer, 2013. [7] D. Chaum. Untraceable electronic mail, return addresses, and digital pseudonyms. Commun. ACM, 24(2), 1981. [8] D. Chaum. Secret-ballot receipts. In Proc. 25th IEEE Symposium on Security and Privacy, 2004. [9] D. Chaum et al. Scantegrity: End-to-end voter-verifiable optical-scan voting. Security & Privacy, IEEE, 6(3), 2008. [10] D. Chaum and T. Pedersen. Wallet Databases with Observers. In Proceedings on Advances in Cryptology - Crypto ’92, 1992. [11] A. Feldman, J. Halderman, and E. Felten. Security Analysis of the Diebold AccuVoteTS Voting Machine. In Proc. 2007 USENIX/Accurate Electronic Voting Technology Workshop, 2007. [12] A. Fiat and A. Shamir. How to Prove Yourself: Practical Solutions to Identification and Signature Problems. In Proceedings on Advances in Cryptology - Crypto ’86, 1987. [13] The Python Software Foundation. Miscellaneous operating system interfaces. https: //docs.python.org/3.4/library/os.html. [Online; accessed 18-May-2015]. 57 [14] The Python Software Foundation. socket - Low-level networking interface. https: //docs.python.org/3.4/library/socket.html. [Online; accessed 18-May-2015]. [15] T. El Gamal. A Public Key Cryptosystem and a Signature Scheme Based on Discrete Logarithms. IEEE Trans. Inform. Theory, 31, 1985. [16] R. Gennaro, S. Jarecki, H. Krawczyk, and T. Rabin. Secure distributed key generation for discrete-log based cryptosystems. In J. Stern, editor, Lecture Notes in Computer Science, volume 1592. Springer, 1999. [17] Rop Gonggrijp et al. Nedap/Groenendaal ES3B voting computer: a security analysis, 2006. R [18] S. Gueron. Intel○ Advanced Encryption Standard (AES) New Instructions Set. https://software.intel.com/sites/default/files/article/165683/ aes-wp-2012-09-22-v01.pdf, 2012. [Online; accessed 18-May-2015]. [19] M. Jakobsson, A. Juels, and R. Rivest. Making mix nets robust for electronic voting by randomized partial checking. In D. Boneh, editor, USENIX Security Symposium. USENIX, 2002. [20] A. Juels, D. Catalano, and M. Jakobsson. Coercion-resistant electronic elections. In Proc. ACM Workshop on Privacy in the Electronic Society, 2005. [21] Jyotiska. JSON vs simpleJSON vs ultraJSON. http://blog.dataweave.in/post/ 87589606893/json-vs-simplejson-vs-ultrajson. [Online; accessed 20-May-2015]. [22] S. Khazaei and D. Wikström. Randomized Partial Checking Revisited. In E. Dawson, editor, Lecture Notes in Computer Science, volume 7779. Springer, 2013. [23] R. Mercuri. Voting-machine risks. Commun. ACM, 35(11), 1992. [24] L. Naish. Partial Disclosure of Votes in STV Elections. Voting Matters, 30, 2013. [25] C. Andrew Neff. Practical high certainty intent verification for encrypted votes. Votehere, Inc. whitepaper, 2004. [26] Ace Electoral Knowledge Network. Results Management Systems. aceproject.org/ace-en/pdf/vc/view. [Online; accessed 18-May-2015]. http:// [27] C. Park, K. Itoh, and K. Kurosawa. Efficient anonymous channel and all/nothing election scheme. In T. Helleseth, editor, Lecture Notes in Computer Science, volume 765. Springer, 1994. [28] T. Pedersen. A threshold cryptosystem without a trusted party (extended abstract). In D. Davies, editor, Lecture Notes in Computer Science, volume 547. Springer, 1991. [29] M. Pedroso. The Implementation of a Split-Value Verifiable Voting System. Master’s thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015. [30] B. Pfitzmann. Breaking efficient anonymous channel. In A. De Santis, editor, Lecture Notes in Computer Science, volume 950. Springer, 1995. 58 [31] B. Pfitzmann and A. Pfitzmann. How to break the direct rsa-implementation of mixes. In J. Quisquater and J. Vandewalle, editors, Lecture Notes in Computer Science, volume 434. Springer, 1990. [32] M. Rabin and R. Rivest. Efficient End to End Verifiable Electronic Voting Employing Split Value Representations. In Proc. EVOTE 2014, 2014. [33] K. Sako and J. Kilian. Receipt-free mix-type voting scheme - a practical solution to the implementation of a voting booth. In L. Guillou and J. Quisquater, editors, Lecture Notes in Computer Science, volume 921. Springer, 1995. [34] D. Sandler, K. Derr, and D. Wallach. VoteBox: A tamper-evident, verifiable electronic voting system. In Proceedings of the 17th USENIX Security Symposium, 2008. [35] A. Shamir. How to share a secret. Commun. ACM, 22(11), 1979. [36] Wikipedia. DRE voting machine. http://en.wikipedia.org/wiki/DRE_voting_ machine. [Online; accessed 18-May-2015]. [37] Wikipedia. Electronic voting. http://en.wikipedia.org/wiki/Electronic_voting. [Online; accessed 18-May-2015]. [38] Wikipedia. FLOPS. http://en.wikipedia.org/wiki/FLOPS#Cost_of_computing. [Online; accessed 18-May-2015]. [39] Wikipedia. Secret ballot. http://en.wikipedia.org/wiki/Secret_ballot. [Online; accessed 18-May-2015]. [40] Wikipedia. United States presidential election in Florida, 2000. http://en.wikipedia. org/wiki/United_States_presidential_election_in_Florida,_2000. [Online; accessed 18-May-2015]. 59