Elliptic Curve Cryptography over GF(2^m) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative Study

Kris Gaj, Sashisu Bajracharya, Chang Shu, Sang Han
George Mason University
kgaj@gmu.edu, sbajrach@gmu.edu, cshu@gmu.edu

Tarek El-Ghazawi
The George Washington University
tarek@gwu.edu

Reconfigurable Computers are high-end computers based on the close system-level integration of traditional microprocessors and Field Programmable Gate Arrays (FPGAs). Public key cryptography is better suited to FPGAs than to traditional microprocessors because it requires computationally intensive arithmetic on unconventionally long operands of several hundred or even thousands of bits. Elliptic Curve Cryptosystems (ECCs) are a family of public key cryptosystems that has emerged over the last ten years as a transformation of choice for use in future communication networks.

In this paper, we present and contrast two fundamentally different families of Elliptic Curve Cryptosystems from the point of view of their suitability for implementation on a reconfigurable computer. Both families are based on operations in the Galois Fields GF(2^m) with m in the range from 160 to 512 bits. They differ in the way the operands are represented and in the way multiplication of two elements of the Galois Field GF(2^m) is defined. Our goal is to determine which of the two possible Galois Field representations, Polynomial Basis or Optimal Normal Basis (ONB), is more suitable for implementation on a reconfigurable computer. Suitability is judged both by the absolute execution time and by the speed-up compared to a purely microprocessor-based implementation.

As a platform for our experiments, we have chosen one of the first general-purpose, stand-alone reconfigurable computers available on the market, the SRC-6E from SRC Computers Inc.
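To make the difference between the two representations concrete, the following minimal Python sketch may help. It is not the implementation evaluated in this paper. It assumes a toy field GF(2^4) defined by x^4 + x^3 + x^2 + x + 1, chosen here only because the root x of that polynomial generates a normal basis, and all function names are hypothetical. In polynomial basis, multiplication is shift-and-add with reduction; in a normal basis, squaring reduces to a cyclic rotation of the coordinate vector.

```python
# Toy field GF(2^4) = GF(2)[x] / (x^4 + x^3 + x^2 + x + 1).
# Illustrative assumption only; the paper targets m between 160 and 512.
M = 4
POLY = 0b11111  # bit i of an integer is the coefficient of x^i


def pb_mul(a, b, m=M, poly=POLY):
    """Polynomial-basis product of a and b (both below 2**m)."""
    result = 0
    for _ in range(m):
        if b & 1:            # add a * x^i when bit i of b is set
            result ^= a
        b >>= 1
        a <<= 1              # a := a * x
        if (a >> m) & 1:     # degree reached m: reduce by the field polynomial
            a ^= poly
    return result


def normal_basis(beta=0b0010):
    """Return [beta, beta^2, beta^4, beta^8] in polynomial-basis form."""
    basis = [beta]
    for _ in range(M - 1):
        basis.append(pb_mul(basis[-1], basis[-1]))
    return basis


def nb_to_pb(coords, basis):
    """Convert normal-basis coordinates (bit i scales basis[i]) to polynomial basis."""
    value = 0
    for i, element in enumerate(basis):
        if (coords >> i) & 1:
            value ^= element
    return value


def nb_square(coords, m=M):
    """Square in a normal basis: rotate the coordinates by one position."""
    return ((coords << 1) | (coords >> (m - 1))) & ((1 << m) - 1)
```

For every 4-bit coordinate vector c, converting nb_square(c) with nb_to_pb gives the same element as calling pb_mul on the converted element with itself. The sketch is plain software and says nothing about hardware cost; that comparison is what the experiments on the SRC-6E are meant to settle.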
The SRC-6E allows an application to be executed on two Xilinx Virtex-II XC2V6000 User FPGAs and two P3 microprocessors with a 1 GHz clock.

The first tentative results of the implementation of both classes of Elliptic Curve Cryptosystems on the SRC-6E have been reported in our earlier publications [1, 2]. While these publications gave a first rough estimate of the speed-up that can be achieved with both approaches, they used different implementation approaches and optimizations, and as a result were not very suitable for comparison. In this paper, we attempt to implement both classes of Elliptic Curve Cryptosystems using very similar techniques and optimizations, and allow differences only for operations that are fundamentally different in the two investigated representations.

In particular, the following ECC operations and their implementations are common to both Galois Field representations:
1. Scalar multiplication, performed using Montgomery Scalar Multiplication with Projective Coordinates [3],
2. Elliptic Curve addition and doubling, performed in Projective Coordinates [3],
3. Transformation from Projective Coordinates to Affine Coordinates.

The two operations that are specific to a given representation are:
1. Galois Field multiplication and squaring, and
2. Galois Field inversion.

Since the optimal ways of performing these operations differ substantially between the representations, an effort has been made to devote a similar amount of time and effort to their optimization. In both cases, performance is bounded by the maximum clock frequency, which is fixed at 100 MHz in the SRC architecture.

Based on the tentative results, we predict that our implementations will result in similar absolute execution times for both Galois Field representations, and that the speed-up compared to a microprocessor-based implementation will be substantially higher for the Optimal Normal Basis representation.
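The control flow of the scalar multiplication shared by both representations can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the design described here: it uses affine coordinates instead of the projective coordinates of [3], a toy field GF(2^4) with reduction polynomial x^4 + x + 1, an arbitrarily chosen curve y^2 + xy = x^3 + a*x^2 + b with a = b = 1, and a naive inversion by repeated multiplication. All names are hypothetical.

```python
# Toy parameters (assumptions for illustration only).
M, POLY = 4, 0b10011   # GF(2^4) with x^4 + x + 1
A, B = 1, 1            # curve y^2 + xy = x^3 + A*x^2 + B


def gf_mul(a, b):
    """Shift-and-add multiplication in polynomial basis."""
    result = 0
    for _ in range(M):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return result


def gf_inv(a):
    """Inverse of a nonzero element as a^(2^M - 2), by repeated multiplication."""
    result = 1
    for _ in range(2 ** M - 2):
        result = gf_mul(result, a)
    return result


def on_curve(P):
    x, y = P
    x2 = gf_mul(x, x)
    return gf_mul(y, y) ^ gf_mul(x, y) == gf_mul(x2, x) ^ gf_mul(A, x2) ^ B


def point_double(P):
    """Affine doubling; None stands for the point at infinity."""
    if P is None or P[0] == 0:      # points with x = 0 have order 2
        return None
    x1, y1 = P
    lam = x1 ^ gf_mul(y1, gf_inv(x1))
    x3 = gf_mul(lam, lam) ^ lam ^ A
    y3 = gf_mul(x1, x1) ^ gf_mul(lam ^ 1, x3)
    return (x3, y3)


def point_add(P, Q):
    """Affine addition covering infinity, doubling, and inverse points."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2:
        return point_double(P) if y1 == y2 else None
    lam = gf_mul(y1 ^ y2, gf_inv(x1 ^ x2))
    x3 = gf_mul(lam, lam) ^ lam ^ x1 ^ x2 ^ A
    y3 = gf_mul(lam, x1 ^ x3) ^ x3 ^ y1
    return (x3, y3)


def montgomery_ladder(k, P):
    """Compute k*P while keeping the invariant R1 = R0 + P."""
    R0, R1 = None, P
    for bit in bin(k)[2:]:          # scan the scalar from the most significant bit
        if bit == "1":
            R0, R1 = point_add(R0, R1), point_double(R1)
        else:
            R0, R1 = point_double(R0), point_add(R0, R1)
    return R0
```

Each ladder step performs one addition and one doubling regardless of the scalar bit, and each step depends on the result of the previous one. The sketch does not model the FPGA implementation, so it gives no indication of the speed-up over a microprocessor expected for the Optimal Normal Basis representation.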
This speed-up has been estimated to be in the range from 895 to 1300, depending on the chosen algorithm description partitioning scheme (the amount of code written in VHDL vs. C). While earlier publications (e.g., [4]) on implementations of cryptography on reconfigurable computers have already demonstrated a 1000x speed-up in data throughput over microprocessor-based implementations, this is the first publication that shows a comparable speed-up for data latency. This speed-up is even more remarkable given that the selected operation has only a limited amount of intrinsic parallelism and cannot easily be sped up by multiple instantiations of the same computational unit.

References
1. Nguyen, N., Gaj, K., Caliga, D., El-Ghazawi, T., “Implementation of Elliptic Curve Cryptosystems on a Reconfigurable Computer,” IEEE International Conference on Field-Programmable Technology, FPT 2003, Tokyo, Japan, Dec. 2003.
2. Bajracharya, S., Shu, C., Gaj, K., El-Ghazawi, T., “Implementation of Elliptic Curve Cryptosystems over GF(2^n) in Optimal Normal Basis on a Reconfigurable Computer,” 14th International Conference on Field Programmable Logic and Applications, FPL 2004, Antwerp, Belgium, Aug.-Sep. 2004 (in print).
3. López, J., Dahab, R., “Fast Multiplication on Elliptic Curves over GF(2^m) without Precomputation,” CHES ’99, LNCS 1717, 1999.
4. Fidanci, O. D., Poznanovic, D., Gaj, K., El-Ghazawi, T., Alexandridis, N., “Performance and Overhead in a Hybrid Reconfigurable Computer,” Reconfigurable Architecture Workshop, RAW 2003, Nice, France, Apr. 2003.