Modulo Multiplicative Inverse Circuit Design

Xiaoying Li (1), Fuming Sun (2), Ehua Wu (1, 3)
(1) Department of Computer and Information Science, FST, University of Macau, Macao, China
(2) School of Information Engineering, University of Science and Technology, Beijing, China
(3) State Key Lab of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China
Email: ya27404@umac.mo, sunfuming@263.net, ehwu@umac.mo
Abstract: In this paper, the circuit design of an arithmetic module
used in cryptography, the modulo multiplicative inverse, is
presented and implemented in FPGA hardware. This modular
arithmetic function involves iterative division, multiplication,
and accumulation with a variable number of loop iterations.
Besides standard HDL programming and schematic entry, a
Simulink-to-FPGA design flow has also been tried. Experimental
results from the different design methods are compared, and
their pros and cons are discussed.
Supported by Research Grant of University of Macau
I. INTRODUCTION
With the increasing importance of information security,
research on cryptography and cipher design [1] has become
more and more significant. Modular arithmetic, the
cryptographer's mathematics, which is also called clock
arithmetic, is the central mathematical concept in
cryptography and is used in almost every cipher from the
Caesar cipher to the RSA cipher. Unlike the basic modular
calculations, the modulo multiplicative inverse is a relatively
complex and time-consuming iterative procedure with a
variable number of loop iterations. In this paper, two different
design flows, HDL-based circuit description and
Simulink-to-FPGA circuit module design, are used to
implement the modulo multiplicative inverse function in FPGA
hardware. This module can be used as a basic unit in cipher
hardware design, or it can be encapsulated into a math IP
core [2].
II. MOD-MULTIPLICATIVE INVERSE FUNCTION
A. Definition of Element
Suppose m is a positive integer and $u \in \{0, 1, 2, \ldots, m-1\}$. If
there exists $u^{-1} \in \{0, 1, 2, \ldots, m-1\}$ which satisfies
$$u \cdot u^{-1} \equiv 1 \pmod{m} \qquad (1)$$
then $u^{-1}$ is called the multiplicative inverse element of u modulo m.
B. Computational Method
Let m be a positive integer. For any $u \in \{0, 1, 2, \ldots, m-1\}$,
the procedure for calculating the mod-multiplicative inverse
element $u^{-1}$ is as follows:
Step1. If u=0, then $u^{-1}$ is set to zero, end; else
Step2. Set initial values as n1=m, n2=u, b1=0 and b2=1;
Step3. Divide n1 by n2 as n1=q*n2+r, get the quotient q
and the remainder r;
Step4. If r ≠ 0, update variables as n1=n2, n2=r, t=b2,
b2=b1-q*b2, b1=t, then go back to Step3;
Step5. If n2 ≠ 1, $u^{-1}$ does not exist, end; else
Step6. If b2<0, update b2 as b2=b2+m;
Step7. $u^{-1}$ = b2, end.
The function $u^{-1} = f(u, m)$ is an iterative procedure of
integer division, multiplication and accumulation. It converges
when the remainder r reaches zero, so the number of loop
iterations varies with u. For $m = 2^{16}+1$ in our cipher design,
every input u in the range {0, 1, 2, …, 65536} has a
corresponding mod-multiplicative inverse element $u^{-1}$
(zero for u=0, by Step1), and the maximum number of loop
iterations is eighteen. The flowchart of the algorithm is shown
in Fig. 1.
Figure 1. Flowchart of modulo multiplicative inverse algorithm.
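The seven steps of the computational method can be sketched in software as follows. This is a minimal Python sketch for checking the arithmetic, not the hardware implementation; the function name `mod_inverse` and the choice to return `None` when no inverse exists are assumptions made for illustration.

```python
def mod_inverse(u, m):
    """Return the inverse of u modulo m following Steps 1-7, or None if none exists."""
    if u == 0:                      # Step 1: the inverse of zero is defined as zero
        return 0
    n1, n2, b1, b2 = m, u, 0, 1     # Step 2: initial values
    while True:
        q, r = divmod(n1, n2)       # Step 3: n1 = q*n2 + r
        if r == 0:
            break
        # Step 4: shift the divider operands and update the accumulators
        n1, n2 = n2, r
        b1, b2 = b2, b1 - q * b2
    if n2 != 1:                     # Step 5: u and m are not coprime, no inverse
        return None
    if b2 < 0:                      # Step 6: bring the result into {0, ..., m-1}
        b2 += m
    return b2                       # Step 7


m = 2**16 + 1
# Every nonzero u has an inverse because 65537 is prime.
assert all(u * mod_inverse(u, m) % m == 1 for u in range(1, m))
```

For example, with m = 7 and u = 3, Step 4 executes once (q = 2, r = 1), giving b2 = -2, and Step 6 returns 5, since 3 · 5 = 15 ≡ 1 (mod 7).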
III. CIRCUIT DESIGN
From the analysis of the mod-multiplicative inverse function,
the computation can be separated into two loops. One is the
division loop, in which m and u are the initial dividend and
divisor. In each subsequent division, the previous divisor and
remainder become the current dividend and divisor,
respectively. A zero remainder r terminates the division loop
and determines the convergence of the function. The other is
the multiplication and accumulation loop, whose input is the
quotient q from the divider. Two temporary variables, b1 and
b2, are swapped and updated by multiplication and
accumulation in each iteration. The final result is the value of
b2, offset by m if b2 is less than zero. In this section, besides
the standard FPGA development flow, the Simulink-to-FPGA
design flow is applied to the circuit design of the
mod-multiplicative inverse module.
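One way to see why b2 ends up holding the inverse is the loop invariant below. It is the standard invariant of the extended Euclidean algorithm and is not stated in the original text; it uses only the symbols n1, n2, b1, b2, q, and r from the computational method, with primes marking the updated register values.

```latex
% Requires amsmath. All congruences are modulo m.
\begin{align*}
  b_1 u &\equiv n_1 \pmod{m}, &
  b_2 u &\equiv n_2 \pmod{m}.
\intertext{Initially $n_1 = m$, $n_2 = u$, $b_1 = 0$, $b_2 = 1$, so both hold.
After one update with $n_1 = q\,n_2 + r$:}
  n_2' = r = n_1 - q\,n_2 &\equiv (b_1 - q\,b_2)\,u = b_2' u \pmod{m}, &
  n_1' = n_2 &\equiv b_2 u = b_1' u \pmod{m}.
\intertext{When the division loop stops with $n_2 = 1$:}
  b_2 u &\equiv 1 \pmod{m}
  \quad\Longrightarrow\quad u^{-1} = b_2 \bmod m .
\end{align*}
```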
A. Schematic and HDL-based Circuit Design
As shown in Fig. 2, the circuit structure of the
mod-multiplicative inverse module is mainly composed of two
iterative procedures: div_loop and mulacc_loop. In the
division loop, the DFF registers holding the dividend n1 and
the divisor n2 are controlled by reset and enable signals.
Signal est loads the initial values m and u for the first
division loop. Signal ed indicates that the results from the
divider are ready after a long latency. Signal en2 is valid
while n2 is not equal to 1. Once n2 equals 1, which means that
the remainder r will be zero, the n1 and n2 registers are
locked and retain their contents. In the multiplication and
accumulation loop, two DFF registers hold b1 and b2.
Similarly, est loads the initial values zero and one into b1
and b2, respectively, for the first mul-acc loop. The registers
in both div_loop and mulacc_loop in Fig. 2 therefore
implement multiplexing with latching. Signal em is generated
after ed according to the latency of the mulacc operation. For
this loop, the reset must be done correctly to avoid
accumulating errors between consecutive inputs. The timing
diagram and control signals are illustrated in Fig. 3. Signal
est accompanies every input u and starts the module. Signal
ed is generated by a counter of the division latency. Signal
eout is the enable signal of the output ummi ($u^{-1}$). In the
HDL-based design, both the unsigned pipelined divider and the
signed parallel multiplier are generated by the Xilinx Core
Generator tool. If m is 65537 ($2^{16}+1$), the latency of the
17-bit unsigned divider with both quotient and remainder
outputs is twenty cycles.
Figure 2. Circuit structure
Figure 3. Timing diagram and control signals.
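The register behavior of div_loop and mulacc_loop can be mimicked by a small behavioral model. The sketch below is an illustration under stated assumptions rather than the authors' HDL: it models only the four registers and the est and en2 conditions, ignores cycle-level timing, and assumes that u is nonzero and coprime to m (as is the case for every nonzero u when m = 65537). The function name `run_module` is hypothetical.

```python
def run_module(u, m):
    """Behavioral model of the two loops; returns (ummi, iterations).

    Assumes 0 < u < m and that u and m are coprime, so n2 eventually
    reaches 1. Timing (ed, em latencies) is not modeled.
    """
    # est: load the initial values into the four registers
    n1, n2 = m, u
    b1, b2 = 0, 1
    iterations = 0
    while n2 != 1:                  # en2: registers update only while n2 != 1
        q, r = divmod(n1, n2)       # divider result, valid when ed is asserted
        n1, n2 = n2, r              # div_loop registers
        b1, b2 = b2, b1 - q * b2    # mulacc_loop registers, valid when em is asserted
        iterations += 1
    ummi = b2 + m if b2 < 0 else b2 # offset by m when b2 is negative
    return ummi, iterations


print(run_module(3, 7))  # (5, 1)
```

Each counted iteration corresponds to one pass through the divider and one mul-acc update, so in hardware it costs at least the divider latency.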
B. Simulink-to-FPGA Circuit Design
With the continued growth in the complexity of FPGA-based
designs, more flexible, efficient, and higher-level design
methodologies have begun to change the traditional
HDL-centric flows. In contrast to behavioral or structural
specification in VHDL or Verilog, higher-level languages such
as C and Java allow designers familiar with software
programming to describe hardware. A higher-level design flow
can directly link model simulation with hardware
implementation. Matlab&Simulink is a well-known tool that
allows designers to model a system at a high level and is well
suited to certain classes of applications, such as digital
signal processing, automotive control, and communication.
Algorithm complexity and the requirement for fast
time-to-market drive this need. To take advantage of the
modeling and simulation functionality of Simulink, major FPGA
vendors have released products that integrate into Simulink
as dedicated blocksets. Two popular ones are Xilinx System
Generator for DSP [3] and Altera DSP Builder [4]. AccelChip
[5] also provides a DSP synthesis tool for FPGA. These
blocksets and tools implement a full FPGA design flow from
Simulink modeling through simulation to hardware [6, 7], and
can transform a Simulink model into synthesizable HDL code
with a test bench. Therefore, in this paper, besides the HDL
circuit description method, the Simulink-to-FPGA flow using
the Xilinx System Generator tool is also applied to the
mod-multiplicative inverse module design. The top two levels
of the model are shown in Fig. 4 and Fig. 5. The sub-modules
div_loop and mulacc_loop in Fig. 5 follow the same idea as in
Fig. 2, so the blocks inside them in Simulink are not
illustrated in detail.
Figure 4. Top-level model of Mod-Mul Inv function in Simulink
Figure 5. Second-level model of Mod-Mul Inv function in Simulink
The whole circuit model can be built directly from the
Xilinx System Generator blockset in Simulink. It is easy to
encapsulate small modules into subsystems hierarchically. The
multiplexer, register, adder/subtractor, constant, and
multiplier are all basic blocks that can be customized to
different types. A CORDIC divider is available in the
reference blockset, but this type of divider is not well
suited to integer modular arithmetic: it cannot output a
remainder, and its quotient has computation error for integer
division. To satisfy the computation requirement of the CORDIC
divider, the number format has to be changed from integer to
an extended real number with fractional bits, which increases
the latency of the divider. Additional blocks have to be built
for the remainder calculation.
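To illustrate why extra blocks are needed around a divider that returns only an approximate fractional quotient, the sketch below recovers the exact integer quotient and remainder from such a result. It does not model the CORDIC algorithm itself: a truncated fixed-point reciprocal stands in for the inexact divider output, and the function name `divmod_from_approx` and the `frac_bits` parameter are hypothetical.

```python
def divmod_from_approx(n1, n2, frac_bits=18):
    """Exact (q, r) with n1 = q*n2 + r, starting from an approximate quotient.

    Assumes n1 >= 0 and n2 > 0.
    """
    # Stand-in for the inexact divider: quotient via a truncated reciprocal
    # with frac_bits fractional bits (an assumption, not the CORDIC algorithm).
    recip = (1 << frac_bits) // n2
    q = (n1 * recip) >> frac_bits
    # Remainder calculation: one multiplier and one subtractor.
    r = n1 - q * n2
    # Correction stage: truncation can only make q too small, so step it up.
    while r >= n2:
        q += 1
        r -= n2
    return q, r


assert divmod_from_approx(65537, 12345) == divmod(65537, 12345)
```

The multiplier, subtractor, and correction stage correspond to the kind of additional logic that an integer divider core with a built-in remainder output makes unnecessary.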
C. Experimental Results
The HDL-based circuit design flow is completed with the
Xilinx ISE tool, which performs synthesis, implementation,
place & route, and device programming for the whole cipher
design. Behavioral simulation and post-simulation are
supported by the Mentor Graphics ModelSim tool. For the
arithmetic units, an unsigned pipelined integer divider with
both quotient and remainder outputs is parameterized and
generated by the Xilinx Core Generator tool. Multiplication
uses the embedded multipliers in the hardware. The target
FPGA chip is a Xilinx Virtex II xc2v2000. In the
Simulink-to-FPGA design flow, the circuit model is built with
basic Simulink blocks and Xilinx-specific blocks. Input and
output data are exchanged with the Matlab workspace, which
makes number format conversion and debugging convenient. The
System Generator tool can generate synthesizable VHDL code
for the circuit model together with a complete ISE project
including a test bench. By importing the project into ISE, or
the related design files into other third-party tools, the
subsequent standard design flow can be completed.
TABLE I. PERFORMANCE COMPARISON
Resource and Speed    HDL-based    Simulink-to-FPGA    Mixed
Slices                682          3573                746
Flip Flops            1142         5600                1253
LUTs                  550          6082                568
MULT18X18S            1            9                   1
Max. Frequency        103 MHz      84 MHz              103 MHz
Area and speed can be compared in Table I. Owing to the
simple circuit structure in Fig. 2, the HDL-based design can
describe the control logic quickly, and the optimized divider
and multiplier cores achieve an area-efficient, high-speed
implementation. In this case, the Simulink-to-FPGA flow
consumes far more resources than the HDL flow (3573 versus
682 slices) and reaches a lower maximum frequency (84 MHz
versus 103 MHz). Because the CORDIC divider in the reference
blockset of System Generator is not a good choice for this
module, a mixed HDL and Simulink design flow is also adopted,
in which the divider is replaced by an HDL-based IP core.
Simulink and System Generator support the mixed design and
HDL co-simulation well. The resource consumption is greatly
reduced (746 slices), with performance equivalent to the
HDL-based circuit.
D. Discussion
The development of FPGA technology and its design
methodology challenges the various EDA tools to keep up.
Based on the standard development flow (Fig. 6), initial
efforts have moved toward high-level design and synthesis.
There are many conversion tools, such as C-to-FPGA, Stateflow
diagram to VHDL (SF2VHD), and Matlab-to-FPGA (MATCH). The
features of the Simulink-to-FPGA flow can be summarized as
follows:
• Friendly graphical interface. Although schematic entry also
has a GUI, Simulink makes it easier to organize input data
and much more convenient to observe output in many ways.
• Easy number format conversion. Double to fixed-point
conversion is parameterized in the functional blocks, but the
consistency of data types along the data flow must be
watched.
• Flexible modeling and simulation. The design can be well
organized into hierarchical modules, is easy to combine with
other entry methods for design decisions, and is convenient
to debug and simulate.
• Fast time-to-market for DSP development. With the
assistance of dedicated DSP blocks for FPGA, the
Simulink-to-FPGA flow can greatly shorten the development
cycle from algorithm to hardware. The arithmetic blocksets
might be further reinforced.
Figure 6. Standard FPGA development flow
As shown in Fig. 7, high-level designs are supported by
more and more EDA vendors. Currently, most methods target
synthesizable HDL so that they can follow the standard FPGA
development flow and remain compatible with the other parts
of the whole system. Besides C synthesis, a flow combined
with Matlab&Simulink is becoming practical as well.
Figure 7. High-level FPGA design flow and tools
IV. CONCLUSION
In this paper, a circuit module design of the modulo
multiplicative inverse function for ciphers has been proposed
and mapped to FPGA hardware by different design flows.
The standard HDL-based design shows good performance
using optimized arithmetic IP cores. The Simulink-to-FPGA
high-level design benefits from a good graphical interface
and flexible design choices. For other DSP applications such
as image processing and communication, more functional
blocks will be encapsulated into FPGA-mapped blocks in
Simulink, and the flow will perform better with future
improvements.
ACKNOWLEDGMENT
The research is supported by the Research Grant of
University of Macau and a University PhD Studentship to the
first author.
REFERENCES
[1] A. Daly, W. Marnane, “Efficient architectures for implementing
Montgomery modular multiplication and RSA modular
exponentiation on reconfigurable logic”, Proceedings of the 2002
ACM/SIGDA tenth international symposium on Field-programmable
gate arrays, Monterey, California, USA, pp. 40-49, 2002.
[2] D. W. Matula, A. Fit-Florea, M. A. Thornton, “Table Lookup
Structures for Multiplicative Inverses Modulo 2^k”, 17th IEEE
Symposium on Computer Arithmetic (ARITH'05), pp. 156-163, 2005.
[3] Xilinx, “Xilinx System Generator”, Version 6.2, Xilinx Inc., USA,
http://www.xilinx.com/ise/optional_prod/system_generator.htm.
[4] Altera, “Altera DSP Builder”, Version 5.1, Altera Inc, USA,
http://www.altera.com/products/software/products/dsp/dsp-builder.html.
[5] AccelChip, “Integrating MATLAB Algorithms into FPGA Designs”,
Xcell Journal, pp. 73-75, 2005.
[6] M. A. Shanblatt, B. Foulds, “A Simulink-to-FPGA Implementation
Tool for Enhanced Design Flow”, Proceedings of the 2005 IEEE
International Conference on Microelectronic Systems Education
(MSE'05), pp. 89-90, 2005.
[7] M. Haldar, A. Nayak, A. Choudhary, and P. Banerjee, “A System for
Synthesizing Optimized FPGA Hardware from MATLAB”,
Proceedings of the 2001 IEEE/ACM International Conference on
Computer-Aided Design, pp. 314-319, 2001.