A new SPA-resistant and fast Scalar Multiplication over Binary
Elliptic Curves
Sen Xu (a), DaWu Gu (a), Zheng Guo (a), HaiHua Gu (a,b), JunRong Liu (a), WeiJia Wang (a)
(a) School of Electronic Information and Electrical Engineering, ShangHai JiaoTong University
{xusen0328, dwgu}@sjtu.edu.cn, pandaguo_wow@163.com, {guhaihua, liujr, aawwjaa}@sjtu.edu.cn
(b) Shanghai HuaHong Integrated Circuit Co., Ltd.
Abstract
A new SPA-resistant and fast scalar multiplication algorithm is proposed in this paper. We employ
the octal representation of the scalar and optimize two formulas, 2P1 + 2P2 and 4P1. Based on
these, we introduce the new composite formulas 8P1, 7P1 + P2, 6P1 + 2P2, 5P1 + 3P2 and 4P1 + 4P2,
all computed with the x-coordinate only. We obtain a unified mathematical form for the key
elements of these new formulas. By combining two adjacent formulas with side-channel atomicity,
we form 4 identical units that share the same mathematical structure. The new algorithm is built
on these units, which are naturally atomic and have low computational overhead: only one dummy
squaring operation. The previous atomic block is then optimized in the same way. We get better
performance by saving two storages, seven operations, at least one dummy operation (at most 4)
and one pre-computation. We also merge the two different atomic blocks into one, selected by the
bits of the quaternary scalar digit. The same technique can be applied to the identical unit. As
a result, the mathematical structure of the composite formulas provides atomicity naturally. Our
proposed algorithm is 12% faster than the quaternary Montgomery ladder algorithm.
Key words: ECC, scalar multiplication, SPA-resistant, side-channel attack
1 Introduction
Koblitz and Miller [1,2] independently introduced Elliptic Curve Cryptography (ECC) in 1985. ECC
achieves a high level of security with a small key size, which makes it attractive for
low-resource devices such as smart cards. Scalar multiplication is the core operation of ECC, and
many researchers have worked on improving both its efficiency and its security.
In terms of efficiency, several mainstream approaches have been proposed to make scalar
multiplication faster. The first is to decrease the number of basic elliptic curve operations, as
in window-based and comb-based methods, which use a signed representation of the scalar and table
look-up [3,4]. The second is to optimize the basic curve operations themselves, for example with
new coordinate systems such as LD projective coordinates [5] and Jacobian projective coordinates,
or with composite formulas such as 2P1 + P2 and 3P1 [6,7]. Other methods combine the two
approaches: [8,9] use double-base chains and a representation of the scalar by Fibonacci numbers,
respectively, to obtain efficient elliptic curve scalar multiplication.
An adversary can easily break a careless implementation of ECC through side-channel information
[10]. Such an attack is called a side-channel attack. Since side-channel attacks appeared, almost
all proposed countermeasures aim to protect scalar multiplication against them, especially
against simple power analysis (SPA). SPA works by distinguishing the different patterns of point
addition and point doubling in the power consumption or the computing time [11,12]. A typical
countermeasure against SPA is to give the scalar multiplication a fixed pattern, as in the
Montgomery ladder [13] and multiplication-always methods [14]. Multiplication-always methods need
dummy operations, which makes them slower than the Montgomery ladder. Another typical
countermeasure is to make the point operations indistinguishable, for example with
indistinguishable operations [15] and the atomicity method [16]. These give point addition and
point doubling the same pattern and the same number of field operations, so the adversary cannot
tell the different point operations apart through SPA.
In 2013, a fast and SPA-resistant method was proposed in [17]. It is based on the
x-coordinate-only Montgomery ladder algorithm and combines the Montgomery trick [18], a new form
of the scalar, composite formulas and atomicity. The main idea is to represent the scalar in
quaternary form to reduce the number of loop iterations, and to compute the new composite
formulas 4P1, 2P1 + 2P2 and 3P1 + P2. Based on these formulas, the paper constructs the atomic
blocks (4P1, 3P1 + P2) and (3P1 + P2, 2P1 + 2P2) with at least two dummy field operations. This
algorithm is at least 26% faster than previous algorithms.
Can we get better performance with the same principle? We represent the scalar in octal form and
optimize the composite formulas 4P1 and 2P1 + 2P2. Based on the optimized formulas, we propose
the new composite formulas 8P1, 7P1 + P2, 6P1 + 2P2, 5P1 + 3P2 and 4P1 + 4P2. The proposed
composite formulas use only the x-coordinate in the affine coordinate system. We apply
side-channel atomicity to our composite operations with at most one dummy field operation (a
squaring). The proposed algorithm is SPA-resistant and 12% faster than [17]. Applying the new
formulas to the atomic block of [17] saves two storages, 7 field operations and at least one
dummy operation. Unlike [17], our algorithm also does not need to compute 2P in advance.
The remainder of the paper is organized as follows. Section 2 gives a brief summary of elliptic
curves over binary fields and of the SPA-resistant extended quaternary Montgomery ladder
algorithm. In Section 3 we optimize two basic composite formulas and propose our new composite
formulas. We then describe our atomic block, the identical unit, which follows naturally from the
structure of the new composite formulas and inserts at most one dummy operation. Using the
optimized basic formulas, we also improve the atomic block proposed in [17]. In Section 4 we
analyze the computational cost in comparison with previously proposed algorithms. Section 5
concludes the paper.
2 Preliminary
2.1 Elliptic curve cryptosystem and Scalar multiplication
A non-supersingular elliptic curve E over GF(2^m) is defined by the Weierstrass equation [1]
    y^2 + xy = x^3 + a x^2 + b,    (1)
where a, b ∈ GF(2^m) and b ≠ 0, together with the point at infinity, denoted by 𝒪. All the points
of E, including 𝒪, form a commutative finite group. Point addition and point doubling are the
basic point operations on this group.
Given P1 = (x1, y1), the point doubling formula for 2P1 = (x3, y3) is
    x_3 = x_1^2 + \frac{b}{x_1^2}, \qquad y_3 = x_1^2 + \left(x_1 + \frac{y_1}{x_1}\right) x_3 + x_3.    (2)
The computational cost of this formula is 1I+2M+1S, where I, M and S denote field inversion,
multiplication and squaring, respectively. Given P1 = (x1, y1) and P2 = (x2, y2) with
P1 ≠ ±P2, the point addition formula for P1 + P2 = (x4, y4) is
    x_4 = \lambda^2 + \lambda + x_1 + x_2 + a, \qquad y_4 = \lambda (x_1 + x_4) + x_4 + y_1, \qquad \lambda = \frac{y_1 + y_2}{x_1 + x_2}.    (3)
Point addition costs the same as point doubling, 1I+2M+1S. A classic way to compute the scalar
multiplication is the left-to-right binary method, which uses the binary representation of the
scalar and is described in Algorithm 1.
Algorithm 1 (The left-to-right binary method)
Input: d = d_{n-1} 2^{n-1} + d_{n-2} 2^{n-2} + ... + d_0 with d_{n-1} = 1, and a point P
Output: dP
  Q[0] = P
  For i = n-2 down to 0 do
    Q[0] = 2Q[0]
    If d_i == 1 then
      Q[0] = Q[0] + P
    End if
  End for
  Return Q[0]
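The binary method and the affine formulas (2) and (3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated in this paper: the field GF(2^8) with reduction polynomial x^8 + x^4 + x^3 + x + 1, the coefficients a = b = 1, and all function names are hypothetical choices made only to keep the example small.

```python
# Toy parameters (assumptions for illustration only, far too small for real use).
DEG, POLY = 8, 0x11B   # GF(2^8), reduction polynomial x^8 + x^4 + x^3 + x + 1
A, B = 1, 1            # curve y^2 + xy = x^3 + A*x^2 + B with B != 0

def gf_mul(a, b):
    """Multiply two field elements: shift-and-add with reduction modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> DEG) & 1:
            a ^= POLY
    return r

def gf_inv(a):
    """Invert a nonzero element as a^(2^DEG - 2)."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    r = 1
    for _ in range(DEG - 1):
        a = gf_mul(a, a)
        r = gf_mul(r, a)
    return r

def on_curve(P):
    """Check the curve equation (1); None stands for the point at infinity."""
    if P is None:
        return True
    x, y = P
    x_sq = gf_mul(x, x)
    return gf_mul(y, y) ^ gf_mul(x, y) == gf_mul(x_sq, x) ^ gf_mul(A, x_sq) ^ B

def ec_add(P, Q):
    """Affine addition (3) and doubling (2), including the special cases."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2:
        if y1 ^ y2 == x1:          # Q = -P (this also covers doubling a point with x = 0)
            return None
        lam = x1 ^ gf_mul(y1, gf_inv(x1))                 # doubling slope
        x3 = gf_mul(lam, lam) ^ lam ^ A
    else:
        lam = gf_mul(y1 ^ y2, gf_inv(x1 ^ x2))            # addition slope
        x3 = gf_mul(lam, lam) ^ lam ^ x1 ^ x2 ^ A
    y3 = gf_mul(lam, x1 ^ x3) ^ x3 ^ y1
    return (x3, y3)

def find_point():
    """Brute-force a point with x != 0 (only feasible in a toy field)."""
    for x in range(1, 1 << DEG):
        for y in range(1 << DEG):
            if on_curve((x, y)):
                return (x, y)
    raise ValueError("no point found")

def scalar_mul_binary(d, P):
    """Left-to-right binary method for d >= 1 (Algorithm 1)."""
    Q = P
    for bit in bin(d)[3:]:         # skip '0b' and the leading 1 bit
        Q = ec_add(Q, Q)
        if bit == "1":
            Q = ec_add(Q, P)       # executed only for 1 bits: the SPA leak
    return Q
```

The conditional addition inside the loop is exactly the key-dependent pattern that SPA exploits.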
Through simple power analysis, an adversary can distinguish the different point operations in
Algorithm 1 and thereby extract the scalar. From a security point of view, the Montgomery ladder
algorithm [13] has been proposed to protect scalar multiplication against SPA. This method shows
the same pattern regardless of the key bit.
2.2 SPA-resistant quaternary Montgomery Ladder Algorithm
Because the Montgomery ladder algorithm is slow, López and Dahab [5] proposed the
x-coordinate-only Montgomery ladder algorithm in 1999. Several researchers then introduced
similar algorithms [13,21]. This method uses only the x-coordinate to carry out the point
operations. Let P1 = (x1, y1), P2 = (x2, y2) and P = (x0, y0) be points on the curve E. If
P2 = P1 + P, then the x-coordinate-only point doubling and point addition formulas are
    x_{2P_1} = x_1^2 + \frac{b}{x_1^2}, \qquad x_{P_1+P_2} = x_0 + \frac{x_1}{x_1+x_2} + \left(\frac{x_1}{x_1+x_2}\right)^2.    (4)
The Montgomery ladder algorithm with the x-coordinate-only method is described in Algorithm 2.
Algorithm 2 (The x-coordinate-only Montgomery ladder algorithm)
Input: d = d_{n-1} 2^{n-1} + d_{n-2} 2^{n-2} + ... + d_0 with d_{n-1} = 1, and a point P
Output: dP
  Q[0] = P
  Q[1] = 2P
  For i = n-2 down to 0 do
    Q[1 - d_i] = Q[1 - d_i] + Q[d_i]
    Q[d_i] = 2Q[d_i]
  End for
  Return Q[0]
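A minimal Python sketch of Algorithm 2 with the x-only formulas (4) is given below. It assumes the field GF(2^163) with the reduction polynomial x^163 + x^7 + x^6 + x^3 + 1 and an arbitrary nonzero coefficient b = 5; both choices, like the function names, are illustrative assumptions and are not taken from this paper. Exceptional inputs (zero denominators) are not handled.

```python
DEG = 163
POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1   # x^163 + x^7 + x^6 + x^3 + 1
B = 0x5                                                   # hypothetical curve coefficient b

def gf_mul(a, b):
    """Multiply in GF(2^DEG): shift-and-add with reduction modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> DEG) & 1:
            a ^= POLY
    return r

def gf_sqr(a):
    return gf_mul(a, a)

def gf_inv(a):
    """Invert a nonzero element as a^(2^DEG - 2)."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    r = 1
    for _ in range(DEG - 1):
        a = gf_sqr(a)
        r = gf_mul(r, a)
    return r

def x_double(x1):
    """x(2*P1) = x1^2 + b / x1^2."""
    s = gf_sqr(x1)
    return s ^ gf_mul(B, gf_inv(s))

def x_add(x1, x2, x0):
    """x(P1 + P2) = x0 + t + t^2 with t = x1 / (x1 + x2), where x0 = x(P2 - P1)."""
    t = gf_mul(x1, gf_inv(x1 ^ x2))
    return x0 ^ t ^ gf_sqr(t)

def ladder_x(d, x0):
    """x-coordinate of d*P for d >= 1, where P has x-coordinate x0."""
    q = [x0, x_double(x0)]                  # Q[0] = P, Q[1] = 2P
    for bit in bin(d)[3:]:                  # skip '0b' and the leading 1 bit
        di = int(bit)
        q[1 - di] = x_add(q[0], q[1], x0)   # Q[1-di] = Q[1-di] + Q[di]
        q[di] = x_double(q[di])             # Q[di] = 2 * Q[di]
    return q[0]
```

Every iteration performs one addition and one doubling whatever the key bit is, which is the fixed pattern mentioned above. Because the ladder only sees x-coordinates, x(35P) can be cross-checked as `ladder_x(5, ladder_x(7, x))`.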
To get better performance, a fast SPA-resistant quaternary Montgomery ladder algorithm based on
the x-coordinate-only Montgomery ladder algorithm was proposed in 2013 [17]. That paper reduces
the number of loop iterations through the quaternary form of the scalar. With a different
representation of the scalar, new formulas must be computed. Suppose we know P1, P2, P and 2P,
where P2 = P1 + P. Then three x-coordinate-only composite formulas are given as follows.
    x_{4P_1} = x_1^4 + \frac{b^2 (x_1^4+b)^2 + b x_1^8}{x_1^4 (x_1^4+b)^2}.    (5)
    x_{2P_1+2P_2} = x_{2P} + \lambda + \lambda^2, \quad \text{where } \lambda = \frac{x_2^2 (x_1^4+b)}{(x_1^2 x_2^2+b)(x_1+x_2)^2}.    (6)
This formula costs 1I+4M+5S.
    x_{3P_1+P_2} = x_0 + \lambda + \lambda^2, \quad \text{where } \lambda = \frac{(x_1^4+b)(x_1+x_2)^2}{x_1^2 [x_0 (x_1+x_2)^2 + x_1 x_2] + (x_1^4+b)(x_1+x_2)^2}.    (7)
This formula costs 1I+5M+4S.
Combining two composite formulas yields the two atomic blocks (4P1, 3P1 + P2) and
(3P1 + P2, 2P1 + 2P2). An atomic block removes the operations duplicated between its two
composite formulas, and the Montgomery trick is employed inside the block to reduce the number of
inversions. Both measures reduce the computational cost. Each atomic block executes exactly the
same operation sequence, so an observer cannot distinguish which block is being executed.
The Montgomery trick is an effective way to replace one inversion by 3M. Given a and b, to obtain
a^{-1} and b^{-1} the Montgomery trick first computes (ab)^{-1}, and then obtains
a^{-1} = b(ab)^{-1} and b^{-1} = a(ab)^{-1} with two further multiplications. The SPA-resistant
quaternary Montgomery algorithm, which uses the Montgomery trick, is given in Algorithm 3.
Algorithm 3 (The SPA-resistant extended quaternary Montgomery algorithm)
Input: d = d_{n-1} 4^{n-1} + d_{n-2} 4^{n-2} + ... + d_0, and a point P
Output: dP
  Q[0] = d_{n-1} P
  Q[1] = (d_{n-1} + 1) P
  For i = n-2 down to 0 do
    (Q[0], Q[1]) = AECTD_x(Q[d_i^H], Q[\bar{d_i^H}])
  End for
  Return Q[0]
Denote by d_i^H the high bit of d_i and by \bar{d_i^H} its complement. These values control the
order of the parameters, which changes when d_i = 2 or 3. AECTD_x is executed as an atomic block.
The block uses at most four dummy operations, 9 storages and 36 cycles per loop.
According to the computational analysis, this algorithm is faster than previously proposed
SPA-resistant scalar multiplication algorithms. Moreover, it is even faster than many unprotected
algorithms, except for the multi-base method introduced in [19]. Algorithm 3 is therefore
efficient.
Our work builds on Algorithm 3 to get better performance.
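The Montgomery trick used inside the atomic blocks of Section 2.2 can be illustrated with a short Python sketch. The field GF(2^163), its reduction polynomial and the helper names are assumptions made for this example only.

```python
DEG = 163
POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1   # x^163 + x^7 + x^6 + x^3 + 1

def gf_mul(a, b):
    """Multiply in GF(2^DEG): shift-and-add with reduction modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> DEG) & 1:
            a ^= POLY
    return r

def gf_inv(a):
    """Invert a nonzero element as a^(2^DEG - 2)."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    r = 1
    for _ in range(DEG - 1):
        a = gf_mul(a, a)
        r = gf_mul(r, a)
    return r

def inv_pair(a, b):
    """Return (a^-1, b^-1) with one inversion and three multiplications."""
    ab = gf_mul(a, b)             # multiplication 1
    t = gf_inv(ab)                # the only inversion: (ab)^-1
    return gf_mul(b, t), gf_mul(a, t)   # multiplications 2 and 3
```

With I = 8M, two separate inversions cost 16M, while `inv_pair` costs 8M + 3M = 11M.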
3 Fast SPA resistance algorithm
In this section, we propose a new scalar multiplication algorithm with better performance than
[17]. We represent the scalar in octal form and use new composite formulas to speed up the
algorithm.
3.1 Extended Montgomery ladder algorithm
We shorten the scalar by using its octal representation. Following the quaternary method, we
obtain five mathematically distinct formulas, which can be denoted as follows.
  8Q_i[0] + 0Q_i[1] and 0Q_i[0] + 8Q_i[1] share the same formula ECEAZP(Q_i[k], Q_i[\bar{k}])_x
  7Q_i[0] + 1Q_i[1] and 1Q_i[0] + 7Q_i[1] share the same formula ECSAOP(Q_i[k], Q_i[\bar{k}])_x
  6Q_i[0] + 2Q_i[1] and 2Q_i[0] + 6Q_i[1] share the same formula ECSATP(Q_i[k], Q_i[\bar{k}])_x
  5Q_i[0] + 3Q_i[1] and 3Q_i[0] + 5Q_i[1] share the same formula ECFATP(Q_i[k], Q_i[\bar{k}])_x
  4Q_i[0] + 4Q_i[1] and 4Q_i[0] + 4Q_i[1] share the same formula ECFAFP(Q_i[k], Q_i[\bar{k}])_x
Let d be a positive integer with the octal representation
    d = d_{n-1} 8^{n-1} + d_{n-2} 8^{n-2} + \cdots + d_0, \quad \text{where } d_i \in \{0, 1, \ldots, 7\}.    (8)
We define Q_i[0] = (\sum_{k=1}^{i} d_{n-k} 8^{i-k}) P and Q_i[1] = Q_i[0] + P. Then Q_{i+1}[0] and
Q_{i+1}[1] are computed from Q_i[0] and Q_i[1] depending on d_{n-i-1}, as follows.
  (Q_{i+1}[0], Q_{i+1}[1]) = (ECEAZP, ECSAOP)(Q_i[0], Q_i[1])    if d_{n-i-1} = 0
  (Q_{i+1}[0], Q_{i+1}[1]) = (ECSAOP, ECSATP)(Q_i[0], Q_i[1])    if d_{n-i-1} = 1
  (Q_{i+1}[0], Q_{i+1}[1]) = (ECSATP, ECFATP)(Q_i[0], Q_i[1])    if d_{n-i-1} = 2
  (Q_{i+1}[0], Q_{i+1}[1]) = (ECFATP, ECFAFP)(Q_i[0], Q_i[1])    if d_{n-i-1} = 3
  (Q_{i+1}[1], Q_{i+1}[0]) = (ECFATP, ECFAFP)(Q_i[1], Q_i[0])    if d_{n-i-1} = 4
  (Q_{i+1}[1], Q_{i+1}[0]) = (ECSATP, ECFATP)(Q_i[1], Q_i[0])    if d_{n-i-1} = 5
  (Q_{i+1}[1], Q_{i+1}[0]) = (ECSAOP, ECSATP)(Q_i[1], Q_i[0])    if d_{n-i-1} = 6
  (Q_{i+1}[1], Q_{i+1}[0]) = (ECEAZP, ECSAOP)(Q_i[1], Q_i[0])    if d_{n-i-1} = 7
    (9)
For digits 4 to 7 the roles of Q_i[0] and Q_i[1] are exchanged, so the unit returns the new Q[1]
first. We obtain four identical units: (ECEAZP, ECSAOP), (ECSAOP, ECSATP), (ECSATP, ECFATP) and
(ECFATP, ECFAFP). The extended octal algorithm is given in Algorithm 4.
Algorithm 4 (The extended octal Montgomery algorithm)
Input: d = d_{n-1} 8^{n-1} + d_{n-2} 8^{n-2} + ... + d_0, and a point P
Output: dP
  Q[0] = d_{n-1} P
  Q[1] = (d_{n-1} + 1) P
  For i = n-2 down to 0 do
    If d_i == 0 then
      Q[2] = ECEAZP(Q[0], Q[1]), Q[1] = ECSAOP(Q[0], Q[1])
    Else if d_i == 1 then
      Q[2] = ECSAOP(Q[0], Q[1]), Q[1] = ECSATP(Q[0], Q[1])
    Else if d_i == 2 then
      Q[2] = ECSATP(Q[0], Q[1]), Q[1] = ECFATP(Q[0], Q[1])
    Else if d_i == 3 then
      Q[2] = ECFATP(Q[0], Q[1]), Q[1] = ECFAFP(Q[0], Q[1])
    Else if d_i == 4 then
      Q[2] = ECFAFP(Q[1], Q[0]), Q[1] = ECFATP(Q[1], Q[0])
    Else if d_i == 5 then
      Q[2] = ECFATP(Q[1], Q[0]), Q[1] = ECSATP(Q[1], Q[0])
    Else if d_i == 6 then
      Q[2] = ECSATP(Q[1], Q[0]), Q[1] = ECSAOP(Q[1], Q[0])
    Else if d_i == 7 then
      Q[2] = ECSAOP(Q[1], Q[0]), Q[1] = ECEAZP(Q[1], Q[0])
    End if
    Q[0] = Q[2]
  End for
  Return Q[0]
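The digit-dependent update can be checked without any curve arithmetic by representing a point kP by the integer k. The sketch below is such a model, with hypothetical function names; each composite operation is written as the integer combination that its name stands for, and the new pair for digit d is taken to be ((8 - d)Q[0] + dQ[1], (7 - d)Q[0] + (d + 1)Q[1]).

```python
# A point k*P is modelled by the integer k, so 8*P1 becomes 8 * p1, and so on.
def eceazp(p1, p2):
    return 8 * p1

def ecsaop(p1, p2):
    return 7 * p1 + p2

def ecsatp(p1, p2):
    return 6 * p1 + 2 * p2

def ecfatp(p1, p2):
    return 5 * p1 + 3 * p2

def ecfafp(p1, p2):
    return 4 * p1 + 4 * p2

# The four identical units, indexed by d for d < 4 and by 7 - d for d >= 4.
UNITS = [(eceazp, ecsaop), (ecsaop, ecsatp), (ecsatp, ecfatp), (ecfatp, ecfafp)]

def octal_digits(d):
    """Most-significant-first base-8 digits of d >= 1."""
    return [int(c) for c in oct(d)[2:]]

def octal_ladder(d):
    """Integer model of the octal ladder; returns k such that the output is k*P."""
    digits = octal_digits(d)
    q0, q1 = digits[0], digits[0] + 1
    for di in digits[1:]:
        if di < 4:
            first, second = UNITS[di]
            q0, q1 = first(q0, q1), second(q0, q1)
        else:
            first, second = UNITS[7 - di]      # roles of Q[0] and Q[1] exchanged
            q1, q0 = first(q1, q0), second(q1, q0)
        assert q1 - q0 == 1                    # ladder invariant Q[1] = Q[0] + P
    return q0
```

For example, with d = 0o75 the loop starts from (7, 8) and the digit 5 selects the unit (ECSATP, ECFATP) applied to (Q[1], Q[0]), giving Q[1] = 6·8 + 2·7 = 62 and Q[0] = 5·8 + 3·7 = 61 = 0o75.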
This algorithm is not secure against SPA, and its performance falls short of our expectations. We
therefore have to rework the composite formulas to remove duplicated operations and combine them
with the Montgomery trick, which gives better performance and yields the identical units. We use
these formulas to perform elliptic curve scalar multiplication atomically. The atomic method also
reduces the computational cost, especially for field inversions and duplicated operations.
3.2 New composite formulas using x-coordinates-only
In this section we improve the two composite formulas 2P1 + 2P2 and 4P1 and use them to construct
the five composite formulas ECEAZP, ECSAOP, ECSATP, ECFATP and ECFAFP. We optimize the two
formulas 4P1 and 2P1 + 2P2 in Corollaries 1 and 2, and then present the five new composite
formulas.
Corollary 1 Let E be a binary elliptic curve defined over GF(2^m) and let P1 = (x1, y1) be a
point on E. Then the following formula holds:
    x_{4P_1} = \frac{(x_1^4+b)^4 + b x_1^8}{x_1^4 (x_1^4+b)^2}.    (10)
The formula costs 1I+3M+5S.
Proof.
Since we have already obtained x_{4P_1} in Section 2, we can derive another form from it:
    x_{4P_1} = x_1^4 + \frac{b^2 (x_1^4+b)^2 + b x_1^8}{x_1^4 (x_1^4+b)^2}
             = \frac{x_1^8 (x_1^4+b)^2 + b^2 (x_1^4+b)^2 + b x_1^8}{x_1^4 (x_1^4+b)^2}
             = \frac{(x_1^8+b^2)(x_1^4+b)^2 + b x_1^8}{x_1^4 (x_1^4+b)^2}
             = \frac{(x_1^4+b)^4 + b x_1^8}{x_1^4 (x_1^4+b)^2}.    (11)
As a result, the new form is obtained and its cost is 1I+3M+5S. ∎
We can write x_{4P_1} in another form:
    x_{4P_1} = K / M, \quad \text{where } K = (x_1^4+b)^4 + b x_1^8 \text{ and } M = x_1^4 (x_1^4+b)^2.    (12)
Corollary 2 Let E be a binary elliptic curve defined over GF(2^m), and let P1 = (x1, y1),
P2 = (x2, y2) and P = (x0, y0) be points on E. When P2 = P1 + P, the following formula holds:
    x_{2(P_1+P_2)} = \frac{[x_0 (x_1+x_2)^2 + x_1 x_2]^4 + b (x_1+x_2)^8}{(x_1+x_2)^4 [x_0 (x_1+x_2)^2 + x_1 x_2]^2},
with cost 1I+5M+4S.
Proof.
Since x_{2P_1} = x_1^2 + \frac{b}{x_1^2}, we have x_{2(P_1+P_2)} = x_{P_1+P_2}^2 + \frac{b}{x_{P_1+P_2}^2}.
We have already obtained
    x_{P_1+P_2} = x_0 + \frac{x_1}{x_1+x_2} + \left(\frac{x_1}{x_1+x_2}\right)^2 = \frac{x_0 (x_1+x_2)^2 + x_1 x_2}{(x_1+x_2)^2},    (13)
so we get
    x_{2(P_1+P_2)} = \left(\frac{x_0 (x_1+x_2)^2 + x_1 x_2}{(x_1+x_2)^2}\right)^2 + b \left(\frac{(x_1+x_2)^2}{x_0 (x_1+x_2)^2 + x_1 x_2}\right)^2
                   = \frac{[x_0 (x_1+x_2)^2 + x_1 x_2]^4 + b (x_1+x_2)^8}{(x_1+x_2)^4 [x_0 (x_1+x_2)^2 + x_1 x_2]^2}.    (14)
So we obtain the new formula
    x_{2(P_1+P_2)} = \frac{[x_0 (x_1+x_2)^2 + x_1 x_2]^4 + b (x_1+x_2)^8}{(x_1+x_2)^4 [x_0 (x_1+x_2)^2 + x_1 x_2]^2}. ∎    (15)
This formula costs 1I+5M+4S.
We write the new formula x_{2(P_1+P_2)} in the simple form
    x_{2P_1+2P_2} = R / Q,    (16)
where
    Q = (x_1+x_2)^4 [x_0 (x_1+x_2)^2 + x_1 x_2]^2,    (17)
    R = [x_0 (x_1+x_2)^2 + x_1 x_2]^4 + b (x_1+x_2)^8.    (18)
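Corollaries 1 and 2 can be checked numerically by comparing K/M and R/Q with repeated use of the basic x-only doubling and addition formulas. The Python sketch below does this in GF(2^163); the field, the coefficient b = 5 and all function names are assumptions made for the example, and zero denominators are not handled.

```python
DEG = 163
POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1   # x^163 + x^7 + x^6 + x^3 + 1
B = 0x5                                                   # hypothetical curve coefficient b

def gf_mul(a, b):
    """Multiply in GF(2^DEG): shift-and-add with reduction modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> DEG) & 1:
            a ^= POLY
    return r

def gf_sqr(a):
    return gf_mul(a, a)

def gf_inv(a):
    """Invert a nonzero element as a^(2^DEG - 2)."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    r = 1
    for _ in range(DEG - 1):
        a = gf_sqr(a)
        r = gf_mul(r, a)
    return r

def x_double(x1):
    """Basic x-only doubling: x1^2 + b / x1^2."""
    s = gf_sqr(x1)
    return s ^ gf_mul(B, gf_inv(s))

def x_add(x1, x2, x0):
    """Basic x-only addition with known difference x0."""
    t = gf_mul(x1, gf_inv(x1 ^ x2))
    return x0 ^ t ^ gf_sqr(t)

def k_and_m(x1):
    """K and M of Corollary 1, so that x(4*P1) = K / M."""
    x1_4 = gf_sqr(gf_sqr(x1))
    u = x1_4 ^ B                                        # x1^4 + b
    k = gf_sqr(gf_sqr(u)) ^ gf_mul(B, gf_sqr(x1_4))     # (x1^4 + b)^4 + b*x1^8
    m = gf_mul(x1_4, gf_sqr(u))                         # x1^4 * (x1^4 + b)^2
    return k, m

def r_and_q(x1, x2, x0):
    """R and Q of Corollary 2, so that x(2*P1 + 2*P2) = R / Q."""
    s = gf_sqr(x1 ^ x2)                                 # (x1 + x2)^2
    w = gf_mul(x0, s) ^ gf_mul(x1, x2)                  # x0*(x1 + x2)^2 + x1*x2
    r = gf_sqr(gf_sqr(w)) ^ gf_mul(B, gf_sqr(gf_sqr(s)))
    q = gf_mul(gf_sqr(s), gf_sqr(w))
    return r, q
```

Each corollary replaces two chained operations, which would need two inversions, by a single fraction that needs one.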
Theorem 1 Let E be a binary elliptic curve defined over GF(2^m) and let P1 = (x1, y1) be a point
on E. Then the following formula holds:
    ECEAZP(P_1, P_2)_x = x_{8P_1} = \frac{[(x_1^4+b)^4 + b x_1^8]^4 + b [x_1^4 (x_1^4+b)^2]^4}{[x_1^4 (x_1^4+b)^2]^2 [(x_1^4+b)^4 + b x_1^8]^2},    (19)
with cost 1I+5M+8S.
Proof. Since 8P_1 = 2(4P_1), we can compute x_{8P_1} with the doubling formula:
    x_{8P_1} = x_{4P_1}^2 + \frac{b}{x_{4P_1}^2}.    (20)
By Corollary 1 we have
    x_{4P_1} = \frac{(x_1^4+b)^4 + b x_1^8}{x_1^4 (x_1^4+b)^2}.
Applying the doubling formula, we obtain the new formula
    x_{8P_1} = \frac{[(x_1^4+b)^4 + b x_1^8]^4 + b [x_1^4 (x_1^4+b)^2]^4}{[x_1^4 (x_1^4+b)^2]^2 [(x_1^4+b)^4 + b x_1^8]^2}. ∎
We write this formula in the simple form
    ECEAZP(P_1, P_2)_x = \frac{K^4 + b M^4}{M^2 K^2}.    (21)
Theorem 2 Let E be a binary elliptic curve defined over GF(2^m), and let P1 = (x1, y1),
P2 = (x2, y2) and P = (x0, y0) be points on E. When P2 = P1 + P, the following formula holds:
    ECSAOP(P_1, P_2)_x = x_{7P_1+P_2} = x_0 + \lambda + \lambda^2, \quad \text{where } \lambda = \frac{K T^2}{K T^2 + x_1^4 (x_1^4+b)^2 (x_0 T^2 + L T + L^2)},
with cost 1I+11M+9S.
Proof. We use 7P_1 + P_2 = 4P_1 + (3P_1 + P_2) to get this new formula. According to the addition
formula, we get
    x_{7P_1+P_2} = x_0 + \frac{x_{4P_1}}{x_{4P_1} + x_{3P_1+P_2}} + \left(\frac{x_{4P_1}}{x_{4P_1} + x_{3P_1+P_2}}\right)^2.
From Section 2, x_{3P_1+P_2} = x_0 + \lambda_{3P_1+P_2} + \lambda_{3P_1+P_2}^2, where
    \lambda_{3P_1+P_2} = \frac{(x_1^4+b)(x_1+x_2)^2}{x_1^2 [x_0 (x_1+x_2)^2 + x_1 x_2] + (x_1^4+b)(x_1+x_2)^2} = \frac{L}{T},
with L the numerator and T the denominator of this fraction. Hence
x_{3P_1+P_2} = (x_0 T^2 + L T + L^2) / T^2, and we get
    \lambda = \frac{x_{4P_1}}{x_{4P_1} + x_{3P_1+P_2}} = \frac{K T^2}{K T^2 + x_1^4 (x_1^4+b)^2 (x_0 T^2 + L T + L^2)}. ∎
We write this formula in the simple form
    ECSAOP(P_1, P_2)_x = x_0 + \frac{K T^2}{K T^2 + M N} + \left(\frac{K T^2}{K T^2 + M N}\right)^2, \quad \text{where } N = x_0 T^2 + L T + L^2.    (22)
Theorem 3 Let E be a binary elliptic curve defined over GF(2^m), and let P1 = (x1, y1),
P2 = (x2, y2) and P = (x0, y0) be points on E. When P2 = P1 + P, the following formula holds:
    ECSATP(P_1, P_2)_x = x_{6P_1+2P_2} = \frac{[x_0 T^2 + L T + L^2]^4 + b T^8}{T^4 [x_0 T^2 + L T + L^2]^2},
with cost 1I+9M+7S.
Proof.
Since x_{3P_1+P_2} = \frac{x_0 T^2 + L T + L^2}{T^2} and
x_{6P_1+2P_2} = x_{2(3P_1+P_2)} = x_{3P_1+P_2}^2 + \frac{b}{x_{3P_1+P_2}^2}, the doubling formula gives
    x_{6P_1+2P_2} = \left(\frac{x_0 T^2 + L T + L^2}{T^2}\right)^2 + b \left(\frac{T^2}{x_0 T^2 + L T + L^2}\right)^2
                  = \frac{[x_0 T^2 + L T + L^2]^4 + b T^8}{T^4 [x_0 T^2 + L T + L^2]^2}. ∎    (23)
As a result, this formula costs 1I+9M+7S. We write it in the simple form
    ECSATP(P_1, P_2)_x = \frac{N^4 + b T^8}{T^4 N^2}, \quad \text{where } N = x_0 T^2 + L T + L^2.    (24)
Theorem 4 Let E be a binary elliptic curve defined over GF(2^m), and let P1 = (x1, y1),
P2 = (x2, y2) and P = (x0, y0) be points on E. When P2 = P1 + P, the following formula holds:
    ECFATP(P_1, P_2)_x = x_{5P_1+3P_2} = x_0 + \lambda + \lambda^2, \quad \text{where } \lambda = \frac{T^2 R}{T^2 R + Q [x_0 T^2 + L T + L^2]}.
The formula costs 1I+11M+8S.
Proof. Divide 5P_1 + 3P_2 into two parts, (2P_1 + 2P_2) and (3P_1 + P_2). The two parts differ by
±P, whose x-coordinate is x_0, so we have
    x_{5P_1+3P_2} = x_{(2P_1+2P_2)+(3P_1+P_2)} = x_0 + \frac{x_{2P_1+2P_2}}{x_{2P_1+2P_2} + x_{3P_1+P_2}} + \left(\frac{x_{2P_1+2P_2}}{x_{2P_1+2P_2} + x_{3P_1+P_2}}\right)^2.    (25)
By Corollary 2 we have
    x_{2(P_1+P_2)} = \frac{[x_0 (x_1+x_2)^2 + x_1 x_2]^4 + b (x_1+x_2)^8}{(x_1+x_2)^4 [x_0 (x_1+x_2)^2 + x_1 x_2]^2},
which can be written as R / Q. Then \lambda is computed as
    \lambda = \frac{x_{2P_1+2P_2}}{x_{2P_1+2P_2} + x_{3P_1+P_2}} = \frac{R/Q}{R/Q + (x_0 T^2 + L T + L^2)/T^2} = \frac{T^2 R}{T^2 R + Q [x_0 T^2 + L T + L^2]}. ∎    (26)
We write this formula in the simple form
    ECFATP(P_1, P_2)_x = x_0 + \frac{T^2 R}{T^2 R + Q N} + \left(\frac{T^2 R}{T^2 R + Q N}\right)^2, \quad \text{where } N = x_0 T^2 + L T + L^2.    (27)
Theorem 5 Let E be a binary elliptic curve defined over GF(2^m), and let P1 = (x1, y1),
P2 = (x2, y2) and P = (x0, y0) be points on E. When P2 = P1 + P, the following formula holds:
    ECFAFP(P_1, P_2)_x = x_{4P_1+4P_2} = \frac{R^4 + b Q^4}{R^2 Q^2},
where Q = (x_1+x_2)^4 [x_0 (x_1+x_2)^2 + x_1 x_2]^2 and
R = [x_0 (x_1+x_2)^2 + x_1 x_2]^4 + b (x_1+x_2)^8. The formula costs 1I+7M+8S.
Proof.
We get 4P_1 + 4P_2 by doubling 2P_1 + 2P_2, so we can compute this formula with the doubling
formula. Denote x_{2(P_1+P_2)} = R / Q; then
    x_{2(2P_1+2P_2)} = \left(\frac{R}{Q}\right)^2 + b \left(\frac{Q}{R}\right)^2 = \frac{R^4 + b Q^4}{R^2 Q^2}. ∎    (28)
3.3 Fast SPA resistant Algorithm
To construct the scalar multiplication, each iteration of the algorithm must be atomic, so that
the different operation sequences are indistinguishable. We call this atomic sequence the
identical unit. Writing the new formulas in their simple forms reveals a common pattern. The five
key elements of the new formulas are
    \left(x_{8P_1}: \frac{K^4 + b M^4}{M^2 K^2}\right), \quad \left(\lambda_{7P_1+P_2}: \frac{K T^2}{K T^2 + M N}\right), \quad \left(x_{6P_1+2P_2}: \frac{N^4 + b (T^2)^4}{(T^2)^2 N^2}\right),    (29)
    \left(\lambda_{5P_1+3P_2}: \frac{T^2 R}{T^2 R + Q N}\right), \quad \left(x_{4P_1+4P_2}: \frac{R^4 + b Q^4}{R^2 Q^2}\right).    (30)
Clearly, these key elements have two structures, an addition structure and a doubling structure.
We combine two formulas into one identical unit; the identical units are listed in Eq. (9).
Applying atomicity, we make every unit perform the same operation sequence. Following the simple
forms of the formulas, the computational sequence of a unit is shown in Fig 1.
[Figure: T and N are computed first, followed by either M, K or R, Q; the outputs are 8P1,
7P1 + P2, 6P1 + 2P2, 5P1 + 3P2 and 4P1 + 4P2.]
Fig 1. Computational process
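The simple forms (21), (22), (24), (27) and (28) can be evaluated from the shared elements T^2, N, K, M, R and Q and compared with chains of basic x-only operations. The following Python sketch does so in GF(2^163). It is only a functional check under assumed parameters (the field, b = 5 and all function names are hypothetical) and does not reproduce the atomic register schedule.

```python
DEG = 163
POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1   # x^163 + x^7 + x^6 + x^3 + 1
B = 0x5                                                   # hypothetical curve coefficient b

def gf_mul(a, b):
    """Multiply in GF(2^DEG): shift-and-add with reduction modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> DEG) & 1:
            a ^= POLY
    return r

def gf_sqr(a):
    return gf_mul(a, a)

def gf_pow4(a):
    return gf_sqr(gf_sqr(a))

def gf_div(n, d):
    """n / d, with the inverse computed as d^(2^DEG - 2)."""
    if d == 0:
        raise ZeroDivisionError("division by zero")
    r = 1
    for _ in range(DEG - 1):
        d = gf_sqr(d)
        r = gf_mul(r, d)
    return gf_mul(n, r)

def x_double(x1):
    s = gf_sqr(x1)
    return s ^ gf_div(B, s)

def x_add(x1, x2, x0):
    t = gf_div(x1, x1 ^ x2)
    return x0 ^ t ^ gf_sqr(t)

def elements(x1, x2, x0):
    """Shared elements T^2, N, K, M, R, Q."""
    s = gf_sqr(x1 ^ x2)                          # (x1 + x2)^2
    w = gf_mul(x0, s) ^ gf_mul(x1, x2)           # x0*(x1 + x2)^2 + x1*x2
    x1_4 = gf_pow4(x1)
    u = x1_4 ^ B                                 # x1^4 + b
    big_l = gf_mul(u, s)                         # L
    big_t = big_l ^ gf_mul(gf_sqr(x1), w)        # T
    t2 = gf_sqr(big_t)
    n = gf_mul(x0, t2) ^ gf_mul(big_l, big_t) ^ gf_sqr(big_l)
    k = gf_pow4(u) ^ gf_mul(B, gf_sqr(x1_4))
    m = gf_mul(x1_4, gf_sqr(u))
    r = gf_pow4(w) ^ gf_mul(B, gf_pow4(s))
    q = gf_mul(gf_sqr(s), gf_sqr(w))
    return t2, n, k, m, r, q

def composites(x1, x2, x0):
    """The five composite x-coordinates from the simple forms."""
    t2, n, k, m, r, q = elements(x1, x2, x0)
    lam7 = gf_div(gf_mul(k, t2), gf_mul(k, t2) ^ gf_mul(m, n))
    lam5 = gf_div(gf_mul(r, t2), gf_mul(r, t2) ^ gf_mul(q, n))
    return {
        "8P1": gf_div(gf_pow4(k) ^ gf_mul(B, gf_pow4(m)), gf_mul(gf_sqr(m), gf_sqr(k))),
        "7P1+P2": x0 ^ lam7 ^ gf_sqr(lam7),
        "6P1+2P2": gf_div(gf_pow4(n) ^ gf_mul(B, gf_pow4(t2)), gf_mul(gf_sqr(t2), gf_sqr(n))),
        "5P1+3P2": x0 ^ lam5 ^ gf_sqr(lam5),
        "4P1+4P2": gf_div(gf_pow4(r) ^ gf_mul(B, gf_pow4(q)), gf_mul(gf_sqr(q), gf_sqr(r))),
    }

def reference(x1, x2, x0):
    """The same five values from chains of basic doublings and additions."""
    x2p1, x12 = x_double(x1), x_add(x1, x2, x0)
    x3 = x_add(x2p1, x12, x0)                    # 3P1 + P2
    x4, x22 = x_double(x2p1), x_double(x12)      # 4P1 and 2P1 + 2P2
    return {
        "8P1": x_double(x4),
        "7P1+P2": x_add(x4, x3, x0),
        "6P1+2P2": x_double(x3),
        "5P1+3P2": x_add(x22, x3, x0),
        "4P1+4P2": x_double(x22),
    }
```

Taking P1 = P and P2 = 2P gives a consistent input triple (x1, x2, x0) = (x0, x(2P), x0) for which both functions should return the same dictionary.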
We compare the M, K branch with the R, Q branch. After T and N have been computed, R and Q cost
2M+4S, which is one squaring more than M and K, because both branches can reuse intermediate
values produced while computing T and N. Hence the two branches cost the same once one dummy
operation is added to the M, K branch. Every processing unit has the same structure as described
above, so the total cost of each unit is the same, 1I+17M+13S.
This cost can be divided into several parts. T and N cost 6M+3S. K and M cost 2M+3S and R and Q
cost 2M+4S, both on top of T and N. Both structures cost 4M+4S. The Montgomery trick costs 3M.
Obtaining the final results costs 2M+2S. Notably, every identical unit fits this sequence with
only one dummy operation.
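The breakdown above can be added up as a quick check. In the short Python tally below, the part names are labels chosen for this sketch, and the branch entry assumes that the M, K branch is padded with its one dummy squaring so that both branches count as 2M+4S.

```python
# (multiplications, squarings) for each part of one identical unit
PARTS = {
    "T and N": (6, 3),
    "branch: M, K plus one dummy squaring, or R, Q": (2, 4),
    "addition and doubling structures": (4, 4),
    "Montgomery trick": (3, 0),
    "final results": (2, 2),
}

def unit_cost(parts=PARTS):
    """Return (#I, #M, #S) for one identical unit; each unit uses one inversion."""
    n_mul = sum(m for m, _ in parts.values())
    n_sqr = sum(s for _, s in parts.values())
    return 1, n_mul, n_sqr
```

The sum is 6 + 2 + 4 + 3 + 2 = 17 multiplications and 3 + 4 + 4 + 2 = 13 squaring slots, in agreement with 1I+17M+13S.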
As a result, we obtain a complete set of new formulas in a unified mathematical form, based on
the optimized formulas. Moreover, we obtain the identical unit with low additional computational
overhead.
We denote the identical unit by FSPARA, the fast SPA-resistant algorithm over binary elliptic
curves. Given d_i, we denote its high bit and the complement of that bit by d_i^H and
\bar{d_i^H}, respectively. FSPARA is expressed as follows.
    FSPARA(Q[d_i^H], Q[\bar{d_i^H}]) = \begin{cases} (8 - d_i) Q[d_i^H] + d_i Q[\bar{d_i^H}] \\ (7 - d_i) Q[d_i^H] + (d_i + 1) Q[\bar{d_i^H}] \end{cases}    (31)
The proposed SPA-resistant and fast algorithm is described in Algorithm 5.
Algorithm 5 (SPA-resistant and fast scalar multiplication)
Input: d = d_{n-1} 8^{n-1} + d_{n-2} 8^{n-2} + ... + d_0, and a point P
Output: dP
  Q[0] = d_{n-1} P
  Q[1] = (d_{n-1} + 1) P
  For i = n-2 down to 0 do
    (Q[0], Q[1]) = FSPARA(Q[d_i^H], Q[\bar{d_i^H}])
  End for
  Return Q[0]
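The control flow of the octal ladder can be exercised end to end with x-coordinates only. The Python sketch below is a functional model under stated assumptions: the field GF(2^163), the coefficient b = 5 and all function names are hypothetical, each composite value is obtained from chains of basic x-only operations rather than from the optimized atomic unit, and exceptional inputs are not handled. It is therefore neither atomic nor SPA-resistant; it only shows which pair each octal digit produces.

```python
DEG = 163
POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1   # x^163 + x^7 + x^6 + x^3 + 1
B = 0x5                                                   # hypothetical curve coefficient b

def gf_mul(a, b):
    """Multiply in GF(2^DEG): shift-and-add with reduction modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> DEG) & 1:
            a ^= POLY
    return r

def gf_sqr(a):
    return gf_mul(a, a)

def gf_inv(a):
    """Invert a nonzero element as a^(2^DEG - 2)."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    r = 1
    for _ in range(DEG - 1):
        a = gf_sqr(a)
        r = gf_mul(r, a)
    return r

def x_double(x1):
    s = gf_sqr(x1)
    return s ^ gf_mul(B, gf_inv(s))

def x_add(x1, x2, x0):
    t = gf_mul(x1, gf_inv(x1 ^ x2))
    return x0 ^ t ^ gf_sqr(t)

def ladder_x(d, x0):
    """Binary x-only Montgomery ladder, used for initialization and as reference."""
    q = [x0, x_double(x0)]
    for bit in bin(d)[3:]:
        di = int(bit)
        q[1 - di] = x_add(q[0], q[1], x0)
        q[di] = x_double(q[di])
    return q[0]

def unit_pair(d, xq0, xq1, x0):
    """Return (x(Q0'), x(Q1')) for octal digit d, given x(Q0), x(Q1) and x(P) = x0."""
    swap = d >= 4                                # digits 4..7 exchange the roles of Q0, Q1
    x1, x2 = (xq1, xq0) if swap else (xq0, xq1)
    k = 7 - d if swap else d                     # index of the identical unit, 0..3
    x2p1 = x_double(x1)                          # 2P1
    x12 = x_add(x1, x2, x0)                      # P1 + P2
    x3 = x_add(x2p1, x12, x0)                    # 3P1 + P2
    if k <= 1:
        x4 = x_double(x2p1)                      # 4P1
        seven = x_add(x4, x3, x0)                # 7P1 + P2
        first, second = (x_double(x4), seven) if k == 0 else (seven, x_double(x3))
    else:
        x22 = x_double(x12)                      # 2P1 + 2P2
        five = x_add(x22, x3, x0)                # 5P1 + 3P2
        first, second = (x_double(x3), five) if k == 2 else (five, x_double(x22))
    return (second, first) if swap else (first, second)

def octal_ladder_x(d, x0):
    """x-coordinate of d*P for d >= 1, processing one octal digit per iteration."""
    digits = [int(c) for c in oct(d)[2:]]
    q0, q1 = ladder_x(digits[0], x0), ladder_x(digits[0] + 1, x0)
    for di in digits[1:]:
        q0, q1 = unit_pair(di, q0, q1, x0)
    return q0
```

A scalar such as 0o712345670 passes through every digit value from 0 to 7 in the loop, so comparing `octal_ladder_x` with `ladder_x` on it exercises all four units in both orientations.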
The security of Algorithm 5 comes from the identical unit. Table 1 describes the identical unit
for (ECEAZP, ECSAOP) and (ECSAOP, ECSATP). Table 2 describes the identical unit for
(ECSATP, ECFATP) and (ECFATP, ECFAFP).
Table 1 Identical unit procedure for elliptic curves over GF(2^m)
Input: T1 = P1 = x1, T2 = P2 = x2, T3 = P = x0
Output: (T1, T2) ← (8P1, 7P1 + P2) (left column) or (T1, T2) ← (7P1 + P2, 6P1 + 2P2) (right column)
T4 = T1 + T2 (x1 + x2) | T4 = T1 + T2 (x1 + x2)
T4 = T4^2 ((x1 + x2)^2) | T4 = T4^2 ((x1 + x2)^2)
T5 = T3 · T4 (x0(x1 + x2)^2) | T5 = T3 · T4 (x0(x1 + x2)^2)
T6 = T1 · T2 (x1 x2) | T6 = T1 · T2 (x1 x2)
T5 = T5 + T6 (x0(x1 + x2)^2 + x1 x2) | T5 = T5 + T6 (x0(x1 + x2)^2 + x1 x2)
T1 = T1^2 (x1^2) | T1 = T1^2 (x1^2)
T2 = T1 · T5 (x1^2 (x0(x1 + x2)^2 + x1 x2)) | T2 = T1 · T5 (x1^2 (x0(x1 + x2)^2 + x1 x2))
T1 = T1^2 (x1^4) | T1 = T1^2 (x1^4)
T7 = b | T7 = b
T6 = T1 + T7 (x1^4 + b) | T6 = T1 + T7 (x1^4 + b)
T4 = T4 · T6 (L: (x1^4 + b)(x1 + x2)^2) | T4 = T4 · T6 (L: (x1^4 + b)(x1 + x2)^2)
T5 = T2 · T4 (LT + L^2) | T5 = T2 · T4 (LT + L^2)
T4 = T2 + T4 (T) | T4 = T2 + T4 (T)
T4 = T4^2 (T^2) | T4 = T4^2 (T^2)
T5 = T3 · T4 (x0 T^2) | T5 = T3 · T4 (x0 T^2)
T5 = T5 + T2 (N: x0 T^2 + L^2 + LT) | T5 = T5 + T2 (N: x0 T^2 + L^2 + LT)
T6 = T6^2 ((x1^4 + b)^2) | T6 = T6^2 ((x1^4 + b)^2)
T2 = T2^2 (dummy) | T2 = T2^2 (dummy)
T2 = T1 · T6 (M: x1^4 (x1^4 + b)^2) | T2 = T1 · T6 (M: x1^4 (x1^4 + b)^2)
T5 = T2 · T5 (MN) | T2 = T2 · T5 (MN)
T1 = T1^2 (x1^8) | T1 = T1^2 (x1^8)
T6 = T6^2 ((x1^4 + b)^4) | T6 = T6^2 ((x1^4 + b)^4)
T1 = T1 · T3 (b x1^8) | T1 = T1 · T3 (b x1^8)
T6 = T1 + T6 (K: (x1^4 + b)^4 + b x1^8) | T6 = T1 + T6 (K: (x1^4 + b)^4 + b x1^8)
T4 = T6 · T4 (K T^2) | T6 = T6 · T4 (K T^2)
T5 = T5 + T4 (A: K T^2 + MN) | T2 = T6 + T4 (A: K T^2 + MN)
T2 = T2^2 (M^2) | T4 = T4^2 (T^4)
T6 = T6^2 (K^2) | T5 = T5^2 (N^2)
T1 = T2 · T6 (B: M^2 K^2) | T1 = T4 · T5 (B: N^2 T^4)
T6 = T6^2 (K^4) | T5 = T5^2 (N^4)
T2 = T2^2 (M^4) | T4 = T4^2 (T^8)
T7 = T7 · T2 (b M^4) | T7 = T7 · T4 (b T^8)
T7 = T7 + T6 (b M^4 + K^4) | T7 = T7 + T4 (b T^8 + N^4)
T2 = T1 · T5 (AB) | T4 = T1 · T2 (AB)
T2 = T2^-1 ((AB)^-1) | T4 = T4^-1 ((AB)^-1)
T6 = T2 · T5 (B^-1) | T5 = T2 · T4 (B^-1)
T5 = T2 · T1 (A^-1) | T2 = T4 · T1 (A^-1)
T1 = T7 · T6 (8P1) | T2 = T5 · T7 (6P1 + 2P2)
T5 = T5 · T4 (λ) | T6 = T6 · T2 (λ)
T2 = T3 + T5 (x0 + λ) | T1 = T3 + T6 (x0 + λ)
T5 = T5^2 (λ^2) | T6 = T6^2 (λ^2)
T2 = T2 + T5 (7P1 + P2) | T1 = T1 + T6 (7P1 + P2)
Table 2 Identical unit procedure for elliptic curves over GF(2^m)
Input: T1 = P1 = x1, T2 = P2 = x2, T3 = P = x0
Output: (T1, T2) ← (6P1 + 2P2, 5P1 + 3P2) (left column) or (T1, T2) ← (5P1 + 3P2, 4P1 + 4P2) (right column)
T4 = T1 + T2 (x1 + x2) | T4 = T1 + T2 (x1 + x2)
T4 = T4^2 ((x1 + x2)^2) | T4 = T4^2 ((x1 + x2)^2)
T5 = T3 · T4 (x0(x1 + x2)^2) | T5 = T3 · T4 (x0(x1 + x2)^2)
T6 = T1 · T2 (x1 x2) | T6 = T1 · T2 (x1 x2)
T5 = T5 + T6 (x0(x1 + x2)^2 + x1 x2) | T5 = T5 + T6 (x0(x1 + x2)^2 + x1 x2)
T1 = T1^2 (x1^2) | T1 = T1^2 (x1^2)
T2 = T1 · T5 (x1^2 (x0(x1 + x2)^2 + x1 x2)) | T2 = T1 · T5 (x1^2 (x0(x1 + x2)^2 + x1 x2))
T1 = T1^2 (x1^4) | T1 = T1^2 (x1^4)
T7 = b | T7 = b
T6 = T1 + T7 (x1^4 + b) | T6 = T1 + T7 (x1^4 + b)
T6 = T4 · T6 (L: (x1^4 + b)(x1 + x2)^2) | T6 = T4 · T6 (L: (x1^4 + b)(x1 + x2)^2)
T1 = T2 · T4 (LT + L^2) | T1 = T2 · T4 (LT + L^2)
T6 = T2 + T4 (T) | T6 = T2 + T4 (T)
T6 = T6^2 (T^2) | T6 = T6^2 (T^2)
T2 = T3 · T6 (x0 T^2) | T2 = T3 · T1 (x0 T^2)
T1 = T1 + T2 (N: x0 T^2 + L^2 + LT) | T1 = T1 + T2 (N: x0 T^2 + L^2 + LT)
T4 = T4^2 ((x1 + x2)^4) | T4 = T4^2 ((x1 + x2)^4)
T5 = T5^2 ((x0(x1 + x2)^2 + x1 x2)^2) | T5 = T5^2 ((x0(x1 + x2)^2 + x1 x2)^2)
T2 = T4 · T5 (Q) | T2 = T4 · T5 (Q)
T2 = T1 · T4 (QN) | T1 = T1 · T2 (QN)
T4 = T4^2 ((x1 + x2)^8) | T4 = T4^2 ((x1 + x2)^8)
T5 = T5^2 ((x0(x1 + x2)^2 + x1 x2)^4) | T5 = T5^2 ((x0(x1 + x2)^2 + x1 x2)^4)
T4 = T3 · T4 (b(x1 + x2)^8) | T4 = T3 · T4 (b(x1 + x2)^8)
T4 = T5 + T2 (R) | T4 = T5 + T2 (R)
T4 = T4 · T6 (R T^2) | T6 = T4 · T6 (R T^2)
T2 = T2 + T6 (A: R T^2 + QN) | T1 = T1 + T6 (A: R T^2 + QN)
T6 = T6^2 (T^4) | T4 = T4^2 (R^2)
T1 = T1^2 (N^2) | T2 = T2^2 (Q^2)
T5 = T1 · T6 (B: T^4 N^2) | T5 = T2 · T4 (B: Q^2 R^2)
T1 = T1^2 (N^4) | T4 = T4^2 (R^4)
T6 = T6^2 (T^8) | T2 = T2^2 (Q^4)
T7 = T7 · T6 (b T^8) | T7 = T7 · T2 (b Q^4)
T7 = T7 + T1 (b T^8 + N^4) | T7 = T7 + T1 (b Q^4 + R^4)
T6 = T5 · T2 (AB) | T4 = T1 · T5 (AB)
T6 = T6^-1 ((AB)^-1) | T4 = T4^-1 ((AB)^-1)
T1 = T2 · T6 (B^-1) | T2 = T4 · T1 (B^-1)
T2 = T5 · T6 (A^-1) | T1 = T4 · T5 (A^-1)
T1 = T7 · T1 (6P1 + 2P2) | T2 = T2 · T7 (4P1 + 4P2)
T4 = T4 · T2 (λ) | T6 = T6 · T1 (λ)
T2 = T3 + T4 (x0 + λ) | T1 = T3 + T6 (x0 + λ)
T4 = T4^2 (λ^2) | T6 = T6^2 (λ^2)
T2 = T2 + T4 (5P1 + 3P2) | T1 = T1 + T6 (5P1 + 3P2)
With the new composite formulas, we optimize the original atomic block into a new one, described
in Table 3. It uses seven storages, only one dummy operation and 29 operations. In addition, it
does not need the pre-computation of 2P.
Table 3 Optimized atomic block
Input: T1 = P1 = x1, T2 = P2 = x2, T3 = P = x0
Output: (T1, T2) ← (4P1, 3P1 + P2) (left column) or (T1, T2) ← (3P1 + P2, 2P1 + 2P2) (right column)
T4 = T1 + T2 (x1 + x2) | T4 = T1 + T2 (x1 + x2)
T4 = T4^2 ((x1 + x2)^2) | T4 = T4^2 ((x1 + x2)^2)
T5 = T3 · T4 (x0(x1 + x2)^2) | T5 = T3 · T4 (x0(x1 + x2)^2)
T6 = T1 · T2 (x1 x2) | T6 = T1 · T2 (x1 x2)
T5 = T5 + T6 (x0(x1 + x2)^2 + x1 x2) | T5 = T5 + T6 (x0(x1 + x2)^2 + x1 x2)
T1 = T1^2 (x1^2) | T1 = T1^2 (x1^2)
T2 = T1 · T5 (x1^2 (x0(x1 + x2)^2 + x1 x2)) | T2 = T1 · T5 (x1^2 (x0(x1 + x2)^2 + x1 x2))
T1 = T1^2 (x1^4) | T1 = T1^2 (x1^4)
T7 = b | T7 = b
T6 = T1 + T7 (x1^4 + b) | T6 = T1 + T7 (x1^4 + b)
T4 = T4 · T6 (L: (x1^4 + b)(x1 + x2)^2) | T6 = T4 · T6 (L: (x1^4 + b)(x1 + x2)^2)
T2 = T2 + T4 (T) | T2 = T2 + T4 (T)
T6 = T6^2 ((x1^4 + b)^2) | T4 = T4^2 ((x1 + x2)^4)
T5 = T5^2 (dummy) | T5 = T5^2 ((x0(x1 + x2)^2 + x1 x2)^2)
T5 = T1 · T6 (M: x1^4 (x1^4 + b)^2) | T1 = T5 · T4 (Q)
T1 = T1^2 (x1^8) | T4 = T4^2 ((x1 + x2)^8)
T1 = T1 · T3 (b x1^8) | T1 = T1 · T4 (b(x1 + x2)^8)
T6 = T6^2 ((x1^4 + b)^4) | T5 = T5^2 ((x0(x1 + x2)^2 + x1 x2)^4)
T7 = T1 + T6 (K) | T7 = T1 + T5 (R)
T1 = T5 · T2 (MT) | T5 = T1 · T2 (QT)
T1 = T1^-1 ((MT)^-1) | T5 = T5^-1 ((QT)^-1)
T6 = T1 · T5 (T^-1) | T4 = T1 · T5 (T^-1)
T5 = T1 · T2 (M^-1) | T1 = T2 · T5 (Q^-1)
T4 = T4 · T6 (L T^-1) | T4 = T4 · T6 (L T^-1)
T1 = T7 · T5 (K M^-1: 4P1) | T2 = T1 · T5 (R Q^-1: 2P1 + 2P2)
T2 = T4 + T3 (x0 + λ) | T4 = T4 + T3 (x0 + λ)
T4 = T4^2 (λ^2) | T4 = T4^2 (λ^2)
T2 = T4 + T2 (3P1 + P2) | T1 = T4 + T2 (3P1 + P2)
The new formulas for 2P1 + 2P2 and 4P1 give the atomic blocks the same mathematical structure,
with a computational process similar to Fig 1. The different branches only use different storage
addresses. We can therefore merge the two original atomic blocks into one, selected by the scalar
bits. We refer to the result as the unified atomic block, described in Table 4. This unified
atomic block can serve as a guide for both hardware and software implementations.
Table 4 Unified atomic block with respect to scalar bits
Input: T2 = P1 = x1, T4 = P2 = x2, T5 = P = x0, m = d_i^H ⊕ d_i^L
Output: (T2, T4) ← (4P1, 3P1 + P2) or (T2, T4) ← (3P1 + P2, 2P1 + 2P2)
T1 = T2 + T4 | (x1 + x2)
T1 = T1^2 | ((x1 + x2)^2)
T3 = T5 · T1 | (x0(x1 + x2)^2)
T4 = T4 · T2 | (x1 x2)
T3 = T3 + T4 | (x0(x1 + x2)^2 + x1 x2)
T2 = T2^2 | (x1^2)
T4 = T2 · T3 | (x1^2 (x0(x1 + x2)^2 + x1 x2))
T2 = T2^2 | (x1^4)
T6 = b | (b)
T0 = T2 + T6 | (x1^4 + b)
T_{\bar{m}} = T0 · T1 | (L: (x1^4 + b)(x1 + x2)^2)
T_{\bar{m}+2} = T_{\bar{m}} + T4 | (T)
T_m = T_m^2 | ((x1^4 + b)^2 or (x1 + x2)^4)
T_{4-m} = T_{4-m}^2 | (dummy operation or (x0(x1 + x2)^2 + x1 x2)^2)
T4 = T_m · T_{m+2} | (M or Q)
T_{m+2} = T_{m+2}^2 | (x1^8 or (x0(x1 + x2)^2 + x1 x2)^4)
T_m = T_m^2 | ((x1^4 + b)^4 or (x1 + x2)^8)
T_{\bar{m}+1} = T_{\bar{m}+1} · T6 | (b x1^8 or b(x1 + x2)^8)
T_{m+2} = T_m + T_{m+2} | (K or R)
T6 = T4 · T_{\bar{m}+2} | (MT or QT)
T6 = T6^-1 | ((MT)^-1 or (QT)^-1)
T_m = T4 · T6 | (T^-1)
T4 = T_{3-\bar{m}} · T6 | (M^-1 or Q^-1)
T_{\bar{m}} = T_{\bar{m}} · T_m | (L T^-1)
T_{(d_i^L << 1)+2} = T4 · T_{2+m} | (2P1 + 2P2 or 4P1 or 4P2)
T5 = T5 + T_{\bar{m}} | (x0 + λ)
T_{\bar{m}} = T_{\bar{m}}^2 | (λ^2)
T_{(\bar{d_i^L} << 1)+2} = T_{\bar{m}} + T5 | (3P1 + P2 or 3P2 + P1)
4 Computational analysis
In this section we compare the proposed fast algorithm with previously presented algorithms with
respect to computational cost. Inversion is the most costly operation in field arithmetic. Ref.
[17] reports that one field inversion costs about 6.67 to 10.33 multiplications in a binary field
of size 163 bits. Researchers generally assume n = 160 and I = 8M, where n is the bit length of
the scalar d.
Table 5 compares the total cost of the proposed algorithm with that of previously presented
algorithms. #I and #M denote the numbers of inversions and multiplications in the listed
algorithms. Under the above assumption, the total cost is #I*8 + #M.
We compare our algorithm with both SPA-nonresistant and SPA-resistant algorithms. We conclude
that the proposed Algorithm 5 is the most efficient. Moreover, Algorithm 5 is faster than
unprotected algorithms such as Multi-base 1 [21], which is faster than Algorithm 3 but 1.09 times
slower than our algorithm. According to [17], Multi-base 1 [21] cannot employ this method to get
better performance because of the huge cost of dummy operations. Among the SPA-resistant
algorithms, [17] is efficient but 1.12 times slower than our algorithm.
Table 5 Comparison of the total cost between proposed algorithm and others (I/M=8).

SPA-nonresistant                                 SPA-resistant
Algorithm     #I    #M    Total cost   Ratio     Algorithm     #I    #M    Total cost   Ratio
Binary        240   480   2400         1.54      [12]          318   636   3180         2.36
[22] NAF      213   426   2130         1.58      [19]          318   318   2862         2.12
[22] 3-NAF    200   400   2000         1.48      [16]          240   480   2400         1.78
[6]           129   787   1819         1.35      [10]          205   410   2050         1.52
[8]           114   789   1701         1.26      [4]           205   410   2050         1.52
[21] MB1      97    693   1469         1.09      [3]           203   406   2030         1.50
[21] MB2      113   677   1581         1.17      [17]          80    878   1518         1.12
                                                 Algorithm 5   54    918   1350         1
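The totals and ratios in Table 5 follow directly from the formula #I × 8 + #M. The following minimal sketch recomputes a few rows under the assumption I = 8M; the function and variable names are illustrative, and the (#I, #M) pairs are copied from the table.

```python
I_OVER_M = 8  # assumption stated in the text: one inversion costs 8 multiplications


def total_cost(num_inversions, num_multiplications, i_over_m=I_OVER_M):
    """Total cost in multiplications: #I * (I/M) + #M."""
    return num_inversions * i_over_m + num_multiplications


# (#I, #M) pairs copied from Table 5.
TABLE5 = {
    "[21] MB1": (97, 693),
    "[19]": (318, 318),
    "[17]": (80, 878),
    "Algorithm 5": (54, 918),
}


def ratios(rows, baseline="Algorithm 5"):
    """Total cost of each algorithm relative to the baseline algorithm."""
    base = total_cost(*rows[baseline])
    return {name: total_cost(*counts) / base for name, counts in rows.items()}


if __name__ == "__main__":
    for name, ratio in ratios(TABLE5).items():
        cost = total_cost(*TABLE5[name])
        print(f"{name:12s} total = {cost:4d}M  ratio = {ratio:.2f}")
```

Changing `i_over_m` shows how sensitive the ranking is to the assumed inversion cost, which is the question the break even analysis addresses.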
Table 6 Break even points between proposed algorithm and the other SPA-resistant algorithms

Algorithm   Break even point     Algorithm   Break even point
[10]        1.06                 [12]        3.12
[19]        2.27                 [20]        3.36
[16]        2.75                 [10]        3.36
[3]         3.43                 [17]        1.53
The break even point, defined in [11], is the number of multiplications needed per inversion. It reflects the relative performance of two algorithms and is computed as (#𝑀2 − #𝑀1)/(#𝐼1 − #𝐼2), where #𝑀1 and #𝐼1 denote the numbers of multiplications and inversions of algorithm A, and #𝑀2 and #𝐼2 those of algorithm B. If the actual I/M ratio of an implementation is greater than the break even point, then algorithm A is faster than B. Table 6 lists the break even points; all of them are below 8, so the proposed algorithm is faster than the other algorithms under the general assumption that I/M=8.
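As a worked illustration of the break even formula, the sketch below evaluates it with the proposed algorithm as A and two of the algorithms from Table 5 as B. The function names are hypothetical. The rule "A is faster when I/M exceeds the break even point" is read here under the assumption that A uses fewer inversions than B, which holds for the proposed algorithm against every SPA-resistant entry of Table 5.

```python
from fractions import Fraction


def break_even_point(m1, i1, m2, i2):
    """(#M2 - #M1) / (#I1 - #I2) for algorithm A = (m1, i1) and B = (m2, i2)."""
    if i1 == i2:
        raise ValueError("equal inversion counts: no break even point exists")
    return Fraction(m2 - m1, i1 - i2)


def cost(num_mul, num_inv, i_over_m):
    """Total cost in multiplications for a given I/M ratio."""
    return num_inv * i_over_m + num_mul


# (#M, #I) pairs copied from Table 5.
PROPOSED = (918, 54)  # Algorithm 5
REF_17 = (878, 80)
REF_19 = (318, 318)

if __name__ == "__main__":
    for label, other in (("[17]", REF_17), ("[19]", REF_19)):
        bep = break_even_point(*PROPOSED, *other)
        print(f"{label}: break even point = {float(bep):.3f}")
        # At the break even ratio both algorithms cost the same.
        assert cost(*PROPOSED, bep) == cost(*other, bep)
        # At I/M = 8, above the break even point, the proposed algorithm is cheaper.
        assert cost(*PROPOSED, 8) < cost(*other, 8)
```

Using exact fractions avoids rounding questions when comparing the costs at the break even ratio itself.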
Our algorithm inherits the merits of [17] and improves the atomic block proposed there. Algorithm 5 requires two point storages, fewer than window-based and comb-based methods, and it requires no additional storage for points. The atomic block in [17] requires 9 storages and 35 cycles during the atomic process, whereas our optimized atomic block requires only 7 storages and 28 cycles. Moreover, our improved atomic block removes up to 4 dummy operations, leaving only 1 dummy operation per iteration, as shown in Table 3. The total saving is 560 operations and 200 dummy operations on average.
We have seen that this method of improving performance does not suit some unprotected algorithms, such as Multi-base. But is it worth the effort to use a hexadecimal scalar?
The extended quaternary Montgomery algorithm is 26% faster than previous algorithms such as window-based and comb-based methods. The proposed algorithm is 12% faster than the extended quaternary Montgomery algorithm, a saving of about 170M. This indicates a downward trend: the saving decreases when the same method is applied with a new radix.
The extended quaternary algorithm costs 1518M in 80 iterations. The octal form needs 54 iterations, so 26 iterations are saved. Each saved iteration is worth 19M (1I+11M), so the total saving is 494M. If the additional computational cost per octal iteration is less than 9M (494/54), then the total cost per iteration in octal form is less than 28M and better performance is obtained. In fact, our algorithm costs 25M per iteration.
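The per-iteration budget argument can be replayed in a few lines. This sketch assumes that the iteration counts come from n = 160 scalar bits processed 2 bits (quaternary) or 3 bits (octal) at a time, rounded up, which matches the 80 and 54 iterations quoted in the text. The function names are illustrative.

```python
import math

N_BITS = 160  # bit length of the scalar assumed in the text


def iterations(n_bits, bits_per_digit):
    """Number of ladder iterations when each digit covers bits_per_digit bits."""
    return math.ceil(n_bits / bits_per_digit)


def octal_budget(quaternary_total=1518, n_bits=N_BITS):
    """Return the quantities used in the budget argument, in multiplications."""
    quat_iters = iterations(n_bits, 2)                 # 80 iterations
    octal_iters = iterations(n_bits, 3)                # 54 iterations
    per_iter = round(quaternary_total / quat_iters)    # about 19M = 1I + 11M
    saving = (quat_iters - octal_iters) * per_iter     # cost of the saved iterations
    extra_allowed = saving / octal_iters               # extra cost each octal step may add
    return quat_iters, octal_iters, per_iter, saving, extra_allowed


if __name__ == "__main__":
    quat_iters, octal_iters, per_iter, saving, extra = octal_budget()
    print(f"saved iterations: {quat_iters - octal_iters}, total saving: {saving}M")
    print(f"allowed extra cost per octal iteration: {extra:.2f}M")
    print(f"actual cost per octal iteration: {1350 / octal_iters:.0f}M")
```

With these inputs the allowed extra cost is a little over 9M per iteration, so any octal iteration cheaper than about 28M beats the quaternary ladder, and the 25M reported for the proposed algorithm fits within that budget.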
The same reasoning applies to the hexadecimal form: if the total cost is less than 32M per iteration, better performance is also obtained. However, considering the tradeoff between circuit area and efficiency, the hexadecimal form is not optimal.
5 Conclusion
We propose a fast SPA-resistant scalar multiplication method based on the extended elliptic curve Montgomery ladder algorithm over binary fields. We improve the two composite formulas 2𝑃1 + 2𝑃2 and 4𝑃1 and derive the new composite formulas 8𝑃1, 7𝑃1 + 𝑃2, 6𝑃1 + 2𝑃2, 5𝑃1 + 3𝑃2 and 4𝑃1 + 4𝑃2 to construct four identical units. These identical units share the same mathematical structure, from which the new algorithm is built. Algorithm 5 saves at least 12% of the running time compared with previous algorithms such as the fast algorithm 3. We optimize the atomic block to save two storages, 4 dummy operations (at most) and 6 operations per loop. It requires 7 storages, only one dummy squaring and 28 operations per loop. We merge the two atomic blocks of the extended quaternary Montgomery ladder algorithm into one atomic block, which chooses its storage automatically according to the quaternary digits of the scalar.
Acknowledgements
This work is supported by the National Program on Key Research Projects of China (No. 2013CB338004) and by the National Natural Science Foundation of China (No. 61202372, 61073150, 61202371).
References
[1] N. Koblitz, Elliptic curve cryptosystems, Mathematics of Computation 48 (1987) 203–209.
[2] V. Miller, Use of elliptic curves in cryptography, Advances in Cryptology, CRYPTO’85, LNCS, vol. 218, Springer-Verlag, 1986.
[3] B. Möller, Securing elliptic curve point multiplication against side-channel attacks, in: G.I. Davida, Y. Frankel (Eds.), Information Security, LNCS, vol. 2200, Springer-Verlag, 2001, pp. 324–334.
[4] K. Okeya, T. Takagi, The width-w NAF method provides small memory and fast elliptic scalar multiplication secure against side channel attacks, CT-RSA2003, LNCS, vol. 2612, Springer-Verlag, 2003.
[5] J. López, R. Dahab, Improved algorithms for elliptic curve arithmetic in GF(2^n), Selected Areas in Cryptography, Springer Berlin Heidelberg, 1999, pp. 201–212.
[6] M. Ciet, K. Lauter, M. Joye, P.L. Montgomery, Trading inversions for multiplications in elliptic curve cryptography, Designs, Codes and Cryptography 39 (2) (2006) 189–206.
[7] K. Eisentrager, K. Lauter, P.L. Montgomery, Fast elliptic curve arithmetic and improved Weil pairing evaluation, in: M. Joye (Ed.), CT-RSA2003, LNCS, vol. 2612, Springer-Verlag, 2003, pp. 343–354.
[8] V. Dimitrov, L. Imbert, P.K. Mishra, Efficient and secure elliptic curve point multiplication using double-base chains, Advances in Cryptology, ASIACRYPT 2005, Springer Berlin Heidelberg, 2005, pp. 59–78.
[9] N. Meloni, New point addition formulae for ECC applications, Arithmetic of Finite Fields, Springer Berlin Heidelberg, 2007, pp. 189–201.
[10] S. Ghosh, A. Kumar, A. Das, et al., On the implementation of unified arithmetic on binary Huff curves, Cryptographic Hardware and Embedded Systems, CHES 2013, Springer Berlin Heidelberg, 2013, pp. 349–364.
[11] P. Kocher, Timing attacks on implementations of Diffie–Hellman, RSA, DSS, and other systems, CRYPTO’96, LNCS, vol. 1109, Springer-Verlag, 1996.
[12] P. Kocher, Introduction to differential power analysis, Journal of Cryptographic Engineering 1 (1) (2011) 5–27.
[13] T. Izu, T. Takagi, A fast parallel elliptic curve multiplication resistant against side channel attacks, PKC2002, LNCS, vol. 2274, Springer-Verlag, 2002.
[14] J. Coron, Resistance against differential power analysis for elliptic curve cryptosystems, CHES’99, LNCS, vol. 1717, Springer-Verlag, 1999.
[15] E. Brier, M. Joye, Weierstrass elliptic curves and side-channel attacks, PKC2002, LNCS, vol. 2274, Springer-Verlag, 2002.
[16] B. Chevalier-Mames, M. Ciet, M. Joye, Low-cost solutions for preventing simple side-channel analysis: Side-channel atomicity, IEEE Transactions on Computers 53 (6) (2004) 760–768.
[17] S.M. Cho, S.C. Seo, T.H. Kim, et al., Extended elliptic curve Montgomery ladder algorithm over binary fields with resistance to simple power analysis, Information Sciences, 2013.
[18] H. Cohen, A Course in Computational Algebraic Number Theory, GTM 138, Springer-Verlag, New York, 1993.
[19] J. López, R. Dahab, Fast multiplication on elliptic curves over GF(2^m) without precomputation, CHES’99, LNCS, vol. 1717, Springer-Verlag, 1999.
[20] M. Feng, B.B. Zhu, M. Xu, S. Li, Efficient comb elliptic curve multiplication methods resistant to power analysis, <http://eprint.iacr.org/2005/222.ps.gz>, 2005.
[21] P.K. Mishra, V. Dimitrov, Efficient quintuple formulas for elliptic curves and efficient scalar multiplication using multibase number representation, ISC 2007, LNCS, vol. 4779, Springer-Verlag, 2007.
[22] J.A. Solinas, Efficient arithmetic on Koblitz curves, Designs, Codes and Cryptography 19 (2000) 195–249.