A New SPA-Resistant and Fast Scalar Multiplication over Binary Elliptic Curves

Sen Xu (a), DaWu Gu (a), Zheng Guo (a), HaiHua Gu (a,b), JunRong Liu (a), WeiJia Wang (a)
(a) School of Electronic Information and Electrical Engineering, ShangHai JiaoTong University
{xusen0328, dwgu}@sjtu.edu.cn, pandaguo_wow@163.com, {guhaihua, liujr, aawwjaa}@sjtu.edu.cn
(b) Shanghai HuaHong Integrated Circuit Co., Ltd.

Abstract
A new SPA-resistant and fast scalar multiplication algorithm is proposed in this paper. We employ the octal representation of the scalar and optimize two formulas, $2P_1 + 2P_2$ and $4P_1$. Based on these two formulas, we introduce new composite formulas, $8P_1$, $7P_1 + P_2$, $6P_1 + 2P_2$, $5P_1 + 3P_2$ and $4P_1 + 4P_2$, all computed with the x-coordinate only. We obtain a unified mathematical form for the key elements of these new formulas. By combining two adjacent formulas with side-channel atomicity, we form four identical units that share the same mathematical structure. The new algorithm is built on these units, which are naturally atomic and carry a low computational overhead of only one dummy squaring. The earlier atomic block is then optimized in the same way; this saves two storages, seven operations, at least one dummy operation (at most four) and one pre-computation. We also merge the two different atomic blocks into one, selected by the bits of the quaternary scalar digit. The same merging can be applied to the identical unit. As a result, the mathematical structure of the composite formulas provides atomicity naturally. Our proposed algorithm is 12% faster than the quaternary Montgomery ladder algorithm.

Key words: ECC, scalar multiplication, SPA-resistant, side channel attack

1 Introduction
Koblitz and Miller [1,2] independently introduced Elliptic Curve Cryptography (ECC) in 1985. ECC achieves a high level of security with a small key size, which makes it attractive for low-resource devices such as smart cards.
Scalar multiplication is the core operation of ECC, and many researchers have worked on improving both its efficiency and its security. In terms of efficiency, several mainstream approaches make scalar multiplication faster. The first is to decrease the number of basic elliptic curve operations, as in window-based and comb-based methods that use a signed representation of the scalar and table look-up [3,4]. The second is to optimize the basic curve operations themselves, for example with new coordinate systems (LD projective coordinates [5], Jacobian projective coordinates) and composite formulas such as $2P_1 + P_2$ and $3P_1$ [6,7]. Other methods combine the two approaches: [8,9] use double-base chains and a Fibonacci-number representation of the scalar, respectively, to obtain efficient elliptic curve scalar multiplication.

An adversary can easily break a careless implementation of ECC through side channel information [10]. Such an attack is called a side channel attack. Since side channel attacks appeared, almost all proposed countermeasures aim to protect scalar multiplication against them, especially against simple power analysis (SPA). SPA works by distinguishing the different patterns of point addition and point doubling in the power consumption or computing time [11,12]. A typical countermeasure against SPA is to give the scalar multiplication a fixed pattern, as in the Montgomery ladder [13] and the double-and-add-always method [14]. The double-and-add-always method needs dummy operations and is therefore slower than the Montgomery ladder. Another typical countermeasure is to make the point operations indistinguishable, for example the indistinguishable operations of [15] and the atomicity method [16]. These give point addition and point doubling the same pattern and the same number of field operations, so the adversary cannot tell the point operations apart through SPA.
In 2013, a fast and SPA-resistant method was proposed in [17]. It is based on the x-coordinate-only Montgomery ladder algorithm and combines the Montgomery trick [18], a new representation of the scalar, composite formulas, and atomicity. The main idea is to represent the scalar in quaternary form to reduce the number of loop iterations, and to compute the new composite formulas $4P_1$, $2P_1 + 2P_2$ and $3P_1 + P_2$. Based on these formulas, the paper constructs the atomic blocks $(4P_1, 3P_1 + P_2)$ and $(3P_1 + P_2, 2P_1 + 2P_2)$ with at least two dummy field operations. This algorithm is at least 26% faster than previous algorithms.

Can we get better performance with the same principle? We represent the scalar in octal form and optimize the composite formulas $4P_1$ and $2P_1 + 2P_2$. Based on the optimized formulas, we propose the new composite formulas $8P_1$, $7P_1 + P_2$, $6P_1 + 2P_2$, $5P_1 + 3P_2$ and $4P_1 + 4P_2$. The proposed composite formulas use the x-coordinate only, in affine coordinates. We apply side-channel atomicity to our composite operations with at most one dummy field operation (a squaring). The proposed algorithm is SPA-resistant and 12% faster than [17]. Applying the same optimization to the atomic block of [17] saves two storages, 7 field operations and at least one dummy operation. Unlike [17], our algorithm also does not need to compute 2P in advance.

The remainder of the paper is organized as follows. Section 2 gives a brief summary of elliptic curves over binary fields and of the SPA-resistant extended quaternary Montgomery ladder algorithm. In Section 3 we optimize two basic composite formulas and propose our new composite formulas. We then describe our atomic block, the identical unit, which follows naturally from the structure of the new composite formulas and needs at most one dummy operation. Using the optimized basic formulas, we also improve the atomic block proposed in [17]. In Section 4, we analyze the computational cost in comparison with previously proposed algorithms. Section 5 concludes the paper.
2 Preliminary
2.1 Elliptic curve cryptosystem and scalar multiplication
A non-supersingular elliptic curve E over $GF(2^m)$ is defined by the Weierstrass equation [1]
$$y^2 + xy = x^3 + ax^2 + b \tag{1}$$
where $a, b \in GF(2^m)$ and $b \neq 0$, together with the point at infinity, denoted by $\mathcal{O}$. All the points, including the point at infinity $\mathcal{O}$, form a commutative finite group. Point addition and point doubling are the basic point operations on this group. Given $P = (x_1, y_1)$, the point doubling formula for $2P = (x_3, y_3)$ is
$$x_3 = x_1^2 + \frac{b}{x_1^2}, \qquad y_3 = x_1^2 + \left(x_1 + \frac{y_1}{x_1}\right) x_3 + x_3 \tag{2}$$
The computational cost of the formula is 1I+2M+1S, where I, M and S denote field inversion, multiplication and squaring, respectively. Given $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$ with $P_1 \neq \pm P_2$, the point addition formula for $P_1 + P_2 = (x_4, y_4)$ is
$$\lambda = \frac{y_1 + y_2}{x_1 + x_2}, \qquad x_4 = \lambda^2 + \lambda + x_1 + x_2 + a, \qquad y_4 = \lambda (x_1 + x_4) + x_4 + y_1 \tag{3}$$
Point addition costs the same as point doubling, 1I+2M+1S. A classic way to compute the scalar multiplication is the left-to-right binary method, which uses the binary representation of the scalar and is described in Algorithm 1.

Algorithm 1 (The left-to-right binary algorithm)
Input: $d = d_{n-1} 2^{n-1} + d_{n-2} 2^{n-2} + \cdots + d_0$, where $d_{n-1} = 1$
Output: dP
  Q[0] = P
  For i = n-2 down to 0 do
    Q[0] = 2Q[0]
    if $d_i$ == 1
      Q[0] = Q[0] + P
    endif
  End for
  Return Q[0]

Through simple power analysis, an adversary can distinguish the different point operations in Algorithm 1 and thereby extract the scalar. For security, the Montgomery ladder algorithm [13] has been proposed to protect scalar multiplication against SPA. This method shows the same pattern regardless of the key bit.

2.2 SPA-resistant quaternary Montgomery ladder algorithm
Because the Montgomery ladder algorithm is slow, López and Dahab [5] proposed the x-coordinate-only Montgomery ladder algorithm in 1999. Several researchers then introduced similar algorithms [13,21]. This method uses the x-coordinate only to realize the point operations. Let $P_1 = (x_1, y_1)$, $P_2 = (x_2, y_2)$ and $P = (x_0, y_0)$ be points on the curve E.
If $P_2 = P_1 + P$, then the x-coordinate-only point doubling and point addition are given by
$$x_{2P_1} = x_1^2 + \frac{b}{x_1^2}, \qquad x_{P_1+P_2} = x_0 + \frac{x_1}{x_1+x_2} + \left(\frac{x_1}{x_1+x_2}\right)^2 \tag{4}$$
The Montgomery ladder algorithm with the x-coordinate-only method is described in Algorithm 2.

Algorithm 2 (The x-coordinate-only Montgomery ladder algorithm)
Input: $d = d_{n-1} 2^{n-1} + d_{n-2} 2^{n-2} + \cdots + d_0$, where $d_{n-1} = 1$
Output: dP
  Q[0] = P
  Q[1] = 2P
  For i = n-2 down to 0 do
    Q[1 - $d_i$] = Q[1 - $d_i$] + Q[$d_i$]
    Q[$d_i$] = 2Q[$d_i$]
  End for
  Return Q[0]

To get better performance, a fast SPA-resistant quaternary Montgomery ladder algorithm based on the x-coordinate-only Montgomery ladder algorithm was proposed in 2013 [17]. That paper reduces the number of loop iterations by using the quaternary form of the scalar. A different representation of the scalar requires new formulas. Suppose we know $P_1$, $P_2$, $P$ and $2P$, where $P_2 = P_1 + P$. Then three composite formulas using the x-coordinate only are given as follows.
$$x_{4P_1} = x_1^4 + \frac{b^2 (x_1^4+b)^2 + b x_1^8}{x_1^4 (x_1^4+b)^2} \tag{5}$$
$$x_{2P_1+2P_2} = x_{2P} + \lambda + \lambda^2, \quad \text{where } \lambda = \frac{x_2^2 (x_1^4+b)}{(x_1^2 x_2^2+b)(x_1+x_2)^2} \tag{6}$$
This formula costs 1I+4M+5S.
$$x_{3P_1+P_2} = x_0 + \lambda + \lambda^2, \quad \text{where } \lambda = \frac{(x_1^4+b)(x_1+x_2)^2}{x_1^2 [x_0 (x_1+x_2)^2 + x_1 x_2] + (x_1^4+b)(x_1+x_2)^2} \tag{7}$$
This formula costs 1I+5M+4S.

Two atomic blocks, $(4P_1, 3P_1 + P_2)$ and $(3P_1 + P_2, 2P_1 + 2P_2)$, are obtained by combining two composite formulas into one atomic block. The atomic blocks remove operations that are duplicated across the composite formulas, and the Montgomery trick is employed inside the atomic block to reduce the number of inversions. Both measures reduce the computational cost. The atomic blocks then show exactly the same operation sequence, so an observer cannot distinguish which block is executed.

The Montgomery trick is an effective way to replace one inversion with 3M. Given a and b, to obtain $a^{-1}$ and $b^{-1}$ the Montgomery trick first calculates $(ab)^{-1}$ and then obtains $a^{-1} = b(ab)^{-1}$ and $b^{-1} = a(ab)^{-1}$ with two multiplications.
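The Montgomery trick can be sketched in a few lines of Python. The field model used here is an assumption made for illustration and is not taken from [17]: elements of $GF(2^{163})$ are stored as integers, reduction uses the NIST polynomial $x^{163} + x^7 + x^6 + x^3 + 1$, and inversion uses Fermat's little theorem for brevity rather than speed. The helper names `gf_mul`, `gf_inv` and `montgomery_trick` are hypothetical.

```python
# Assumed toy model of GF(2^163): integers as bit-vectors of polynomial
# coefficients, reduced modulo x^163 + x^7 + x^6 + x^3 + 1.
M = 163
POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def gf_mul(a, b):
    """Shift-and-add multiplication with interleaved reduction."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:          # degree reached M: subtract (XOR) the modulus
            a ^= POLY
    return r


def gf_inv(a):
    """Inverse via a^(2^M - 2); this stands for the single costly 'I'."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    r, e = 1, (1 << M) - 2
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r


def montgomery_trick(a, b):
    """Return (a^-1, b^-1) using one inversion and three multiplications."""
    ab_inv = gf_inv(gf_mul(a, b))                  # 1M + 1I
    return gf_mul(b, ab_inv), gf_mul(a, ab_inv)    # 2M


# Arbitrary non-zero field elements standing in for two denominators.
a, b = 0x1234567, 0x89ABCDEF
a_inv, b_inv = montgomery_trick(a, b)
print(gf_mul(a, a_inv), gf_mul(b, b_inv))  # 1 1
```

In an atomic block the two arguments would be the denominators of the two composite formulas computed in the same iteration, so the block pays for one inversion instead of two.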
The SPA-resistant quaternary Montgomery algorithm, which uses the Montgomery trick, is depicted in Algorithm 3.

Algorithm 3 (The SPA-resistant extended quaternary Montgomery algorithm)
Input: $d = d_{n-1} 4^{n-1} + d_{n-2} 4^{n-2} + \cdots + d_0$
Output: dP
  Q[0] = $d_{n-1}$P
  Q[1] = ($d_{n-1}$ + 1)P
  For i = n-2 down to 0 do
    (Q[0], Q[1]) = $AECTD_x$(Q[$d_i^H$], Q[$\overline{d_i^H}$])
  End for
  Return Q[0]

Denote by $d_i^H$ the high bit of $d_i$, and by $\overline{d_i^H}$ the inversion of $d_i^H$. These values can control the position of parameters considering $d_i$ = 3 or 4. $AECTD_x$ is executed in the atomic block. The block uses at most four dummy operations, 9 storages and 36 cycles per loop. According to the computational analysis, this algorithm is faster than previously proposed SPA-resistant scalar multiplication algorithms. Moreover, it is even faster than many unprotected algorithms, except the multi-base method introduced in [19]. Algorithm 3 is therefore effective, and our work builds on it to get better performance.

3 Fast SPA-resistant algorithm
In this section, we propose a new scalar multiplication algorithm with better performance than [17]. We represent the scalar in octal form and use new composite formulas to improve the algorithm.

3.1 Extended Montgomery ladder algorithm
We shorten the scalar by using its octal representation. Following the quaternary method, we can identify five mathematically distinct formulas. These five formulas can be denoted as follows.
$8Q_i[0] + 0Q_i[1]$ and $0Q_i[0] + 8Q_i[1]$ share the same formula $ECEAZP(Q_i[k], Q_i[\bar{k}])_x$
$7Q_i[0] + 1Q_i[1]$ and $1Q_i[0] + 7Q_i[1]$ share the same formula $ECSAOP(Q_i[k], Q_i[\bar{k}])_x$
$6Q_i[0] + 2Q_i[1]$ and $2Q_i[0] + 6Q_i[1]$ share the same formula $ECSATP(Q_i[k], Q_i[\bar{k}])_x$
$5Q_i[0] + 3Q_i[1]$ and $3Q_i[0] + 5Q_i[1]$ share the same formula $ECFATP(Q_i[k], Q_i[\bar{k}])_x$
$4Q_i[0] + 4Q_i[1]$ and $4Q_i[0] + 4Q_i[1]$ share the same formula $ECFAFP(Q_i[k], Q_i[\bar{k}])_x$

Let d be a positive integer with the octal representation
$$d = d_{n-1} 8^{n-1} + d_{n-2} 8^{n-2} + \cdots + d_0, \quad \text{where } d_i \in \{0,1,2,3,4,5,6,7\} \tag{8}$$
We define $Q_i[0] = \left(\sum_{k=1}^{i} d_{n-k} 8^{i-k}\right) P$ and $Q_i[1] = Q_i[0] + P$. Then $Q_{i+1}[0]$ and $Q_{i+1}[1]$ are computed from $Q_i[0]$ and $Q_i[1]$ depending on the digit $\delta = d_{n-i-1}$. For $\delta \geq 4$ the arguments are swapped, and so are the outputs:
$$\begin{cases}
(Q_{i+1}[0], Q_{i+1}[1]) = (ECEAZP, ECSAOP)(Q_i[0], Q_i[1]) & \text{if } \delta = 0 \\
(Q_{i+1}[0], Q_{i+1}[1]) = (ECSAOP, ECSATP)(Q_i[0], Q_i[1]) & \text{if } \delta = 1 \\
(Q_{i+1}[0], Q_{i+1}[1]) = (ECSATP, ECFATP)(Q_i[0], Q_i[1]) & \text{if } \delta = 2 \\
(Q_{i+1}[0], Q_{i+1}[1]) = (ECFATP, ECFAFP)(Q_i[0], Q_i[1]) & \text{if } \delta = 3 \\
(Q_{i+1}[1], Q_{i+1}[0]) = (ECFATP, ECFAFP)(Q_i[1], Q_i[0]) & \text{if } \delta = 4 \\
(Q_{i+1}[1], Q_{i+1}[0]) = (ECSATP, ECFATP)(Q_i[1], Q_i[0]) & \text{if } \delta = 5 \\
(Q_{i+1}[1], Q_{i+1}[0]) = (ECSAOP, ECSATP)(Q_i[1], Q_i[0]) & \text{if } \delta = 6 \\
(Q_{i+1}[1], Q_{i+1}[0]) = (ECEAZP, ECSAOP)(Q_i[1], Q_i[0]) & \text{if } \delta = 7
\end{cases} \tag{9}$$
We thus get four identical units: (ECEAZP, ECSAOP), (ECSAOP, ECSATP), (ECSATP, ECFATP) and (ECFATP, ECFAFP). The extended octal algorithm is shown below.

Algorithm 4 (The extended octal Montgomery algorithm)
Input: $d = d_{n-1} 8^{n-1} + d_{n-2} 8^{n-2} + \cdots + d_0$
Output: dP
  Q[0] = $d_{n-1}$P
  Q[1] = ($d_{n-1}$ + 1)P
  For i = n-2 down to 0 do
    if $d_i$ == 0 then Q[2] = ECEAZP(Q[0], Q[1]), Q[1] = ECSAOP(Q[0], Q[1])
    else if $d_i$ == 1 then Q[2] = ECSAOP(Q[0], Q[1]), Q[1] = ECSATP(Q[0], Q[1])
    else if $d_i$ == 2 then Q[2] = ECSATP(Q[0], Q[1]), Q[1] = ECFATP(Q[0], Q[1])
    else if $d_i$ == 3 then Q[2] = ECFATP(Q[0], Q[1]), Q[1] = ECFAFP(Q[0], Q[1])
    else if $d_i$ == 4 then Q[2] = ECFAFP(Q[1], Q[0]), Q[1] = ECFATP(Q[1], Q[0])
    else if $d_i$ == 5 then Q[2] = ECFATP(Q[1], Q[0]), Q[1] = ECSATP(Q[1], Q[0])
    else if $d_i$ == 6 then Q[2] = ECSATP(Q[1], Q[0]), Q[1] = ECSAOP(Q[1], Q[0])
    else if $d_i$ == 7 then Q[2] = ECSAOP(Q[1], Q[0]), Q[1] = ECEAZP(Q[1], Q[0])
    end if
    Q[0] = Q[2]
  End for
  Return Q[0]

This algorithm is not secure against SPA, and its performance falls short of our expectations. We have to rework the composite formulas to remove duplicated operations and combine them with the Montgomery trick to obtain identical units with better performance. We use these formulas to perform elliptic curve scalar multiplication atomically. The atomic method also reduces the computational cost, especially the field inversions and the duplicated operations.

3.2 New composite formulas using the x-coordinate only
In this section we improve the two composite formulas $2P_1 + 2P_2$ and $4P_1$ and use them to construct the five composite formulas ECEAZP, ECSAOP, ECSATP, ECFATP and ECFAFP. We optimize $4P_1$ and $2P_1 + 2P_2$ in Corollaries 1 and 2, and then present the five new composite formulas.

Corollary 1 Let E be a binary elliptic curve defined over $GF(2^m)$ and $P_1 = (x_1, y_1)$ be a point on E. Then the following formula holds:
$$x_{4P_1} = \frac{(x_1^4+b)^4 + b x_1^8}{x_1^4 (x_1^4+b)^2} \tag{10}$$
The formula costs 1I+3M+5S.
Proof. Since $x_{4P_1}$ has already been obtained in Section 2, we can rewrite it as follows:
$$x_{4P_1} = x_1^4 + \frac{b^2 (x_1^4+b)^2 + b x_1^8}{x_1^4 (x_1^4+b)^2} = \frac{x_1^8 (x_1^4+b)^2 + b^2 (x_1^4+b)^2 + b x_1^8}{x_1^4 (x_1^4+b)^2} = \frac{(x_1^8+b^2)(x_1^4+b)^2 + b x_1^8}{x_1^4 (x_1^4+b)^2} = \frac{(x_1^4+b)^4 + b x_1^8}{x_1^4 (x_1^4+b)^2} \tag{11}$$
As a result, the new form is obtained and its cost is 1I+3M+5S. ∎
We can write $x_{4P_1}$ in the short form $x_{4P_1} = K/M$, where
$$K = (x_1^4+b)^4 + b x_1^8, \qquad M = x_1^4 (x_1^4+b)^2 \tag{12}$$

Corollary 2 Let E be a binary elliptic curve defined over $GF(2^m)$, and let $P_1 = (x_1, y_1)$, $P_2 = (x_2, y_2)$ and $P = (x_0, y_0)$ be points on E. When $P_2 = P_1 + P$, the following formula holds:
$$x_{2(P_1+P_2)} = \frac{[x_0 (x_1+x_2)^2 + x_1 x_2]^4 + b (x_1+x_2)^8}{(x_1+x_2)^4 [x_0 (x_1+x_2)^2 + x_1 x_2]^2}$$
with cost 1I+5M+4S.
Proof.
Since $x_{2P_1} = x_1^2 + \frac{b}{x_1^2}$, we have
$$x_{2(P_1+P_2)} = x_{P_1+P_2}^2 + \frac{b}{x_{P_1+P_2}^2} \tag{13}$$
Since we have already obtained $x_{P_1+P_2} = x_0 + \frac{x_1}{x_1+x_2} + \left(\frac{x_1}{x_1+x_2}\right)^2 = \frac{x_0 (x_1+x_2)^2 + x_1 x_2}{(x_1+x_2)^2}$, we get
$$x_{2(P_1+P_2)} = \left(\frac{x_0 (x_1+x_2)^2 + x_1 x_2}{(x_1+x_2)^2}\right)^2 + b \left(\frac{(x_1+x_2)^2}{x_0 (x_1+x_2)^2 + x_1 x_2}\right)^2 \tag{14}$$
So we obtain the new formula
$$x_{2(P_1+P_2)} = \frac{[x_0 (x_1+x_2)^2 + x_1 x_2]^4 + b (x_1+x_2)^8}{(x_1+x_2)^4 [x_0 (x_1+x_2)^2 + x_1 x_2]^2} \tag{15}$$
∎
This formula costs 1I+5M+4S. We write the new formula $x_{2(P_1+P_2)}$ in the short form
$$x_{2P_1+2P_2} = R/Q \tag{16}$$
where
$$Q = (x_1+x_2)^4 [x_0 (x_1+x_2)^2 + x_1 x_2]^2 \tag{17}$$
$$R = [x_0 (x_1+x_2)^2 + x_1 x_2]^4 + b (x_1+x_2)^8 \tag{18}$$

Theorem 1 Let E be a binary elliptic curve defined over $GF(2^m)$ and $P_1 = (x_1, y_1)$ be a point on E. Then the following formula holds:
$$ECEAZP(P_1, P_2)_x = x_{8P_1} = \frac{[(x_1^4+b)^4 + b x_1^8]^4 + b [x_1^4 (x_1^4+b)^2]^4}{[x_1^4 (x_1^4+b)^2]^2 [(x_1^4+b)^4 + b x_1^8]^2} \tag{19}$$
with cost 1I+5M+8S.
Proof. Since $8P_1 = 2(4P_1)$, we can compute $x_{8P_1}$ with the doubling formula:
$$x_{8P_1} = x_{4P_1}^2 + \frac{b}{x_{4P_1}^2} \tag{20}$$
By Corollary 1 we have
$$x_{4P_1} = \frac{(x_1^4+b)^4 + b x_1^8}{x_1^4 (x_1^4+b)^2}$$
Substituting this into the doubling formula gives
$$x_{8P_1} = \frac{[(x_1^4+b)^4 + b x_1^8]^4 + b [x_1^4 (x_1^4+b)^2]^4}{[x_1^4 (x_1^4+b)^2]^2 [(x_1^4+b)^4 + b x_1^8]^2}$$
∎
We write this formula in the short form
$$ECEAZP(P_1, P_2)_x = \frac{K^4 + b M^4}{M^2 K^2} \tag{21}$$

Theorem 2 Let E be a binary elliptic curve defined over $GF(2^m)$, and let $P_1 = (x_1, y_1)$, $P_2 = (x_2, y_2)$ and $P = (x_0, y_0)$ be points on E. When $P_2 = P_1 + P$, the following formula holds:
$$ECSAOP(P_1, P_2)_x = x_{7P_1+P_2} = x_0 + \lambda + \lambda^2, \quad \text{where } \lambda = \frac{K T^2}{K T^2 + x_1^4 (x_1^4+b)^2 (x_0 T^2 + L T + L^2)}$$
Here $L = (x_1^4+b)(x_1+x_2)^2$ and $T = x_1^2 [x_0 (x_1+x_2)^2 + x_1 x_2] + L$ are the numerator and denominator of $\lambda$ in Eq. (7). The cost is 1I+11M+9S.
Proof. We can use $4P_1 + (3P_1 + P_2)$ to obtain this new formula.
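According to the addition formula, we get
$$x_{7P_1+P_2} = x_0 + \frac{x_{4P_1}}{x_{4P_1} + x_{3P_1+P_2}} + \left(\frac{x_{4P_1}}{x_{4P_1} + x_{3P_1+P_2}}\right)^2$$
From Section 2, $x_{3P_1+P_2} = x_0 + \lambda_{3P_1+P_2} + \lambda_{3P_1+P_2}^2$ with
$$\lambda_{3P_1+P_2} = \frac{(x_1^4+b)(x_1+x_2)^2}{x_1^2 [x_0 (x_1+x_2)^2 + x_1 x_2] + (x_1^4+b)(x_1+x_2)^2} = \frac{L}{T}$$
where $L = (x_1^4+b)(x_1+x_2)^2$ and $T = x_1^2 [x_0 (x_1+x_2)^2 + x_1 x_2] + L$, so that $x_{3P_1+P_2} = (x_0 T^2 + L T + L^2)/T^2$. Hence
$$\frac{x_{4P_1}}{x_{4P_1} + x_{3P_1+P_2}} = \frac{K T^2}{K T^2 + x_1^4 (x_1^4+b)^2 (x_0 T^2 + L T + L^2)}$$
∎
We write this formula in the short form
$$ECSAOP(P_1, P_2)_x = x_0 + \frac{K T^2}{K T^2 + M N} + \left(\frac{K T^2}{K T^2 + M N}\right)^2 \tag{22}$$
where $N = x_0 T^2 + L T + L^2$.

Theorem 3 Let E be a binary elliptic curve defined over $GF(2^m)$, and let $P_1 = (x_1, y_1)$, $P_2 = (x_2, y_2)$ and $P = (x_0, y_0)$ be points on E. When $P_2 = P_1 + P$, the following formula holds:
$$ECSATP(P_1, P_2)_x = x_{6P_1+2P_2} = \frac{[x_0 T^2 + L T + L^2]^4 + b T^8}{T^4 [x_0 T^2 + L T + L^2]^2}$$
with cost 1I+9M+7S.
Proof. Since
$$x_{3P_1+P_2} = \frac{x_0 T^2 + L T + L^2}{T^2}$$
and $x_{6P_1+2P_2} = x_{2(3P_1+P_2)} = x_{3P_1+P_2}^2 + \frac{b}{x_{3P_1+P_2}^2}$, the doubling formula gives
$$x_{6P_1+2P_2} = \left(\frac{x_0 T^2 + L T + L^2}{T^2}\right)^2 + b \left(\frac{T^2}{x_0 T^2 + L T + L^2}\right)^2 = \frac{[x_0 T^2 + L T + L^2]^4 + b T^8}{T^4 [x_0 T^2 + L T + L^2]^2} \tag{23}$$
∎
As a result, this formula costs 1I+9M+7S. We write it in the short form
$$ECSATP(P_1, P_2)_x = \frac{N^4 + b T^8}{T^4 N^2}, \quad \text{where } N = x_0 T^2 + L T + L^2 \tag{24}$$

Theorem 4 Let E be a binary elliptic curve defined over $GF(2^m)$, and let $P_1 = (x_1, y_1)$, $P_2 = (x_2, y_2)$ and $P = (x_0, y_0)$ be points on E. When $P_2 = P_1 + P$, the following formula holds:
$$ECFATP(P_1, P_2)_x = x_{5P_1+3P_2} = x_0 + \lambda + \lambda^2, \quad \text{where } \lambda = \frac{T^2 R}{T^2 R + Q [x_0 T^2 + L T + L^2]}$$
The formula costs 1I+11M+8S.
Proof.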
Divide $5P_1 + 3P_2$ into two parts, $(2P_1 + 2P_2)$ and $(3P_1 + P_2)$. The difference of the two parts is $\pm P$, whose x-coordinate is $x_0$, so we have
$$x_{5P_1+3P_2} = x_{(2P_1+2P_2)+(3P_1+P_2)} = x_0 + \frac{x_{2P_1+2P_2}}{x_{2P_1+2P_2} + x_{3P_1+P_2}} + \left(\frac{x_{2P_1+2P_2}}{x_{2P_1+2P_2} + x_{3P_1+P_2}}\right)^2 \tag{25}$$
By Corollary 2 we have
$$x_{2(P_1+P_2)} = \frac{[x_0 (x_1+x_2)^2 + x_1 x_2]^4 + b (x_1+x_2)^8}{(x_1+x_2)^4 [x_0 (x_1+x_2)^2 + x_1 x_2]^2} = \frac{R}{Q}$$
Then we compute
$$\frac{x_{2P_1+2P_2}}{x_{2P_1+2P_2} + x_{3P_1+P_2}} = \frac{R/Q}{R/Q + (x_0 T^2 + L T + L^2)/T^2} = \frac{T^2 R}{T^2 R + Q [x_0 T^2 + L T + L^2]} \tag{26}$$
∎
We write this formula in the short form
$$ECFATP(P_1, P_2)_x = x_0 + \frac{T^2 R}{T^2 R + Q N} + \left(\frac{T^2 R}{T^2 R + Q N}\right)^2 \tag{27}$$
where $N = x_0 T^2 + L T + L^2$.

Theorem 5 Let E be a binary elliptic curve defined over $GF(2^m)$, and let $P_1 = (x_1, y_1)$, $P_2 = (x_2, y_2)$ and $P = (x_0, y_0)$ be points on E. When $P_2 = P_1 + P$, the following formula holds:
$$ECFAFP(P_1, P_2)_x = x_{4P_1+4P_2} = \frac{R^4 + b Q^4}{R^2 Q^2}$$
where $Q = (x_1+x_2)^4 [x_0 (x_1+x_2)^2 + x_1 x_2]^2$ and $R = [x_0 (x_1+x_2)^2 + x_1 x_2]^4 + b (x_1+x_2)^8$. The formula costs 1I+7M+8S.
Proof. We can get $4P_1 + 4P_2$ by doubling $2P_1 + 2P_2$, so the formula follows from the doubling formula. Denote $x_{2(P_1+P_2)} = R/Q$; then
$$x_{2(2P_1+2P_2)} = \left(\frac{R}{Q}\right)^2 + b \left(\frac{Q}{R}\right)^2 = \frac{R^4 + b Q^4}{R^2 Q^2} \tag{28}$$
∎

3.3 Fast SPA-resistant algorithm
To construct the scalar multiplication, each iteration of the algorithm must be atomic, so that the different operation sequences are indistinguishable. We call this atomic sequence an identical unit. Writing the new formulas in their short forms reveals a common structure. The five key elements of the new formulas are
$$x_{8P_1} = \frac{K^4 + b M^4}{M^2 K^2}, \qquad \lambda_{7P_1+P_2} = \frac{K T^2}{K T^2 + M N}, \qquad x_{6P_1+2P_2} = \frac{N^4 + b (T^2)^4}{(T^2)^2 N^2} \tag{29}$$
$$\lambda_{5P_1+3P_2} = \frac{T^2 R}{T^2 R + Q N}, \qquad x_{4P_1+4P_2} = \frac{R^4 + b Q^4}{R^2 Q^2} \tag{30}$$
There are clearly two structures among these key elements: an addition structure and a doubling structure. We combine two formulas into one identical unit; these identical units are listed in Eq. (9). Using atomicity, we make every unit perform the same operation sequence.
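The short forms above can be checked numerically against ordinary affine arithmetic. The sketch below is only a sanity check under assumed toy parameters, not the authors' implementation: it uses $GF(2^8)$ with the polynomial $x^8 + x^4 + x^3 + x + 1$, an arbitrarily chosen curve ($a$ = 0x01, $b$ = 0x1D), and the special case $P_1 = P$, $P_2 = 2P$, so the five results should equal the x-coordinates of 8P, 9P, 10P, 11P and 12P. All function names are hypothetical, and exceptional cases (a zero denominator) are skipped.

```python
# Assumed toy setting: GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 and the
# arbitrarily chosen curve y^2 + xy = x^3 + A x^2 + B (B != 0).
DEG, POLY = 8, 0x11B
A, B = 0x01, 0x1D

def gf_mul(a, b):  # shift-and-add multiplication with reduction
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> DEG:
            a ^= POLY
    return r

def sq(a):
    return gf_mul(a, a)

def gf_inv(a):  # Fermat inversion a^(2^DEG - 2); caller guarantees a != 0
    r, e = 1, (1 << DEG) - 2
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a, e = sq(a), e >> 1
    return r

def div(n, d):
    return gf_mul(n, gf_inv(d))

def ec_double(P):  # affine doubling; None is the point at infinity
    if P is None or P[0] == 0:
        return None
    lam = P[0] ^ div(P[1], P[0])
    x3 = sq(lam) ^ lam ^ A
    return (x3, sq(P[0]) ^ gf_mul(lam ^ 1, x3))

def ec_add(P, Q):  # affine addition, used only as the reference
    if P is None or Q is None:
        return Q if P is None else P
    if P[0] == Q[0]:
        return ec_double(P) if P[1] == Q[1] else None
    lam = div(P[1] ^ Q[1], P[0] ^ Q[0])
    x3 = sq(lam) ^ lam ^ P[0] ^ Q[0] ^ A
    return (x3, gf_mul(lam, P[0] ^ x3) ^ x3 ^ P[1])

def composite_x(x0, x1, x2):
    """x of 8P1, 7P1+P2, 6P1+2P2, 5P1+3P2, 4P1+4P2 via the short forms."""
    s2 = sq(x1 ^ x2)                          # (x1 + x2)^2
    u = gf_mul(x0, s2) ^ gf_mul(x1, x2)       # x0 (x1 + x2)^2 + x1 x2
    x14 = sq(sq(x1))
    w = x14 ^ B                               # x1^4 + b
    L = gf_mul(w, s2)
    T = gf_mul(sq(x1), u) ^ L
    T2 = sq(T)
    N = gf_mul(x0, T2) ^ gf_mul(L, T) ^ sq(L)
    Mv = gf_mul(x14, sq(w))                   # M of Eq. (12)
    K = sq(sq(w)) ^ gf_mul(B, sq(x14))
    Q = gf_mul(sq(s2), sq(u))
    R = sq(sq(u)) ^ gf_mul(B, sq(sq(s2)))
    dens = [gf_mul(sq(Mv), sq(K)), gf_mul(K, T2) ^ gf_mul(Mv, N),
            gf_mul(sq(T2), sq(N)), gf_mul(T2, R) ^ gf_mul(Q, N),
            gf_mul(sq(R), sq(Q))]
    if 0 in dens:
        return None                           # exceptional case, skipped
    lam7 = div(gf_mul(K, T2), dens[1])
    lam5 = div(gf_mul(T2, R), dens[3])
    return [div(sq(sq(K)) ^ gf_mul(B, sq(sq(Mv))), dens[0]),
            x0 ^ lam7 ^ sq(lam7),
            div(sq(sq(N)) ^ gf_mul(B, sq(sq(T2))), dens[2]),
            x0 ^ lam5 ^ sq(lam5),
            div(sq(sq(R)) ^ gf_mul(B, sq(sq(Q))), dens[4])]

def check_all():
    """Compare short forms with affine multiples for P1 = P, P2 = 2P."""
    checked = mismatches = 0
    for x in range(1, 1 << DEG):
        rhs = gf_mul(sq(x), x ^ A) ^ B        # x^3 + A x^2 + B
        for y in range(1 << DEG):
            if (sq(y) ^ gf_mul(x, y)) == rhs:
                break
        else:
            continue                          # no curve point with this x
        mult = [None, (x, y)]                 # mult[k] = kP
        for _ in range(2, 13):
            mult.append(ec_add(mult[-1], mult[1]))
        if mult[2] is None or mult[2][0] == x:
            continue
        xs = composite_x(x, x, mult[2][0])
        if xs is None:
            continue
        checked += 1
        mismatches += xs != [mult[k][0] for k in (8, 9, 10, 11, 12)]
    return checked, mismatches
```

Calling `check_all()` returns the number of base points that were compared and the number of disagreements; a non-zero second value would indicate a transcription error in one of the short forms.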
According to the short forms of the formulas, the computational sequences of the units are shown in Fig. 1.

[Fig. 1. Computational process: a shared stage (T and N) feeds two branches, (M, K) and (R, Q), which produce the units $(8P_1, 7P_1 + P_2)$, $(7P_1 + P_2, 6P_1 + 2P_2)$, $(6P_1 + 2P_2, 5P_1 + 3P_2)$ and $(5P_1 + 3P_2, 4P_1 + 4P_2)$.]

Comparing (M, K) with (R, Q), we find that, once T and N have been computed, (R, Q) costs 2M+4S, one squaring more than (M, K), because some intermediate values of the T and N stage can be reused. The two branches therefore cost the same once one dummy operation is added to the (M, K) branch. Every processing unit has the same structure as described above, so the total cost of each unit is the same, 1I+17M+13S. This cost can be divided into several parts: (T, N) costs 6M+3S; based on (T, N), (K, M) costs 2M+3S and (R, Q) costs 2M+4S; both structures cost 4M+4S; the Montgomery trick costs 3M; and obtaining the final result costs 2M+2S. Notably, every identical unit matches the others perfectly with only one dummy operation during the computation. As a result, the optimized formulas give a complete set of new formulas with a common mathematical form, and we obtain identical units with low additional computational overhead.

We denote the identical unit by FSPARA, the fast SPA-resistant algorithm over binary elliptic curves. Given $d_i$, we denote its high bit and the complement of the high bit by $d_i^H$ and $\overline{d_i^H}$, respectively. FSPARA is expressed as follows.
$$FSPARA(Q[d_i^H], Q[\overline{d_i^H}]) = \begin{cases} (8 - d_i)\, Q[d_i^H] + d_i\, Q[\overline{d_i^H}] \\ (7 - d_i)\, Q[d_i^H] + (d_i + 1)\, Q[\overline{d_i^H}] \end{cases} \tag{31}$$
The proposed SPA-resistant and fast algorithm is described in Algorithm 5.

Algorithm 5 (SPA-resistant and fast scalar multiplication)
Input: $d = d_{n-1} 8^{n-1} + d_{n-2} 8^{n-2} + \cdots + d_0$
Output: dP
  Q[0] = $d_{n-1}$P
  Q[1] = ($d_{n-1}$ + 1)P
  For i = n-2 down to 0 do
    (Q[0], Q[1]) = FSPARA(Q[$d_i^H$], Q[$\overline{d_i^H}$])
  End for
  Return Q[0]

The security of Algorithm 5 originates from the identical unit. Table 1 describes the identical unit for (ECEAZP, ECSAOP) and (ECSAOP, ECSATP). Table 2 describes the identical unit for (ECSATP, ECFATP) and (ECFATP, ECFAFP).
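The digit-driven bookkeeping of the octal ladder can be sanity-checked with integers standing in for points, so that the integer k plays the role of kP and each composite operation becomes plain arithmetic. This sketch only checks that the ladder invariant Q[1] = Q[0] + P is preserved and that the result is dP; it says nothing about field arithmetic or side-channel behaviour. The function names and the convention that digits 4 to 7 reuse the units of digits 3 to 0 with swapped arguments and swapped outputs are assumptions made for this illustration.

```python
def octal_digits(d):
    """Octal digits of d > 0, most significant first."""
    digits = []
    while d:
        digits.append(d & 7)
        d >>= 3
    return digits[::-1]


def unit(j, qa, qb):
    """One identical unit, j in 0..3: ((8-j)qa + j*qb, (7-j)qa + (j+1)qb)."""
    return (8 - j) * qa + j * qb, (7 - j) * qa + (j + 1) * qb


def octal_ladder(d, p=1):
    """Integer model of the octal ladder: returns d * p."""
    digits = octal_digits(d)
    q0, q1 = digits[0] * p, (digits[0] + 1) * p
    for di in digits[1:]:
        if di < 4:
            q0, q1 = unit(di, q0, q1)
        else:
            # Mirror case: swap the arguments and the outputs.
            q1, q0 = unit(7 - di, q1, q0)
        assert q1 - q0 == p          # ladder invariant Q[1] = Q[0] + P
    return q0


print(octal_ladder(0o7531))          # 3929, i.e. the scalar itself
```

Because both branches call the same `unit` function, the model also makes visible why only four distinct units are needed for eight digit values.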
Table 1 Identical unit procedure for elliptic curves over $GF(2^m)$
Input: T1 = P1 = x1, T2 = P2 = x2, T3 = P = x0
Output: (T1, T2) ← (8P1, 7P1 + P2) or (T1, T2) ← (7P1 + P2, 6P1 + 2P2)

| (T1, T2) ← (8P1, 7P1 + P2) | (T1, T2) ← (7P1 + P2, 6P1 + 2P2) |
|---|---|
| T4 = T1 + T2 (x1 + x2) | T4 = T1 + T2 (x1 + x2) |
| T4 = T4^2 ((x1 + x2)^2) | T4 = T4^2 ((x1 + x2)^2) |
| T5 = T3 · T4 (x0 (x1 + x2)^2) | T5 = T3 · T4 (x0 (x1 + x2)^2) |
| T6 = T1 · T2 (x1 x2) | T6 = T1 · T2 (x1 x2) |
| T5 = T5 + T6 (x0 (x1 + x2)^2 + x1 x2) | T5 = T5 + T6 (x0 (x1 + x2)^2 + x1 x2) |
| T1 = T1^2 (x1^2) | T1 = T1^2 (x1^2) |
| T2 = T1 · T5 (x1^2 (x0 (x1 + x2)^2 + x1 x2)) | T2 = T1 · T5 (x1^2 (x0 (x1 + x2)^2 + x1 x2)) |
| T1 = T1^2 (x1^4) | T1 = T1^2 (x1^4) |
| T7 = b | T7 = b |
| T6 = T1 + T7 (x1^4 + b) | T6 = T1 + T7 (x1^4 + b) |
| T4 = T4 · T6 (L: (x1^4 + b)(x1 + x2)^2) | T4 = T4 · T6 (L: (x1^4 + b)(x1 + x2)^2) |
| T5 = T2 · T4 (LT + L^2) | T5 = T2 · T4 (LT + L^2) |
| T4 = T2 + T4 (T) | T4 = T2 + T4 (T) |
| T4 = T4^2 (T^2) | T4 = T4^2 (T^2) |
| T5 = T3 · T4 (x0 T^2) | T5 = T3 · T4 (x0 T^2) |
| T5 = T5 + T2 (N: x0 T^2 + L^2 + LT) | T5 = T5 + T2 (N: x0 T^2 + L^2 + LT) |
| T6 = T6^2 ((x1^4 + b)^2) | T6 = T6^2 ((x1^4 + b)^2) |
| T2 = T2^2 (dummy) | T2 = T2^2 (dummy) |
| T2 = T1 · T6 (M: x1^4 (x1^4 + b)^2) | T2 = T1 · T6 (M: x1^4 (x1^4 + b)^2) |
| T5 = T2 · T5 (MN) | T2 = T2 · T5 (MN) |
| T1 = T1^2 (x1^8) | T1 = T1^2 (x1^8) |
| T6 = T6^2 ((x1^4 + b)^4) | T6 = T6^2 ((x1^4 + b)^4) |
| T1 = T1 · T3 (b x1^8) | T1 = T1 · T3 (b x1^8) |
| T6 = T1 + T6 (K: (x1^4 + b)^4 + b x1^8) | T6 = T1 + T6 (K: (x1^4 + b)^4 + b x1^8) |
| T4 = T6 · T4 (K T^2) | T6 = T6 · T4 (K T^2) |
| T5 = T5 + T4 (A: K T^2 + MN) | T2 = T6 + T4 (A: K T^2 + MN) |
| T2 = T2^2 (M^2) | T4 = T4^2 (T^4) |
| T6 = T6^2 (K^2) | T5 = T5^2 (N^2) |
| T1 = T2 · T6 (B: M^2 K^2) | T1 = T4 · T5 (B: N^2 T^4) |
| T6 = T6^2 (K^4) | T5 = T5^2 (N^4) |
| T2 = T2^2 (M^4) | T4 = T4^2 (T^8) |
| T7 = T7 · T2 (b M^4) | T7 = T7 · T4 (b T^8) |
| T7 = T7 + T6 (b M^4 + K^4) | T7 = T7 + T4 (b T^8 + N^4) |
| T2 = T1 · T5 (AB) | T4 = T1 · T2 (AB) |
| T2 = T2^-1 ((AB)^-1) | T4 = T4^-1 ((AB)^-1) |
| T6 = T2 · T5 (B^-1) | T5 = T2 · T4 (B^-1) |
| T5 = T2 · T1 (A^-1) | T2 = T4 · T1 (A^-1) |
| T1 = T7 · T6 (8P1) | T2 = T5 · T7 (6P1 + 2P2) |
| T5 = T5 · T4 (λ) | T6 = T6 · T2 (λ) |
| T2 = T3 + T5 (x0 + λ) | T1 = T3 + T6 (x0 + λ) |
| T5 = T5^2 (λ^2) | T6 = T6^2 (λ^2) |
| T2 = T2 + T5 (7P1 + P2) | T1 = T1 + T6 (7P1 + P2) |

Table 2 Identical unit procedure for elliptic curves over $GF(2^m)$
Input: T1 = P1 = x1, T2 = P2 = x2, T3 = P = x0
Output: (T1, T2) ← (6P1 + 2P2, 5P1 + 3P2) or (T1, T2) ← (5P1 + 3P2, 4P1 + 4P2)

| (T1, T2) ← (6P1 + 2P2, 5P1 + 3P2) | (T1, T2) ← (5P1 + 3P2, 4P1 + 4P2) |
|---|---|
| T4 = T1 + T2 (x1 + x2) | T4 = T1 + T2 (x1 + x2) |
| T4 = T4^2 ((x1 + x2)^2) | T4 = T4^2 ((x1 + x2)^2) |
| T5 = T3 · T4 (x0 (x1 + x2)^2) | T5 = T3 · T4 (x0 (x1 + x2)^2) |
| T6 = T1 · T2 (x1 x2) | T6 = T1 · T2 (x1 x2) |
| T5 = T5 + T6 (x0 (x1 + x2)^2 + x1 x2) | T5 = T5 + T6 (x0 (x1 + x2)^2 + x1 x2) |
| T1 = T1^2 (x1^2) | T1 = T1^2 (x1^2) |
| T2 = T1 · T5 (x1^2 (x0 (x1 + x2)^2 + x1 x2)) | T2 = T1 · T5 (x1^2 (x0 (x1 + x2)^2 + x1 x2)) |
| T1 = T1^2 (x1^4) | T1 = T1^2 (x1^4) |
| T7 = b | T7 = b |
| T6 = T1 + T7 (x1^4 + b) | T6 = T1 + T7 (x1^4 + b) |
| T6 = T4 · T6 (L: (x1^4 + b)(x1 + x2)^2) | T6 = T4 · T6 (L: (x1^4 + b)(x1 + x2)^2) |
| T1 = T2 · T4 (LT + L^2) | T1 = T2 · T4 (LT + L^2) |
| T6 = T2 + T4 (T) | T6 = T2 + T4 (T) |
| T6 = T6^2 (T^2) | T6 = T6^2 (T^2) |
| T2 = T3 · T6 (x0 T^2) | T2 = T3 · T1 (x0 T^2) |
| T1 = T1 + T2 (N: x0 T^2 + L^2 + LT) | T1 = T1 + T2 (N: x0 T^2 + L^2 + LT) |
| T4 = T4^2 ((x1 + x2)^4) | T4 = T4^2 ((x1 + x2)^4) |
| T5 = T5^2 ((x0 (x1 + x2)^2 + x1 x2)^2) | T5 = T5^2 ((x0 (x1 + x2)^2 + x1 x2)^2) |
| T2 = T4 · T5 (Q) | T2 = T4 · T5 (Q) |
| T2 = T1 · T4 (QN) | T1 = T1 · T2 (QN) |
| T4 = T4^2 ((x1 + x2)^8) | T4 = T4^2 ((x1 + x2)^8) |
| T5 = T5^2 ((x0 (x1 + x2)^2 + x1 x2)^4) | T5 = T5^2 ((x0 (x1 + x2)^2 + x1 x2)^4) |
| T4 = T3 · T4 (b (x1 + x2)^8) | T4 = T3 · T4 (b (x1 + x2)^8) |
| T4 = T5 + T2 (R) | T4 = T5 + T2 (R) |
| T4 = T4 · T6 (R T^2) | T6 = T4 · T6 (R T^2) |
| T2 = T2 + T6 (A: R T^2 + QN) | T1 = T1 + T6 (A: R T^2 + QN) |
| T6 = T6^2 (T^4) | T4 = T4^2 (R^2) |
| T1 = T1^2 (N^2) | T2 = T2^2 (Q^2) |
| T5 = T1 · T6 (B: T^4 N^2) | T5 = T2 · T4 (B: R^2 Q^2) |
| T1 = T1^2 (N^4) | T4 = T4^2 (R^4) |
| T6 = T6^2 (T^8) | T2 = T2^2 (Q^4) |
| T7 = T7 · T6 (b T^8) | T7 = T7 · T2 (b Q^4) |
| T7 = T7 + T1 (b T^8 + N^4) | T7 = T7 + T1 (b Q^4 + R^4) |
| T6 = T5 · T2 (AB) | T4 = T1 · T5 (AB) |
| T6 = T6^-1 ((AB)^-1) | T4 = T4^-1 ((AB)^-1) |
| T1 = T2 · T6 (B^-1) | T2 = T4 · T1 (B^-1) |
| T2 = T5 · T6 (A^-1) | T1 = T4 · T5 (A^-1) |
| T1 = T7 · T1 (6P1 + 2P2) | T2 = T2 · T7 (4P1 + 4P2) |
| T4 = T4 · T2 (λ) | T6 = T6 · T1 (λ) |
| T2 = T3 + T4 (x0 + λ) | T1 = T3 + T6 (x0 + λ) |
| T4 = T4^2 (λ^2) | T6 = T6^2 (λ^2) |
| T2 = T2 + T4 (5P1 + 3P2) | T1 = T1 + T6 (5P1 + 3P2) |

With the new composite formulas, we optimize the original atomic block into a new one, described in Table 3. It uses seven storages, only one dummy operation, and 29 operation sequences. It also does not need the pre-computation of 2P.

Table 3 Optimized atomic block
Input: T1 = P1 = x1, T2 = P2 = x2, T3 = P = x0
Output: (T1, T2) ← (4P1, 3P1 + P2) or (T1, T2) ← (3P1 + P2, 2P1 + 2P2)

| (T1, T2) ← (4P1, 3P1 + P2) | (T1, T2) ← (3P1 + P2, 2P1 + 2P2) |
|---|---|
| T4 = T1 + T2 (x1 + x2) | T4 = T1 + T2 (x1 + x2) |
| T4 = T4^2 ((x1 + x2)^2) | T4 = T4^2 ((x1 + x2)^2) |
| T5 = T3 · T4 (x0 (x1 + x2)^2) | T5 = T3 · T4 (x0 (x1 + x2)^2) |
| T6 = T1 · T2 (x1 x2) | T6 = T1 · T2 (x1 x2) |
| T5 = T5 + T6 (x0 (x1 + x2)^2 + x1 x2) | T5 = T5 + T6 (x0 (x1 + x2)^2 + x1 x2) |
| T1 = T1^2 (x1^2) | T1 = T1^2 (x1^2) |
| T2 = T1 · T5 (x1^2 (x0 (x1 + x2)^2 + x1 x2)) | T2 = T1 · T5 (x1^2 (x0 (x1 + x2)^2 + x1 x2)) |
| T1 = T1^2 (x1^4) | T1 = T1^2 (x1^4) |
| T7 = b | T7 = b |
| T6 = T1 + T7 (x1^4 + b) | T6 = T1 + T7 (x1^4 + b) |
| T4 = T4 · T6 (L: (x1^4 + b)(x1 + x2)^2) | T6 = T4 · T6 (L: (x1^4 + b)(x1 + x2)^2) |
| T2 = T2 + T4 (T) | T2 = T2 + T4 (T) |
| T6 = T6^2 ((x1^4 + b)^2) | T4 = T4^2 ((x1 + x2)^4) |
| T5 = T5^2 (dummy) | T5 = T5^2 ((x0 (x1 + x2)^2 + x1 x2)^2) |
| T5 = T1 · T6 (M: x1^4 (x1^4 + b)^2) | T1 = T5 · T4 (Q) |
| T1 = T1^2 (x1^8) | T4 = T4^2 ((x1 + x2)^8) |
| T1 = T1 · T3 (b x1^8) | T1 = T1 · T4 (b (x1 + x2)^8) |
| T6 = T6^2 ((x1^4 + b)^4) | T5 = T5^2 ((x0 (x1 + x2)^2 + x1 x2)^4) |
| T7 = T1 + T6 (K) | T7 = T1 + T5 (R) |
| T1 = T5 · T2 (MT) | T5 = T1 · T2 (QT) |
| T1 = T1^-1 ((MT)^-1) | T5 = T5^-1 ((QT)^-1) |
| T6 = T1 · T5 (T^-1) | T4 = T1 · T5 (T^-1) |
| T5 = T1 · T2 (M^-1) | T1 = T2 · T5 (Q^-1) |
| T4 = T4 · T6 (L T^-1) | T4 = T4 · T6 (L T^-1) |
| T1 = T7 · T5 (K M^-1: 4P1) | T2 = T1 · T5 (R Q^-1: 2P1 + 2P2) |
| T2 = T4 + T3 (x0 + λ) | T4 = T4 + T3 (x0 + λ) |
| T4 = T4^2 (λ^2) | T4 = T4^2 (λ^2) |
| T2 = T4 + T2 (3P1 + P2) | T1 = T4 + T2 (3P1 + P2) |

The new formulas for $2P_1 + 2P_2$ and $4P_1$ give the atomic blocks the same mathematical structure, which have
a computational process similar to that of Fig. 1. The different branches use different storage addresses. We can therefore merge the two original atomic blocks into one, controlled by the scalar bits. We refer to the new block as the unified atomic block, which is described in Table 4. This unified atomic block can serve as a guide for both hardware and software implementations.

Table 4 Unified atomic block with respect to scalar bits
Input: T2 = P1 = x1, T4 = P2 = x2, T5 = P = x0, m = $d_i^H \oplus d_i^L$
Output: (T2, T4) ← (4P1, 3P1 + P2) or (T2, T4) ← (3P1 + P2, 2P1 + 2P2)

| Operation | Value |
|---|---|
| T1 = T2 + T4 | x1 + x2 |
| T1 = T1^2 | (x1 + x2)^2 |
| T3 = T5 · T1 | x0 (x1 + x2)^2 |
| T4 = T4 · T2 | x1 x2 |
| T3 = T3 + T4 | x0 (x1 + x2)^2 + x1 x2 |
| T2 = T2^2 | x1^2 |
| T4 = T2 · T3 | x1^2 (x0 (x1 + x2)^2 + x1 x2) |
| T2 = T2^2 | x1^4 |
| T6 = b | b |
| T0 = T2 + T6 | x1^4 + b |
| T[m̄] = T0 · T1 | L: (x1^4 + b)(x1 + x2)^2 |
| T[m̄+2] = T[m̄] + T4 | T |
| T[m] = T[m]^2 | (x1^4 + b)^2 or (x1 + x2)^4 |
| T[4−m] = T[4−m]^2 | dummy operation or (x0 (x1 + x2)^2 + x1 x2)^2 |
| T4 = T[m] · T[m+2] | M or Q |
| T[m+2] = T[m+2]^2 | x1^8 or (x0 (x1 + x2)^2 + x1 x2)^4 |
| T[m] = T[m]^2 | (x1^4 + b)^4 or (x1 + x2)^8 |
| T[m̄+1] = T[m̄+1] · T6 | b x1^8 or b (x1 + x2)^8 |
| T[m+2] = T[m] + T[m+2] | K or R |
| T6 = T4 · T[m̄+2] | MT or QT |
| T6 = T6^-1 | (MT)^-1 or (QT)^-1 |
| T[m] = T4 · T6 | T^-1 |
| T4 = T[3−m̄] · T6 | M^-1 or Q^-1 |
| T[m̄] = T[m̄] · T[m] | L T^-1 |
| T[(d_i^L ≪ 1) + 2] = T4 · T[2+m] | 2P1 + 2P2 or 4P1 or 4P2 |
| T5 = T5 + T[m̄] | x0 + λ |
| T[m̄] = T[m̄]^2 | λ^2 |
| T[(d̄_i^L ≪ 1) + 2] = T[m̄] + T5 | 3P1 + P2 or 3P2 + P1 |

4 Computational analysis
In this section we compare the proposed fast algorithm with previously presented algorithms with respect to computational cost. Inversion is the most costly operation in field arithmetic. Ref. [17] reports that a field inversion is equal to about 6.67 to 10.33 multiplications in a binary field when the field size is 163 bits. In general, researchers assume that n = 160 and I = 8M, where n is the bit length of the scalar d.
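As a rough illustration of this cost model, the sketch below converts (#I, #M) counts into multiplication equivalents under the assumption I = 8M. The per-unit cost 1I + 17M is the one stated in Section 3.3 and the figures for [17] are those reported in Table 5; squarings and additions are neglected, as in the paper's totals. The helper name `total_cost` is hypothetical.

```python
I_OVER_M = 8        # assumed inversion-to-multiplication ratio (I = 8M)
SCALAR_BITS = 160   # assumed bit length n of the scalar d


def total_cost(num_inv, num_mul, ratio=I_OVER_M):
    """Cost in field multiplications; squarings are neglected."""
    return num_inv * ratio + num_mul


# Number of octal digits of a 160-bit scalar: ceil(160 / 3) = 54 units.
units = -(-SCALAR_BITS // 3)
# Each identical unit costs 1I + 17M.
proposed_inv, proposed_mul = units * 1, units * 17

proposed = total_cost(proposed_inv, proposed_mul)
quaternary = total_cost(80, 878)   # figures reported for [17] in Table 5

print(units, proposed_mul)               # 54 918
print(proposed, quaternary)              # 1350 1518
print(round(quaternary / proposed, 2))   # 1.12
```

Changing `I_OVER_M` shows how sensitive the comparison is to the platform: the proposed algorithm uses fewer inversions but more multiplications than [17], so its advantage grows with the cost of an inversion.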
Table 5 compares the total cost of the proposed algorithm with that of previously presented algorithms. We denote by #I and #M the numbers of inversions and multiplications in the listed algorithms. Under the assumption above, the total cost is calculated as #I × 8 + #M. We compare our algorithm with both SPA-nonresistant and SPA-resistant algorithms and conclude that the proposed Algorithm 5 is the most efficient. Moreover, Algorithm 5 is faster than unprotected algorithms such as Multi-base 1 [21], which is faster than Algorithm 3 but 1.09 times slower than our algorithm. [17] mentions that Multi-base 1 [21] cannot employ this method to get better performance because of the huge cost of its dummy operations. Among SPA-resistant algorithms, [17] is effective, but it is 1.12 times slower than our algorithm.

Table 5 Comparison of the total cost between the proposed algorithm and others (I/M = 8)

| SPA-nonresistant algorithm | #I | #M | Total cost | Ratio | SPA-resistant algorithm | #I | #M | Total cost | Ratio |
|---|---|---|---|---|---|---|---|---|---|
| Binary | 240 | 480 | 2400 | 1.54 | [12] | 318 | 636 | 3180 | 2.36 |
| [22] NAF | 213 | 426 | 2130 | 1.58 | [19] | 318 | 318 | 2862 | 2.12 |
| [22] 3-NAF | 200 | 400 | 2000 | 1.48 | [16] | 240 | 480 | 2400 | 1.78 |
| [6] | 129 | 787 | 1819 | 1.35 | [10] | 205 | 410 | 2050 | 1.52 |
| [8] | 114 | 789 | 1701 | 1.26 | [4] | 205 | 410 | 2050 | 1.52 |
| [21] MB1 | 97 | 693 | 1469 | 1.09 | [3] | 203 | 406 | 2030 | 1.50 |
| [21] MB2 | 113 | 677 | 1581 | 1.17 | [17] | 80 | 878 | 1518 | 1.12 |
| | | | | | Algorithm 5 | 54 | 918 | 1350 | 1 |

Table 6 Break even points between the proposed algorithm and the other SPA-resistant algorithms

| Algorithm | Break even point | Algorithm | Break even point |
|---|---|---|---|
| [10] | 1.06 | [12] | 3.12 |
| [19] | 2.27 | [20] | 3.36 |
| [16] | 2.75 | [10] | 3.36 |
| [3] | 3.43 | [17] | 1.53 |

The break even point, defined in [11], is the number of multiplications per inversion at which two algorithms cost the same. It reflects the relative performance of two algorithms through the formula $(\#M_2 - \#M_1)/(\#I_1 - \#I_2)$, where $\#M_1$ and $\#I_1$ denote the numbers of multiplications and inversions of algorithm A, and $\#M_2$ and $\#I_2$ those of algorithm B.
If the actual I/M ratio of a real implementation is greater than the break even point, then algorithm A is faster than B. Table 6 lists the break even points, which show that the proposed algorithm is faster than the other algorithms under the general assumption that I/M = 8.

Our algorithm inherits the merits of [17], and we improve the atomic block proposed in [17]. Algorithm 5 requires two point storages, which is fewer than window-based and comb-based methods, and it does not require additional storage for points. The atomic block in [17], however, requires 9 storages and 35 cycles during the atomic process. Our optimized atomic block requires only 7 storages and 28 cycles. Moreover, our improved atomic block reduces the up to 4 dummy operations to only 1 dummy operation per iteration, as shown in Table 3. The total saving is 560 operations and, on average, 200 dummy operations.

We know that this method of improving performance does not suit some unprotected algorithms, such as Multi-base. But if we take a hexadecimal scalar, is it worth the effort? The extended quaternary Montgomery algorithm is 26% faster than previous algorithms such as window-based and comb-based methods. The proposed algorithm is 12% faster than the extended quaternary Montgomery algorithm, a saving of about 170M. This is a downward trend: the saving decreases each time a new radix is used with the same method. The extended quaternary algorithm costs 1518M in 80 iterations. There are 54 iterations in octal form, so we save 26 iterations. Each saved iteration saves 19M (1I+11M), so the total saving is 494M. If the additional computational cost per iteration is less than 9M (494/54), then the total cost per iteration in octal form is less than 28M and better performance is obtained. In fact, our algorithm costs 25M per iteration. The same reasoning applies to the hexadecimal form: if the total cost is less than 32M per iteration, we can also get better performance.
However, considering the tradeoff between circuit area and efficiency, the hexadecimal form is not optimal.

5 Conclusion

We propose a fast SPA-resistant scalar multiplication method based on the extended elliptic curve Montgomery ladder algorithm over binary fields. We improve the two composite formulas 2𝑃1 + 2𝑃2 and 4𝑃1 and derive the new composite formulas 8𝑃1, 7𝑃1 + 𝑃2, 6𝑃1 + 2𝑃2, 5𝑃1 + 3𝑃2 and 4𝑃1 + 4𝑃2 to construct four identical units. These identical units share the same mathematical structure, from which the new algorithm is built. Algorithm 5 saves at least 12% of the running time compared with previous algorithms such as the fast Algorithm 3. We optimize the atomic block to save two storages, at most 4 dummy operations and 7 operations per loop. It requires 7 storages, only one dummy squaring and 28 operations per loop. We merge the two atomic blocks of the extended quaternary Montgomery ladder algorithm into one atomic block, which selects its storage automatically according to the quaternary digits of the scalar.

Acknowledgements

This work is supported by the National Program on Key Research Projects of China (No. 2013CB338004) and the National Natural Science Foundation of China (No. 61202372, 61073150, 61202371).

References

[1] N. Koblitz, Elliptic curve cryptosystems, Mathematics of Computation 48 (1987) 203–209.
[2] V. Miller, Use of elliptic curves in cryptography, Advances in Cryptology, CRYPTO'85, LNCS, vol. 218, Springer-Verlag, 1986.
[3] B. Möller, Securing elliptic curve point multiplication against side-channel attacks, in: G.I. Davida, Y. Frankel (Eds.), Information Security, LNCS, vol. 2200, Springer-Verlag, 2001, pp. 324–334.
[4] K. Okeya, T. Takagi, The width-w NAF method provides small memory and fast elliptic scalar multiplication secure against side channel attacks, CT-RSA 2003, LNCS, vol. 2612, Springer-Verlag, 2003.
[5] J. López, R. Dahab, Improved algorithms for elliptic curve arithmetic in GF(2^n), Selected Areas in Cryptography, Springer Berlin Heidelberg, 1999, pp. 201–212.
[6] M. Ciet, K. Lauter, M. Joye, P.L. Montgomery, Trading inversions for multiplications in elliptic curve cryptography, Designs, Codes and Cryptography 39 (2) (2006) 189–206.
[7] K. Eisentrager, K. Lauter, P.L. Montgomery, Fast elliptic curve arithmetic and improved Weil pairing evaluation, in: M. Joye (Ed.), CT-RSA 2003, LNCS, vol. 2612, Springer-Verlag, 2003, pp. 343–354.
[8] V. Dimitrov, L. Imbert, P.K. Mishra, Efficient and secure elliptic curve point multiplication using double-base chains, Advances in Cryptology, ASIACRYPT 2005, Springer Berlin Heidelberg, 2005, pp. 59–78.
[9] N. Meloni, New point addition formulae for ECC applications, Arithmetic of Finite Fields, Springer Berlin Heidelberg, 2007, pp. 189–201.
[10] S. Ghosh, A. Kumar, A. Das, et al., On the implementation of unified arithmetic on binary Huff curves, Cryptographic Hardware and Embedded Systems, CHES 2013, Springer Berlin Heidelberg, 2013, pp. 349–364.
[11] P. Kocher, Timing attacks on implementations of Diffie–Hellman, RSA, DSS, and other systems, CRYPTO'96, LNCS, vol. 1109, Springer-Verlag, 1996.
[12] P. Kocher, Introduction to differential power analysis, Journal of Cryptographic Engineering 1 (1) (2011) 5–27.
[13] T. Izu, T. Takagi, A fast parallel elliptic curve multiplication resistant against side channel attacks, PKC 2002, LNCS, vol. 2274, Springer-Verlag, 2002.
[14] J. Coron, Resistance against differential power analysis for elliptic curve cryptosystems, CHES'99, LNCS, vol. 1717, Springer-Verlag, 1999.
[15] E. Brier, M. Joye, Weierstrass elliptic curves and side-channel attacks, PKC 2002, LNCS, vol. 2274, Springer-Verlag, 2002.
[16] B. Chevalier-Mames, M. Ciet, M. Joye, Low-cost solutions for preventing simple side-channel analysis: Side-channel atomicity, IEEE Transactions on Computers 53 (6) (2004) 760–768.
[17] S.M. Cho, S.C. Seo, T.H. Kim, et al., Extended elliptic curve Montgomery ladder algorithm over binary fields with resistance to simple power analysis, Information Sciences, 2013.
[18] H. Cohen, A Course in Computational Algebraic Number Theory, GTM 138, Springer-Verlag, New York, 1993.
[19] J. López, R. Dahab, Fast multiplication on elliptic curves over GF(2^m) without precomputation, CHES'99, LNCS, vol. 1717, Springer-Verlag, 1999.
[20] M. Feng, B.B. Zhu, M. Xu, S. Li, Efficient comb elliptic curve multiplication methods resistant to power analysis, <http://eprint.iacr.org/2005/222.ps.gz>, 2005.
[21] P.K. Mishra, V. Dimitrov, Efficient quintuple formulas for elliptic curves and efficient scalar multiplication using multibase number representation, ISC 2007, LNCS, vol. 4779, Springer-Verlag, 2007.
[22] J.A. Solinas, Efficient arithmetic on Koblitz curves, Designs, Codes and Cryptography 19 (2000) 195–249.