Vol 05, Article 10465, October 2014 International Journal of VLSI and Embedded Systems-IJVES http://ijves.com ISSN: 2249 – 6556

HIGH-SPEED SUPERIOR BI-ROTATIONAL CORDIC USING QUADRANT AMENDMENT WITH PRE-SCALED STRUCTURAL DESIGN

K JAYARAM KUMAR1, P SOUNDARYA MALA2
1 PG student (VLSI & Embedded Systems), GIET, Rajahmundry, India
2 Associate Professor, ECE Dept, GIET, Rajahmundry, India
1 jramworld@gmail.com, 2 palivela.soundarya@gmail.com

ABSTRACT
Rotation of vectors through fixed and known angles has many applications in animation, robotics, games, computer graphics and digital signal processing. In fixed-angle rotation, the rotation of the vector is not under the control of the design until all elementary rotations are completed. In real-time applications this is a major issue because of unpredictable angles and ineffective system gain. Therefore, in this paper, we propose an optimized superior bi-rotational CORDIC design for high-speed fixed-angle vector rotation through specific angles, with angle correction and quadrant amendment. An efficient structural design built from conventional circuits is also developed to reduce the circuit complexity. The angle correction and quadrant amendment are done by using the rotation mapping and position sequencing algorithms, respectively.

Keywords: Bi-rotational CORDIC, vector rotation, position sequencer, CORDIC.

I. INTRODUCTION
Coordinate Rotation DIgital Computer is abbreviated as CORDIC. The key concept of CORDIC arithmetic is based on the simple and ancient principles of two-dimensional geometry, but the iterative formulation of a computational algorithm for its implementation was first described in 1959 by Jack E. Volder [1], [2] for the computation of trigonometric functions, multiplication and division.
CORDIC-based computing received increased attention in 1971, when John Walther [3], [4] showed that, by varying a few simple parameters, it could be used as a single algorithm for unified implementation of a wide range of elementary transcendental functions involving logarithms, exponentials, and square roots along with those suggested by Volder [1]. Around the same time, Cochran [5] benchmarked various algorithms and showed that the CORDIC technique is a better choice for scientific calculator applications. The popularity of CORDIC was greatly enhanced thereafter, primarily due to its potential for efficient and low-cost implementation of a large class of applications, which include the generation of trigonometric, logarithmic and transcendental elementary functions; complex number multiplication, eigenvalue computation, matrix inversion, solution of linear systems and singular value decomposition (SVD) for signal processing, image processing, and general scientific computation. Some other popular and upcoming applications are:
1) direct frequency synthesis, digital modulation and coding for speech synthesis and communication;
2) direct and inverse kinematics computation for robot manipulation;
3) planar and three-dimensional vector rotation for graphics and animation.

Rotation of vectors through a fixed and known angle has wide applications in robotics, graphics, games and animation [6], [13], [14]. Locomotion of robots is very often performed by successive rotations through small fixed angles and translations of the links. The translation operations are realized by simple additions of coordinate values, while the new coordinates of a rotational step can be obtained by suitable successive rotations through a small fixed angle, which can be performed by a CORDIC circuit for fixed rotation [6]. Similarly, interpolation of orientations between key-frames in computer graphics and animation can be performed by fixed CORDIC rotations [14].
There are plenty of examples of uniform rotation, from electrons inside an atom to planets and satellites. A simple example of uniform rotation is the hands of an animated mechanical clock, which perform a one-degree rotation each time. There are several cases where high-speed constant rotations are required in games, graphics and animation. Objects with constant rotation are very often used in simulation, modelling, games and animation. Rotation through a known small angle, as needed in these areas, can be implemented efficiently by simple and dedicated CORDIC circuits.

Keeping the requirements and constraints of different application environments in view, CORDIC algorithms and architectures have been developed to achieve a high throughput rate and to reduce hardware complexity as well as the latency of implementation. Some of the typical approaches for reduced-complexity implementation focus on minimizing the complexity of the scaling operation and the complexity of the barrel-shifter in the CORDIC engine. Latency of implementation is an inherent drawback of the conventional CORDIC algorithm. Angle recoding schemes, mixed-grain rotation and higher-radix CORDIC have been developed for reduced-latency realization. Parallel and pipelined CORDIC have been suggested for high-throughput computation.

2010-2014 – IJVES Indexing in Process - EMBASE, EmCARE, Electronics & Communication Abstracts, SCIRUS, SPARC, GOOGLE Database, EBSCO, NewJour, Worldcat, DOAJ, and other major databases etc.,

II. METHODOLOGY
A CORDIC can be operated in two different modes, the vectoring mode and the rotation mode. In vectoring mode, coordinates (x,y) are rotated until y converges to zero.
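In rotation mode, the initial vector (x,y) starts aligned with the x-axis and is rotated by an angle θi every cycle, so that after n iterations θn is the obtained angle. All the trigonometric functions can be computed or derived from functions using vector rotations. The CORDIC algorithm provides an iterative method of performing vector rotations by arbitrary angles using only shift and add operations. The algorithm is derived from the general rotation transform. The CORDIC algorithm performs a planar rotation. Graphically, planar rotation means transforming a vector (x,y) into a new vector (x',y'). Vector V' is the image of vector V after an anticlockwise rotation by an angle ϕ. From Fig.1 & 2, it can be observed that

$$x = r\cos\theta, \qquad y = r\sin\theta \tag{1}$$

$$V' = \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \tag{2}$$

Fig.1 Rotation of vector V by an angle ϕ
Fig.2 Vector v with magnitude r and phase θ

It is well known that the rotation matrix is

$$R(\phi) = \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix} \tag{3}$$

Applying successive rotations to the initial vector $[1 \;\; 0]^T$ and factoring out $\cos\phi$, the modified rotation matrix is

$$R(\phi) = \cos\phi \begin{bmatrix} 1 & -\tan\phi \\ \tan\phi & 1 \end{bmatrix} \tag{4}$$

The multiplication by the tangent term can be avoided if $\tan\phi_i = 2^{-i}$. In digital hardware, this denotes a simple shift operation. Moreover, since $\cos(\phi_i) = \cos(-\phi_i)$, the cosine factor does not depend on the direction of rotation, and the product of these factors is constant for a fixed number of iterations.

$$R(\phi_i) = \cos\phi_i \begin{bmatrix} 1 & -2^{-i} \\ 2^{-i} & 1 \end{bmatrix} \tag{5}$$

Equation (2) can be expressed as

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \cos\phi_i \begin{bmatrix} 1 & -2^{-i} \\ 2^{-i} & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \tag{6}$$

$\phi_i$ may be positive or negative depending upon whether the rotation is anticlockwise or clockwise. Writing the direction of rotation as $d_i = \pm 1$, equation (6) can be expressed as

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \cos\phi_i \begin{bmatrix} 1 & -d_i 2^{-i} \\ d_i 2^{-i} & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \tag{7}$$

III.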
ADVANCED CORDIC DESIGN FOR FIXED ANGLE VECTOR ROTATION
In the rotation-mode CORDIC algorithm, the basic CORDIC equations can now be expressed [7] as

$$x_{i+1} = k_i \left( x_i - y_i d_i 2^{-i} \right) \tag{8}$$

$$y_{i+1} = k_i \left( y_i + x_i d_i 2^{-i} \right) \tag{9}$$

The angle accumulator is

$$z_{i+1} = z_i - d_i \tan^{-1}\!\left(2^{-i}\right) \quad \text{or} \quad z_{i+1} = z_i - d_i \phi_i \tag{10}$$

where i is the iteration index of the micro-rotations that make up the required rotation, $k_i = \cos(\arctan(2^{-i}))$ and $d_i = \pm 1$ (the direction of rotation). The product of the $k_i$ is the K-factor

$$K = \prod_{i=0}^{n-1} k_i \tag{11}$$

The architecture of the new CORDIC rotator can be derived by a suitable hardware mapping of the algorithm described above. For the sake of clarity, the implementation of a 16-bit CORDIC rotator is described here as an example. All of the discussion presented in this section can be generalized to an n-bit CORDIC rotator as well. In (11),

$$\prod_{i=0}^{n-1} k_i = \cos\phi_0 \cdot \cos\phi_1 \cdot \cos\phi_2 \cdots \cos\phi_{n-1} \tag{12}$$

(here $\phi_i$ is the angle of the ith of the n micro-rotations). These $\phi_i$ are stored in the ROM of the CORDIC hardware as a look-up table. The product of the $k_i$ is the CORDIC scale factor, and to four decimal places it is the same for all the CORDIC word lengths considered here. For the 8-bit hardware CORDIC approximation, the value is

$$\prod_{i=0}^{7} k_i = \cos\phi_0 \cdot \cos\phi_1 \cdot \cos\phi_2 \cdots \cos\phi_7 = \cos 45^\circ \cdot \cos 26.565^\circ \cdots \cos 0.4476^\circ = 0.6073 \tag{13}$$

For 16-bit CORDIC hardware, the value is

$$\prod_{i=0}^{15} k_i = \cos\phi_0 \cdot \cos\phi_1 \cdots \cos\phi_{15} = 0.6073 \tag{14}$$

For 24-bit CORDIC hardware, the value is

$$\prod_{i=0}^{23} k_i = \cos\phi_0 \cdot \cos\phi_1 \cdots \cos\phi_{23} = 0.6073 \tag{15}$$

For 32-bit CORDIC hardware, the value is

$$\prod_{i=0}^{31} k_i = \cos\phi_0 \cdot \cos\phi_1 \cdots \cos\phi_{31} = 0.6073 \tag{16}$$

In the case of fixed rotation, the $\phi_i$ can be pre-computed and the sign bits corresponding to $d_i$ can be stored in a sign-bit register (SBR) in the CORDIC circuit.
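As an illustration only, the iterations (8)-(10) and the scale factor (11)-(12) can be sketched in floating-point Python. The function names (`elementary_angles`, `scale_factor`, `precompute_signs`, `rotate`) are hypothetical and not taken from the paper; the sketch also assumes that the scaling by K is applied once after all micro-rotations, which is arithmetically equivalent to multiplying by $k_i$ in every iteration.

```python
import math

def elementary_angles(n):
    """phi_i = arctan(2^-i) for i = 0..n-1 (the look-up table contents)."""
    return [math.atan(2.0 ** -i) for i in range(n)]

def scale_factor(n):
    """K = product of k_i = cos(arctan(2^-i)), as in (11)-(12)."""
    k = 1.0
    for phi in elementary_angles(n):
        k *= math.cos(phi)
    return k

def precompute_signs(angle, n):
    """Directions d_i for a fixed, known angle, found once using (10)."""
    z = angle
    signs = []
    for phi in elementary_angles(n):
        d = 1 if z >= 0 else -1   # rotate towards zero residual angle
        signs.append(d)
        z -= d * phi
    return signs

def rotate(x, y, signs):
    """Apply the shift-add updates of (8)-(9) using stored sign bits.

    Assumption: the per-iteration factors k_i are collected into a single
    multiplication by K at the end.
    """
    for i, d in enumerate(signs):
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
    k = scale_factor(len(signs))
    return k * x, k * y
```

For example, `rotate(1.0, 0.0, precompute_signs(math.pi / 6, 16))` returns a vector close to (cos 30°, sin 30°); the sign list plays the role of the SBR contents, so no angle accumulator is needed while rotating.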
The CORDIC circuit therefore need not compute the remaining angle $\phi_i$ during the CORDIC iterations [9]. A reference CORDIC circuit for fixed rotations according to (8) and (9) is shown in Fig.3. x0 and y0 are fed as set/reset inputs to the pair of input registers, and the successive feedback values xi and yi at the ith iteration are fed in parallel to the input registers. Note that conventionally the pair of input registers is fed with the initial values x0 and y0 as well as the feedback values xi and yi through a pair of multiplexers. Meanwhile, it is well known that a rotation through any angle 0 < θ ≤ 2π can be mapped into a positive rotation through 0 < ɸ ≤ π/4 without any extra arithmetic operations. Hence, as a first step of optimization, we perform the rotation mapping so that the rotation angle lies in the range 0 < ɸ ≤ π/4. The scale-factor K now depends on the set of elementary angles. The accuracy of the CORDIC algorithm depends on how closely the resultant rotation ɸA due to all the micro-rotations in (8) & (9) approximates the desired rotation angle ɸ. The angular deviation Δɸ = ɸ - ɸA in turn determines the deviation of the actual rotation vector from the estimated value. We show here that only a few elementary angles are sufficient to have a CORDIC rotation in the range [0, π/4], and that different sets of elementary angles can be chosen according to the accuracy requirement.

Fig.3 The reference CORDIC circuit for fixed rotation
Fig.4 Hardwired-shifted bi-rotation CORDIC circuit. SBR is a sign-bit register of 2-bit size. → k(0) indicates a right-shift by k(0) bit-locations.

1. Implementation of Micro-rotations: Since the elementary angles and the directions of the micro-rotations are predetermined for the given angle of rotation, the angle-estimation data-path is not required in the CORDIC circuit for fixed and known rotations.
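How a small set of micro-rotations might be predetermined for a given angle can be sketched as an offline search. This is a hedged illustration, not the selection procedure of the paper: the function name `best_micro_rotations`, the exhaustive search, the limit `max_shift` and the restriction to distinct shift values are all assumptions made for the example.

```python
import itertools
import math

def best_micro_rotations(phi, m=2, max_shift=8):
    """Pick m (shift, direction) pairs minimising |phi - phi_A|.

    Assumptions: shifts are distinct and taken from 0..max_shift, and the
    search is a plain exhaustive enumeration done once, offline.
    Returns (deviation, shifts, directions).
    """
    best = None
    for shifts in itertools.combinations(range(max_shift + 1), m):
        for dirs in itertools.product((1, -1), repeat=m):
            # phi_A: resultant rotation of the selected micro-rotations
            phi_a = sum(d * math.atan(2.0 ** -k) for k, d in zip(shifts, dirs))
            dev = abs(phi - phi_a)
            if best is None or dev < best[0]:
                best = (dev, shifts, dirs)
    return best
```

The returned shifts correspond to the control-bits for the shifter and the directions to the sign bits; sweeping `phi` over (0, π/4] with different values of `m` gives a feel for how the angular deviation Δɸ shrinks as more micro-rotations are allowed.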
Moreover, because only a few elementary angles are involved in this case, the corresponding control-bits can be stored in a ROM of a few words. In Fig.4, the ROM contains the control-bits for the number of shifts corresponding to the micro-rotations to be implemented by the barrel-shifter, and the directions of the micro-rotations are stored in the sign-bit register (SBR).

Table.1 Elementary angles for 8-, 16-, 24- and 32-bit CORDIC hardware

2. Minimization of Barrel-Shifter Complexity by Hardwired Pre-shifting: The hardware-complexity of the barrel-shifter increases linearly with the word-length and number of shifts. We can reduce the effective word-length in the MUXes of the barrel-shifters, and also the number of stages of MUXes, by simple hardwired pre-shifting as shown in Fig.4. If l is the minimum number of shifts in the set of selected micro-rotations and L is the word-length, we can load only the (L−l) most-significant bits (MSBs) of an input word from the registers to the barrel-shifters, since the l least-significant bits (LSBs) would get truncated during shifting. The barrel-shifter, therefore, needs to implement a maximum of (s−l) shifts only, where s is the maximum number of shifts in the set of selected micro-rotations. The outputs of the barrel-shifters are loaded as the (L−l) LSBs to the add/subtract units, and the l MSBs of the corresponding operand of the add/subtract unit are hardwired to 0. Therefore, the hardware-complexity of a barrel-shifter can be reduced by the hardwired pre-shifting approach.
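The effect of hardwired pre-shifting described above can be modelled with integer shifts. The sketch below is only a behavioural illustration under stated assumptions: operands are Python integers rather than fixed-width registers, and the helper names `preshift_params` and `shift_with_preshift` are hypothetical.

```python
def preshift_params(word_length, shifts):
    """Return (effective word-length, maximum variable shift) = (L - l, s - l).

    l is the minimum and s the maximum number of shifts in the selected
    set of micro-rotations, following the notation of the text.
    """
    l, s = min(shifts), max(shifts)
    return word_length - l, s - l

def shift_with_preshift(x, k, l):
    """Right-shift x by k as a fixed pre-shift by l plus a variable shift."""
    assert k >= l >= 0
    pre = x >> l              # hardwired: the l LSBs are simply not connected
    return pre >> (k - l)     # the barrel-shifter only sees shifts up to s - l
```

For a 16-bit word and selected shifts {2, 5}, `preshift_params(16, [2, 5])` gives a 14-bit shifter input and a maximum variable shift of 3, while `shift_with_preshift(x, k, 2)` produces the same result as `x >> k` for k = 2 and k = 5.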
The time involved in a barrel-shifter can also be reduced by hardwired pre-shifting, since the delay of the barrel-shifter is proportional to the number of stages of MUXes, and the number of stages can be reduced by hardwired pre-shifting.

3. Bi-rotation CORDIC Cell: We find that, using only two micro-rotations, it is possible to get an accuracy of up to 0.033 radian. Although the accuracy achieved by two micro-rotations is inadequate in many situations, it can be used for some applications where the outputs are quantized, e.g., in speech and image compression [11], [12]. Besides, rotations with four and six micro-rotations can also be implemented successively by two and three pairs of micro-rotations, respectively. Therefore, we design an efficient CORDIC circuit to implement a pair of micro-rotations, and name it "bi-rotation CORDIC". The circuit for bi-rotation CORDIC is shown in Fig.4. It consists of an adder-module, two 2:1 multiplexers and a sign-bit register (SBR) of two-bit size. The adder-module consists of a pair of adders/subtractors. The adders/subtractors perform additions or subtractions according to the sign-bit available from the SBR. The components of the input vector (the real and imaginary parts of the input complex operand) are loaded into the input registers through the set/reset input. The output of each register is sent along two lines: the content of the register is fed directly to one of the adders/subtractors, while on the other line it is loaded to the barrel-shifter pre-shifted by k(0) bit-locations to the right by the hardwired pre-shifting technique. The outputs of the adders are loaded back to the input registers for the second CORDIC iteration. The bi-rotation CORDIC involves only a pair of barrel-shifters, each consisting of only one stage of 2:1 MUXes. The control-bit for the barrel-shifters is 0 for the first micro-rotation (no shift) and 1 for the second micro-rotation (shift through k(1)−k(0)).
The control bits are generated by a T flip-flop, since they alternate between 0 and 1 in successive cycles.

IV. HIGH-THROUGHPUT DESIGN USING SUPERIOR BI-ROTATIONAL CORDIC
In the proposed model, a two-stage bi-rotational CORDIC with a position sequencer algorithm is used to increase the efficiency of fixed-angle rotation.

A. Position sequencer algorithm:

Fig.5 Position sequencer signal

To reduce quadrant errors, the angle should lie in the range -90° to +90°, i.e., all angles should fall in the first or fourth quadrant in the rotational implementation. These blocks improve the accuracy of the CORDIC fixed rotation. For compound rotations larger than 90°, an additional algorithm, known as the position sequencer, is required to bring every iteration into the first or fourth quadrant. The upper two bits of the arctan (angle) value represent the quadrant in which the angle lies: 00 → I, 01 → II, 10 → III, 11 → IV, where I, II, III & IV are the quadrants. For 00 and 11, angle correction is not required, since the angle lies in quadrant I or IV.

For 01, the correction algorithm is
$$x' = -d_i \, y, \qquad y' = d_i \, x, \qquad z' = z + d_i \, \pi/2$$
where $d_i = +1$ if $y < 0$, $-1$ otherwise.

For 10, the correction algorithm is
$$x' = d_i \, y, \qquad y' = -d_i \, x, \qquad z' = z - d_i \, \pi/2$$
where $d_i = -1$ if $x < 0$, $+1$ otherwise.

B. Superior Bi-rotational CORDIC
To reduce the adder complexity relative to the cascaded single-rotation CORDIC, the micro-rotations can be implemented by a cascaded bi-rotation CORDIC circuit. A two-stage cascaded superior bi-rotation CORDIC is shown in Fig.
The first two of the four optimized micro-rotations can be implemented by stage-1, while the remaining two are performed by stage-2. The structure and function of the bi-rotation CORDIC is shown in Fig.4. For implementing six selected micro-rotations, we can use a three-stage cascade of bi-rotation CORDIC cells. The cascade of superior bi-rotation cells can, however, be extended further when higher accuracy is required.

V. SCALING OPTIMIZATION AND IMPLEMENTATION
The optimized sets of angle rotations and micro-rotations are discussed in this section.

Fixed rotation angle scaling approach
The normalized equation for the scale-factor can be written explicitly for the selected set of m1 micro-rotations as

$$K = \prod_{i=0}^{m_1-1} \left[ 1 + 2^{-2k(i)} \right]^{-1/2} \tag{17}$$

where k(i), for 0 ≤ i ≤ m1−1, is the number of shifts in the ith micro-rotation. Except for k(i)=0 (i.e., rotation by 45°), by binomial expansion any term in (17) can be written as

$$1 - \frac{x}{2} + \frac{3x^2}{8} - \frac{5x^3}{16} + \frac{35x^4}{128} - \frac{63x^5}{256} + \frac{231x^6}{1024} - \cdots \tag{18}$$

where $x = 2^{-2k(i)}$.

VI. PERFORMANCE ANALYSIS
The performance simulations of the superior bi-rotational CORDIC are carried out with the MODELSIM ALTERA 6.5e simulator. Synthesis has been carried out with Xilinx ISE. The test frequency is 132.258 MHz, the power supply voltage is 1.1 V and the delay is 7.038 ns. The design requires one 4-bit shift register, 516 registers and slice flip-flops, 577 four-input LUTs and 67 IOBs. The bi-rotational CORDIC is successfully tested by simulation, and the gain factor is limited to 1.64.

Fig.7 Two-stage superior Bi-rotational CORDIC cell

VII.
RESULTS AND EXAMINATION
Fig.8 Block Diagram
Fig.9 Technology Schematic of CORDIC algorithm
Fig.10 Simulation Results of CORDIC algorithm

VIII. CONCLUSION
The superior bi-rotational CORDIC is attractive for the calculation of fixed-angle elementary functions because of its accuracy and parallel processing. The proposed CORDIC architecture requires more area than the reference design, but offers high throughput. The simulation is done in MODELSIM and the code is functionally verified to be correct. CORDIC features such as the computation of division, multiplication and square root and the evaluation of trigonometric functions have made it an attractive choice for a wide variety of applications. For high-throughput applications, efficient cascaded bi-rotational CORDIC with more than two stages could be developed to take advantage of CORDIC, because digital hardware is getting cheaper with progressive device scaling. The area-delay-accuracy trade-off for different advanced algorithms may be investigated in detail and compared in future work.

REFERENCES
[1] J. E. Volder, “The CORDIC trigonometric computing technique,” IRE Transactions on Electronic Computers, vol. EC-8, pp. 330–334, Sep. 1959.
[2] J. S. Walther, “A unified algorithm for elementary functions,” in Proceedings 38th Spring Joint Computer Conference, Atlantic City, New Jersey, 1971, pp. 379–385.
[3] J. S. Walther, “A unified algorithm for elementary functions,” in Proc. 38th Spring Joint Computer Conf., Atlantic City, NJ, 1971, pp. 379–385.
[4] J. S. Walther, “The story of unified CORDIC,” J. VLSI Signal Process., vol. 25, no. 2, pp. 107–112, June 2000.
[5] D. S.
Cochran, “Algorithms and accuracy in the HP-35,” Hewlett-Packard J., pp. 1–11, Jun. 1972.
[6] P. K. Meher, J. Valls, T.-B. Juang, K. Sridharan, and K. Maharatna, “50 years of CORDIC: Algorithms, architectures and applications,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 56, no. 9, pp. 1893–1907, Sep. 2009.
[7] C.-S. Wu, A.-Y. Wu, and C.-H. Lin, “A high performance/low-latency vector rotational CORDIC architecture based on extended elementary angle set and trellis-based searching schemes,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 50, no. 9, pp. 589–601, Sep. 2003.
[8] T.-B. Juang, S.-F. Hsiao, and M.-Y. Tsai, “Para-CORDIC: parallel CORDIC rotation algorithm,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 51, no. 8, pp. 1515–1524, Aug. 2004.
[9] Y. H. Hu, “CORDIC-based VLSI architectures for digital signal processing,” IEEE Signal Processing Magazine, vol. 9, no. 3, pp. 16–35, Jul. 1992.
[10] K. Maharatna, S. Banerjee, E. Grass, M. Krstic, and A. Troya, “Modified virtually scaling free adaptive CORDIC rotator algorithm and architecture,” IEEE Transactions on Circuits and Syst. for Video Technology, vol. 15, no. 11, pp. 1463–1474, Nov. 2005.
[11] A. K. Jain, Fundamentals of Digital Image Processing. Englewood Cliffs, NJ: Prentice-Hall, 1989.
[12] N. Jayant, Ed., Signal Compression: Coding of Speech, Audio, Text, Image and Video, ser. Selected Topics in Electronics and Systems. NJ: World Scientific, 1997, vol. 9.
[13] S. Suchitra, S. Sukthankar, T. Srikanthan, and C. T. Clarke, “Elimination of sign precomputation in flat CORDIC,” in IEEE Int. Symp. on Circuits Syst., ISCAS’05, May 2005, vol. 4, pp. 3319–3322.
[14] E. Deprettere, P. Dewilde, and R. Udo, “Pipelined CORDIC architectures for fast VLSI filtering and array processing,” in IEEE Int. Conf. on Acoust., Speech, Signal Process., ICASSP’84, Mar. 1984, vol. 9, pp. 250–253.