PhD Research plan
Proposed thesis title
“Implementation of the Computational Arithmetic
Operations for Signal Processing Applications”
Proposal Submitted for the Degree of Doctor of Philosophy
by
Deepak Kumar
Under the Supervision of
Dr. Anup Dandapat
Department of Electronics and Communication Engineering
National Institute of Technology Meghalaya
Department of Computer Science and Engineering
National Institute of Technology Meghalaya
Shillong-793003, Meghalaya
India
December, 2014
Abstract
Computational arithmetic operations such as multiplication, division, squaring, square root, and reciprocal play a pivotal role in digital signal processing, image processing, computer graphics, application-specific (embedded) systems, cryptography, and related fields. At present these operations are commonly implemented as software routines targeted to FPGAs (field-programmable gate arrays), and such implementations do not satisfy present demands for high speed, low power consumption, and small chip area. With advances in VLSI technology, hardware implementation has become an attractive alternative. Assigning complex computational tasks to hardware and exploiting the parallelism and pipelining in algorithms yield significant speedups in running time. Moreover, although computers keep getting faster, there are always new applications that need more processing speed than before. Examples of current high-demand applications include real-time video stream encoding and decoding; real-time biometric (face, retina, and/or fingerprint) recognition; and military aerial and satellite surveillance. To meet the demands of present and future applications, modern techniques (algorithms) for accelerating these operations on commercial hardware need to be developed.
Contents
1. Introduction
2. Literature review
2.1 Works related to multiplier
2.2 Works related to floating point arithmetic
2.3 Works related to divider
2.4 Works related to reciprocal
2.5 Works related to square root
3. Scope of work
4. PhD Plan
4.1 Work done so far
4.2 Future work
4.3 Achievements and goals
5. Conclusions
6. References
7. Publications of the author related to the proposed work
1. Introduction
Rapid advances in electronics, particularly the fabrication of integrated circuits for commercial applications, have had a major impact on both industry and society. With remarkable progress in very large scale integration (VLSI) circuit technology, many complex circuits are now easily realizable, and algorithms that once seemed impossible to implement have attractive implementation possibilities for the future. The amalgamation of conventional and unconventional computer arithmetic methods will therefore set the trend for investigating new designs in the near future. That trend is toward an exciting interaction between current research in theoretical computer science and practical applications in VLSI design, driven mainly by two facts: I. chip design without the assistance of computers is no longer conceivable; II. the construction of powerful chips is pushing toward the absolute frontiers of current possibilities and capabilities. An ASIC (application-specific integrated circuit) provides a very efficient solution for the well-defined logic of a complex mathematical function. Optimizing a digital system means striking a good balance between the physical structure of the circuits and the informational structure of the programs running on them. Because complex systems are programmable systems, the hardware support offered by circuits may be oriented towards programmable structures whose functionality is actualized by the embedded information (program). A circuit implementation is evaluated on the basis of the following objectives: latency (propagation delay), power consumption, cycle time, and area, as well as throughput (computation rate) for pipelined circuits. In addition, the circuit structure may be constrained by pre- and post-specified operations at the input/output (I/O) ports. In general, propagation delay and power consumption depend on the computational resources as well as on the steering logic.
2. Literature review
Digital arithmetic (DA) encompasses the study of number representations, operations on numbers, the hardware implementation of arithmetic units, and applications to general-purpose and application-specific systems. An arithmetic unit (processor) is a system that performs operations on numbers. The most common cases are those in which the numbers are:
I. fixed-point numbers (integers, rational numbers);
II. floating-point numbers.
Floating-point numbers approximate real numbers and facilitate computation over a wide dynamic range.
An arithmetic processor operates on one or more operands, depending on the application. The operands are characterized and represented as sets of values. The operation is selected from an allowable set, which usually includes addition, subtraction, multiplication, division, square root, change of sign, comparison, and so on. The results can be DA numbers, logical variables (conditions), and/or singularity conditions (exceptions). The domain of digital arithmetic systems can be considered part of computing science: from this viewpoint, digital systems are systems that compute their associated transfer functions. From a functional viewpoint, however, a digital system is simply a computational system, because future technologies may impose different physical means of implementation (for example, different kinds of ASIC systems). We therefore begin our approach with a functionally oriented introduction to digital arithmetic circuits, such as multiplication and division, considered as a sub-domain of computing science. Technology-dependent knowledge is described only as supporting background for the various design options. The initial, abstract level of computation is described at the algorithmic level: algorithms specify the steps to be executed in order to perform a computation. The most concrete level consists of two realms:
I. the huge and complex domain of application software, and
II. the very tangible domain of real machines implemented in a given technology.
An intermediate level provides the means by which an algorithm is embodied in the physical structure of a machine or in the informational structure of a program: I. the domain of formal programming languages, and II. the domain of hardware architecture. Both are described using specific, rigorous formal tools. The hardware embodiment of computations is done in digital systems. A pseudo-code language is used to express algorithms; the 'main user' of this kind of language is the human mind.
A survey of present computational techniques for arithmetic operations is presented here:
2.1 Works related to multiplier
Conventional multiplication is done using shift-and-add operations [1], whose sequential mechanism produces a large propagation delay. In parallel multipliers, the partial products are generated through Booth's encoding [2] and added with the help of parallel adders; the generation and addition stages therefore limit the overall speed of a parallel multiplier. The modified Booth algorithm [3] is one of the most popular mechanisms for reducing the number of partial products, while the Wallace tree [4], which reduces the number of sequential addition stages, can be incorporated to improve speed. Another solution for partial-product addition was reported by Wang in 1995, in which compressors [5] are used in the partial-product addition stages, reducing the carry propagation significantly.
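As a purely illustrative sketch (our own, not drawn from [1-5]), the sequential shift-and-add scheme can be expressed as follows; the one addition per multiplier bit is exactly the sequential bottleneck that parallel multipliers avoid:

```python
def shift_add_multiply(a: int, b: int, width: int = 8) -> int:
    """Classic shift-and-add multiplication of unsigned integers:
    scan the multiplier bit by bit and accumulate shifted copies of
    the multiplicand. The loop runs once per multiplier bit, which
    is what makes the propagation delay large."""
    product = 0
    for i in range(width):
        if (b >> i) & 1:           # multiplier bit i set?
            product += a << i      # add the multiplicand shifted into place
    return product

print(shift_add_multiply(13, 11))  # 143
```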
Vedic mathematics is the ancient system of Indian mathematics, with a unique technique of calculation based on 16 sutras (formulae). The "Urdhva-tiryakbyham" formula (a Sanskrit term meaning 'vertically and crosswise') is used for the multiplication of small numbers. A few research papers [6] have so far been published that use the "Urdhva-tiryakbyham" formula for fast multiplication. However, Mehta et al. [7], exploring multiplication with the "Urdhva-tiryakbyham" sutra, point out its carry-propagation issues. Likewise, a multiplier design using the "all from 9 and the last from 10" formula (the "Nikhilam Navatascaramam Dasatah" sutra) was reported by Tiwari et al. [8] in 2009, but without any circuit-level hardware implementation.
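A minimal rendering of the 'vertically and crosswise' idea (our own illustration, not the circuits of [6-8]): each result column collects the cross products of digits whose positions sum to that column, and carries are propagated in a single final pass.

```python
def urdhva_multiply(x: int, y: int) -> int:
    """'Vertically and crosswise' multiplication: form each result
    column as the sum of cross products of digits whose position
    indices add up to that column, then resolve all carries in one
    final propagation pass."""
    xd = [int(d) for d in str(x)][::-1]   # least significant digit first
    yd = [int(d) for d in str(y)][::-1]
    cols = [0] * (len(xd) + len(yd))
    for i, dx in enumerate(xd):
        for j, dy in enumerate(yd):
            cols[i + j] += dx * dy        # crosswise partial products
    carry, digits = 0, []
    for c in cols:                        # single carry-propagation pass
        carry, digit = divmod(c + carry, 10)
        digits.append(digit)
    return int(''.join(map(str, digits[::-1])))

print(urdhva_multiply(123, 456))  # 56088
```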
2.2 Works related to floating point arithmetic
The IEEE Standard 754-1985 for Binary Floating-Point Arithmetic [9] has been revised, and an important addition is the definition of decimal floating-point arithmetic. This is intended mainly to provide a robust, reliable framework for financial applications, which are often subject to legal requirements concerning the rounding and precision of results, because binary floating-point arithmetic may introduce small but unacceptable errors.
Interest in decimal floating-point arithmetic has increased in both industry and academia as the IEEE 754R draft approaches the stage where it may soon become the new standard for floating-point arithmetic. The P754 draft describes two possibilities for encoding decimal floating-point values: the binary encoding, based on using a binary integer to represent the significand (BID, or Binary Integer Decimal), and the decimal encoding, which uses the Densely Packed Decimal (DPD) [10] method to represent groups of up to three decimal digits of the significand as 10-bit declets. An inherent problem of binary floating-point arithmetic in financial calculations is that most decimal floating-point numbers cannot be represented exactly in binary floating-point formats, so unacceptable errors may occur in the course of the computation. Decimal floating-point arithmetic addresses this problem, but at a performance cost compared with binary floating-point operations implemented in hardware. Despite its performance disadvantage, decimal floating-point is required by applications that need results identical to those calculated by hand. This is true for currency conversion, banking, billing, and other financial applications. Sometimes these requirements are mandated by law; at other times they are necessary to avoid large accounting discrepancies. Because of the importance of this problem, a number of decimal solutions exist, in both hardware and software. Software solutions include C#, COBOL, and XML, which provide decimal operations and data types; Java and C/C++ also have packages, called BigDecimal and decNumber, respectively. Hardware solutions were more prominent earlier in the computer age, with the ENIAC and UNIVAC. More recent examples include the CADAC, IBM's z900 [11] and z9 architectures, and numerous other proposed hardware implementations [12], [13], [14]. Further hardware examples can be found in [15], and a more in-depth discussion in Wang's Processor Support for Decimal Floating-Point Arithmetic [16]. It has been estimated that hardware approaches to decimal floating-point will achieve average speedups of 100-1000x over software [15]. Hardware implementations would undoubtedly yield a significant, if less dramatic, speedup, and that will make a difference only if applications spend a large percentage of their time in decimal floating-point computations.
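The representational problem that motivates decimal arithmetic is easy to demonstrate with Python's binary floats against its `decimal` package; this is a generic illustration, not tied to any cited implementation:

```python
from decimal import Decimal

# 0.10 has no exact binary representation, so ten additions drift
# slightly below 1.0 on IEEE 754 binary64:
binary_total = sum([0.10] * 10)
print(binary_total == 1.0)               # False

# Decimal arithmetic keeps the exact decimal value throughout:
decimal_total = sum([Decimal('0.10')] * 10)
print(decimal_total == Decimal('1.00'))  # True
```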
2.3 Works related to divider
Many recursive techniques have been proposed by various researchers to implement high-speed dividers [17-27], such as digit-recurrence methods (restoring [23-25, 27] and non-restoring [20-22, 27]), division by convergence (the Newton-Raphson method [27]), and division by series expansion (the Goldschmidt algorithm [26]). The cost of digit-recurrence algorithms [20-25, 27] in terms of area and computational complexity is low, but the large number of iterations makes the latency (propagation delay) high. Some investigators rely on higher-radix implementations of digit recurrence [28, 29] to reduce the iteration count; the latency improves over the earlier reports [20-25], but these schemes increase the hardware complexity. Other attractive ideas are based on functional iteration, like the Newton-Raphson [27] and Goldschmidt [26] algorithms, which combine multiplication with a series expansion so that the number of quotient digits obtained per iteration is doubled. The drawbacks of these methods are that the operands must first be normalized, the most-used primitive is multiplication, and the remainder is not obtained directly [30].
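A minimal radix-2 restoring divider, producing one quotient bit per iteration, sketches the digit-recurrence idea (an illustration of the general scheme, not any specific cited design):

```python
def restoring_divide(dividend: int, divisor: int, width: int = 8):
    """Radix-2 restoring division: one quotient bit per iteration.
    The partial remainder is shifted left, the (aligned) divisor is
    subtracted on trial, and added back ('restored') whenever the
    trial result goes negative."""
    assert 0 < divisor and 0 <= dividend < (divisor << width)
    remainder, quotient = dividend, 0
    aligned = divisor << width             # divisor aligned to the top bits
    for _ in range(width):
        remainder = (remainder << 1) - aligned   # trial subtraction
        if remainder < 0:
            remainder += aligned           # restore
            quotient = quotient << 1       # quotient bit 0
        else:
            quotient = (quotient << 1) | 1 # quotient bit 1
    return quotient, remainder >> width

print(restoring_divide(100, 7))  # (14, 2)
```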
2.4 Works related to reciprocal
Reciprocal approximation and division play a pivotal role in several applications such as digital signal and image processing, computer graphics, and scientific computing [31], because division can be computed in the following manner: the reciprocal of the divisor is computed first and then used as the multiplier in a subsequent multiplication by the dividend [32]. This method is especially economical when different dividends are to be divided by the same divisor. Such 'reciprocal approximation' methods are typically based on Newton-Raphson iteration [33]. These operations are used less frequently than the other two basic arithmetic operations, addition and multiplication, owing to their poor performance (high computation time) [34]. Many algorithms, such as the Taylor series method [35] and iterative techniques such as Newton-Raphson [32, 33] and Goldschmidt [34], have been reported for implementing this function, and a substantial amount of work combining these basic algorithms has been investigated in the literature [31-34]. The above-mentioned algorithms have long latencies or large area overhead and exhibit a linear convergence rate, so a large number of operations is required for the task. Moreover, iterative division starts with an initial approximation of the reciprocal of the divisor, usually implemented through a look-up table; a large ROM is then required to accommodate the denominator, leading to higher delay and power. The achievable precision of the reciprocal unit is limited by the ROM size, since the look-up table grows with increasing precision.
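The Newton-Raphson reciprocal iteration mentioned above can be sketched as follows. The linear seed here stands in for the usual look-up table, and the normalization interval [0.5, 1) and seed coefficients 48/17 and 32/17 are standard textbook choices, not a cited design:

```python
def nr_reciprocal(d: float, iterations: int = 4) -> float:
    """Approximate 1/d by Newton-Raphson: x_{k+1} = x_k * (2 - d * x_k).
    Each iteration roughly doubles the number of correct digits
    (quadratic convergence), but a seed approximation is required --
    in hardware, the look-up table discussed above."""
    assert 0.5 <= d < 1.0, "operand assumed pre-normalized"
    x = 48.0 / 17.0 - (32.0 / 17.0) * d   # linear seed, max rel. error 1/17
    for _ in range(iterations):
        x = x * (2.0 - d * x)             # Newton step: only multiply/add
    return x

print(abs(nr_reciprocal(0.75) - 4.0 / 3.0) < 1e-12)  # True
```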
2.5 Works related to square root
Several methods have been used to perform the square-root operation, as summarized in [38-61]. Among them is the digit-recurrence technique, in which one digit of the result is produced per cycle. To reduce the number of cycles it is convenient to use a high-radix algorithm; however, the complexity of digit selection limits the direct application to radices up to eight. A comprehensive presentation of this method is given in [43]. Another technique is based on functional iteration [46], [53], [57], in which the implementation relies on a multiplier. Although the convergence of these algorithms can be quadratic, the time for a multiplication and the difficulty of producing a correctly rounded result make the execution time comparable with that of digit-recurrence algorithms. For division, very high radix algorithms have been proposed that simplify the selection function by prescaling the divisor [38], [40], [41], [48], [49], [52], [58], [59], [60]. In [44], it was demonstrated that this technique, when combined with selection by rounding, can achieve a faster implementation than other known dividers, including dividers by functional iteration. Further comparisons confirming this claim are presented in [43], [56]. The area of this implementation is larger than that of low-radix dividers, but it seems reasonable given the number of transistors available in today's chips. Because of the similarities between division and square root, combined implementations have been proposed [39], [42], [45], [47], [50], [51], [52], [54], [61]. This motivates the development of an algorithm for square root similar to very high radix division with prescaling. An algorithm of this type was presented in [52]; however, the resulting implementation is complex and is not compatible with the corresponding division unit.
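As a small illustration of the digit-recurrence principle for square root (one result bit per cycle, analogous to restoring division; our own sketch, not an implementation from [38-61]):

```python
def isqrt_digit_recurrence(n: int) -> int:
    """Radix-2 digit-recurrence integer square root: the result is
    built one bit per iteration, with a conditional subtraction that
    mirrors the restoring-division recurrence."""
    bit = 1
    while bit * 4 <= n:          # highest power of 4 not exceeding n
        bit *= 4
    root = 0
    while bit:
        if n >= root + bit:      # trial subtraction succeeds:
            n -= root + bit      #   accept this result bit
            root = (root >> 1) + bit
        else:
            root >>= 1           # reject: shift only
        bit >>= 2
    return root                  # floor of the square root

print(isqrt_digit_recurrence(144))  # 12
```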
3. Scope of work
About a hundred years ago, researchers in the West discovered the Indian Vedas: ancient texts, in their millions, containing some of the most profound knowledge; indeed, the Sanskrit word 'Veda' means 'knowledge'. Researchers found Vedic texts on medicine, architecture, astronomy, ethics, and so on, and according to the Indian tradition all knowledge is contained in these Vedas. Popular accounts describe "Vedic Mathematics" as an "amazingly compact and powerful system of calculation", claiming that "once you have learnt the 16 sutras by heart, you can solve any long problem orally"; in essence, it is a compilation of techniques for simple arithmetic and algebra. One of the main purposes of Vedic mathematics is to transform tedious calculations into simpler, verbally manageable operations without much help from pen and paper. Humans can perform mental operations only on numbers of very small magnitude; Vedic mathematics provides techniques to operate easily on numbers of large magnitude, and it offers more than one method for complex calculations such as multiplication and division. For each operation, at least one generic method is provided, along with methods directed at specific cases that simplify the calculation further. After intensive research and a literature survey, we found that algorithms based on these methods, implemented in hardware, improve the results.
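One concrete flavour of such a method: the 'Nikhilam' sutra multiplies numbers close to a power of ten through their small deficits from that base. The sketch below is our own illustrative rendering; the identity it relies on, (x - (b - y)) * b + (b - x)(b - y) = x * y, holds exactly for any base b:

```python
def nikhilam_multiply(x: int, y: int, base: int = 100) -> int:
    """Vedic 'Nikhilam' multiplication: for operands near a power of
    ten, replace the full multiplication by one small multiplication
    of the deficits plus a cross subtraction. Exact for all inputs,
    but only cheap when the deficits are small."""
    dx, dy = base - x, base - y   # deficits: 'all from 9 and the last from 10'
    left = x - dy                 # cross subtraction (equals y - dx)
    right = dx * dy               # small product of the deficits
    return left * base + right

print(nikhilam_multiply(97, 96))  # 9312, i.e. 97 * 96
```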
4. PhD Plan
4.1 Work done so far
A systematic study of computational mathematics, together with its application to application-specific processor design, has been completed. The hardware implementation of decimal arithmetic is becoming a topic of interest to researchers because of the wide use of such arithmetic in human-centric applications, where exact results are required. Computer algorithms and architectures are generally based on binary number systems because of their simplicity compared with decimal number systems. However, many decimal numbers cannot be represented exactly in binary format due to finite word-length effects, so exact implementation is impractical. Recently, decimal arithmetic has been commercialized for general-purpose computers through Binary Coded Decimal (BCD) encoding techniques.
The summary of the work done in the last 1.5 years is as follows:
Reciprocal: Reciprocal approximation and division play a pivotal role in several applications such as digital signal and image processing, computer graphics, and scientific computing [31]. At the algorithmic and structural levels, many reciprocal implementation techniques have been developed to reduce the propagation delay (latency); they reduce the number of iterations, but the principle behind the implementation methodology was the same in all cases. Vedic mathematics [36] is the ancient system of Indian mathematics, with a unique technique of calculation based on 16 sutras (formulae). In one paper we report a reciprocal algorithm and its architecture based on this ancient Indian mathematics. The 'Sahayak' (auxiliary fraction) formula, a Sanskrit term adopted from the Vedas, is employed to implement the reciprocal circuitry. In the Vedic methodology, the reciprocal is implemented by transforming the digits to smaller digits and carrying out the entire division on the transformed digits; this reduces the circuit-level complexity and thereby substantially reduces the propagation delay.
To carry out the transistor-level implementation of the decimal reciprocal unit, optimized 8421-BCD recoding techniques [37] were adopted in this work. The reciprocal unit is fully optimized in terms of calculation, so the reciprocal of any configuration of input digits can be elaborated. The transistor-level implementation of the reciprocal circuitry combined BCD arithmetic with Vedic mathematics; performance parameters such as propagation delay and dynamic switching power consumption were calculated with Spice (Spectre) using 90 nm CMOS technology and compared with other designs such as a Newton-Raphson (NR)-based implementation [32]. The calculated results revealed that a 5-digit reciprocal unit has a propagation delay of only ~3.57 µs with ~30.8 mW dynamic switching power.
Division: Division is a basic operation in many scientific and engineering applications. Division is generally a sequential operation and is therefore more costly, in terms of computational complexity and latency, than other mathematical operations such as multiplication and addition. At the algorithmic and structural levels, many division techniques have been developed to reduce the latency of the divider circuitry; they reduce the number of iterations, but the principle behind the division was the same in all cases.
A division algorithm and its architecture based on this ancient mathematics have been reported in one of our papers. 'Paravartya Yojayet' (PY) is a Sanskrit term from the Vedas meaning 'transpose and apply', and it is implemented in the proposed division circuitry. In the Vedic methodology, division is implemented through multiplication and addition, which reduces the number of iterations and thereby substantially reduces the propagation delay. The transistor-level implementation of the division circuitry combined Boolean logic with Vedic mathematics. Performance parameters such as propagation delay and dynamic switching power consumption were calculated with Spice (Spectre) using 90 nm CMOS technology, scaled down to 65 nm, 45 nm, and 32 nm. Moreover, the proposed methodology was compared with other designs based on digit recurrence, convergence, and series expansion. The calculated results revealed that the (32÷16)-bit circuit has a propagation delay of only ~23.5 ns with ~35.7 mW dynamic switching power.
Division: In another paper we report a decimal division technique and its hardware implementation based on this ancient Vedic mathematics. 'Nikhilam Navatascaramam Dasatah' (NND), a Sanskrit term meaning 'all from 9 and the last from 10', is adopted from the Vedas, and the formula is employed to achieve a high-speed decimal division circuit. In this approach the decimal divider works with the complement of the divisor digits instead of the actual digits, using addition and a little multiplication; this reduces the number of iterations and thereby substantially reduces the propagation delay. The algorithmic implementation of the division circuitry combined BCD arithmetic with Vedic mathematics; performance parameters such as propagation delay and dynamic switching power consumption were calculated with the Xilinx ISE simulator and compared with other designs such as digit-recurrence (D-R) and Newton-Raphson (N-R) implementations. The calculated results revealed that the (6÷3)-digit divider circuit has a propagation delay of only ~41 ns with ~93 mW dynamic switching power.
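The complement idea behind NND can be illustrated in a few lines: for a divisor just below a power of ten, each step folds the leading digits of the running value back in, scaled by the small complement, so only additions and a small multiplication are needed, never a trial subtraction by the full divisor. This sketch is our own paraphrase of the principle, not the reported circuit:

```python
def nikhilam_divide(dividend: int, divisor: int):
    """Vedic 'Nikhilam' division sketch for divisors just below a
    power of ten: replace the divisor by its small complement
    (e.g. 9 -> 1, 88 -> 12) and repeatedly fold the head digits of
    the running value back in, scaled by that complement."""
    base = 10 ** len(str(divisor))
    comp = base - divisor              # 'all from 9 and the last from 10'
    q, r = 0, dividend
    while r >= base:
        head, tail = divmod(r, base)   # split off the leading digits
        q += head                      # they join the quotient...
        r = tail + head * comp         # ...and fold back via the complement
    while r >= divisor:                # final small adjustment
        q, r = q + 1, r - divisor
    return q, r

print(nikhilam_divide(1234, 9))  # (137, 1)
```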
4.2 Future work
Reported works:
- Division
- Reciprocal
Future work:
- Some special computational techniques (square root, inverse square root, etc.)
- Floating-point arithmetic (e.g. multiplier, divider)
- Reversible arithmetic circuits
- Field logarithmic techniques
- Special base number systems (complex base: 1+j, 1-j)
4.3 Achievements and goals

Objective no.  Task no.  Objective/Task
O1             T1        Background study of computational arithmetic
               T2        Course work
               T3        Literature review about reciprocal, divider
               T4        Course work
O2             T5        Proposal of an algorithm for divider
               T6        Proposal of algorithms for reciprocal
               T7        Proposal of an algorithm for improved division operation based on different Vedic formulae
O3             T8        Literature review about square root and inverse square root
               T9        Proposal of an algorithm for square root and inverse square root
O4             T10       Literature review about reversible circuits
               T11       Design of reversible arithmetic circuits
               T12       Literature review about field logarithmic techniques and special base number systems
O5             T13       Proposal of an algorithm for field logarithmic techniques
               T14       Proposal of algorithms for operations on special base number systems
Gantt chart of PhD Research Plan
The Gantt chart below shows the timeline over seven semesters (Autumn 2013, Spring 2014, Autumn 2014, Spring 2015, Autumn 2015, Spring 2016, Autumn 2016), scheduling tasks T1-T4 (objective O1), T5-T7 (O2), T8-T9 (O3), T10-T12 (O4), and T13-T14 (O5) in sequence, followed by final thesis writing.
[Gantt chart: tasks T1-T14 and thesis writing plotted against the seven semesters.]
5. Conclusions
Computational arithmetic operations such as multiplication, division, squaring, square root, and reciprocal play a pivotal role in digital signal processing, image processing, computer graphics, application-specific (embedded) systems, cryptography, and related fields. All these operations are usually implemented in software but may use special-purpose hardware for speed. Although computers keep getting faster every year, there are always new applications that need more processing power than is available. To meet the demands of these and future applications, we need to develop new techniques (algorithms) for accelerating applications on commercial hardware. Ancient mathematics, especially Vedic mathematics, contains methods developed for fast mental calculation. A thorough study of the Vedic mathematics literature indicates that these procedures may be fruitful at the algorithmic level as well as in the VLSI implementation of the above-mentioned circuits.
6. References
[1] M. M.-Dastjerdi, A. A.-Kusha, and M. Pedram, “BZ-FAD: A Low-Power Low-Area Multiplier Based on
Shift-and-Add Architecture,” IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 17, no. 2,
pp. 302-306, Feb. 2009.
[2] A.D. Booth, “A signed binary multiplication technique,” Quarterly Journal of Mechanics and Applied
Mathematics, vol. IV, pp. 236–240, 1952.
[3] J. Hu, L. Wang, and T. Xu, “A low-power adiabatic multiplier based on modified booth algorithm,” in
Proc. of the IEEE Int. Symp. on Integrated Circuits, Singapore, Sept. 2007, pp. 489-492.
[4] C. S. Wallace, “A suggestion for a fast multiplier,” IEEE Trans. Electronic Computers, vol. EC-13, no. 1, pp.
14–17, Jan. 1964.
[5] Z. Wang, G.A. Jullien, and W.C. Miller, “A New Design Technique for Column Compression Multipliers”,
IEEE Trans. on Computers, vol. 44, no. 8, pp. 962-970, Aug.1995.
[6] M. Ramalatha, K.D. Dayalan, P. Dharani, and S.D. Priya, “High Speed Energy Efficient ALU Design
using Vedic Multiplication Techniques,” in Proc. of the IEEE Int. Conf. on Advances in Computational
Tools for Engineering Applications, Zouk Mosbeh, July 2009, pp. 600-603.
[7] P. Mehta, and D. Gawali, “Conventional versus Vedic mathematical method for Hardware
implementation of a multiplier,” in Proc. of the IEEE, Int. Conf. on Advances in Computing, Control, and
Telecommunication Technologies, Trivandrum, Kerala, Dec. 2009, pp. 640-642.
[8] H.D. Tiwari, G. Gankhuyag, C.M. Kim, and Y.B. Cho, “Multiplier design based on ancient Indian Vedic
Mathematics,” in Proc. of the IEEE, Int. SoC Design Conf., Busan, Nov. 2008, pp. 65-68.
[9] Institute of Electrical and Electronics Engineers. “Standard for Binary Floating-Point Arithmetic”, IEEE
Std 754-1985.
[10] M. F. Cowlishaw, “Densely packed decimal encoding.” IEEE Proceedings - Computers and Digital
Techniques, vol. 149, pp. 102-104. May 2002.
[11] F.Y. Busaba, C.A. Krygowski, W.H. Li, E.M. Schwarz, S.R. Carlough, “The IBM z900 Decimal Arithmetic
Unit,” in Proceedings of the 35th Asilomar Conference on Signals, Systems and Computers, vol. 2, pp
1335, IEEE Computer Society, November 2001.
[12] M. A. Erle, J. M. Linebarger, and M. J. Schulte, “Potential Speedup Using Decimal Floating-Point
Hardware.” Proceedings of the Thirty Sixth Asilomar Conference on Signals, Systems, and Computers
Pacific Grove, California. IEEE Press, pp. 1073-1077, November, 2002.
[13] M. A. Erle, M. J. Schulte, “Decimal Multiplication Via Carry-Save Addition.” Proceedings of the IEEE
International Conference on Application-Specific Systems, Architectures, and Processors The Hague,
Netherlands. IEEE Computer Society Press, pp. 348-358, June, 2003.
[14] G. Bohlender and T. Teufel “A Decimal Floating-Point Processor for Optimal Arithmetic”, pp 31-58.
Computer Arithmetic: Scientific Computation and Programming Languages, Teubner Stuttgart, 1987.
[15] M. F. Cowlishaw. “Decimal Floating-Point : Algorism for Computers,” in Proceedings of the 16th IEEE
Symposium on Computer Arithmetic, pp. 104-111, June 2003.
[16] L Wang. “Processor Support for Decimal Floating-Point Arithmetic.” Technical Report. Electrical and
Computer Engineering Department. University of Wisconsin-Madison. Available upon request.
[17] M. Langhammer, “Improved subtractive division algorithm,” in Proc. IEEE Int. ASIC Conf. 1998,
Rochester, NY, Sept. 13-16, 1998, pp. 343-347.
[18] R. Hagglund, P. Lowenborg, M. Vesterbacka, “A polynomial-based division algorithm,” in Proc. IEEE Int.
Symp. on Circuits and Systems, May 26-29, 2002, vol. 3, pp. 571-574.
[19] S-G. Chen, C-C. Li, “An efficient division algorithm and its architecture,” in Proc. IEEE Int. Computer,
Communication, Control and Power Engineering Conf. 1993, Beijing , China, Oct. 19-21, 1993,vol 1, pp.
24-27.
[20] W.P. Marnane, S.J. Bellis and P. Larsson-Edefors, “Bit-serial interleaved high speed division,”
Electronics Letter, vol. 33, no. 13, pp. 1124-1125, June-1997.
[21] H. T. Vergos, “An Efficient BIST Scheme for Non-Restoring Array Dividers,” in Proc. IEEE 10th
Euromicro Conf. on Digital System Design Architectures, Methods and Tools 2007, Lubeck, Aug. 29-31,
2007, pp. 664-667.
[22] J.B. Andersen, A.F. Nielsen, and O. Olsen, “A Systolic ON-LINE Non-restoring Division Scheme,” in
Proc. IEEE Twenty-Seventh Hawaii Int. Conf. on System Sciences, Wailea, USA, Jan. 4-7, 1994, pp.
339-348.
[23] N. Aggarwal, K. Asooja, S.S. Verma, and S. Negi, “An Improvement in the Restoring Division Algorithm
(Needy Restoring Division Algorithm),” in Proc. IEEE Int. Conf. Computer Science and Information
Technology 2009, Beijing, Aug. 8-11, 2009, pp. 246-249.
[24] P. Nair, D. Kudithipudi, and E. John, “Design and Implementation of a CMOS Non-Restoring Divider,” in
Proc. IEEE Int. Region 5 Conf, San Antonio, USA April 7-9, 2006, pp. 211-217.
[25] C. Senthilpari, S. Kavitha and J. Joseph, “Lower delay and area efficient non-restoring array divider by
using Shannon based adder technique,” in Proc. IEEE Int. Conf. on Semiconductor Electronics, Melaka,
June 28-30, 2010, pp. 140-144.
[26] S.F. Oberman, and M.J. Flynn, “Division Algorithms and Implementations,” IEEE Trans. on Comp. vol.
46, no. 8, pp. 833-854, Aug. 1997.
[27] J-P. Deschamps, G.J.A. Bioul, G.D. Sutter, Synthesis of Arithmetic Circuits, FPGA, ASIC and
Embedded System, John Wiley & Sons, Inc., publication, 2006.
[28] T. M. Carter and J.E. Robertson. “Radix -16 signed-digit division,” IEEE Trans. Computers, vol. C-39,
no. 12, pp. 1424-1433, Dec. 1990.
[29] H. R. Srinivas and K. K. Parhi, “A Fast Radix 4 Division Algorithm,” in Proc. IEEE Int. Symp. on Circuit &
Systems, London, May 30-June. 02, 1994,vol. 4, pp. 311-314.
[30] G. Sutter, J-P. Deschamps, “High speed fixed point dividers for FPGAs,” in Proc. IEEE Int. Conf. on
Field Programmable Logic and Applications, Prague, Aug. 31-Sept. 02, 2009, pp. 448-452.
[31] M.J. Schulte, J.E. Stine and K.E. Wires, “High-speed Reciprocal Approximations,” Proc. of the Thirty-First Asilomar Conference on Signals, Systems & Computers, vol. 2, 1997, pp. 1183-1187.
[32] D. Chen and S.-B. Ko, “Design and Implementation of Decimal Reciprocal Unit,” in Proc. Canadian Conf.
on Electrical and Computer Engineering, 2007, pp. 1094-1097.
[33] D.L. Fowler and J. E. Smith, “An Accurate, High Speed Implementation of Division by Reciprocal
Approximation,” in Proc. 9th Symp. on Computer Arithmetic, 1989, pp. 60-67.
[34] J.-A. Pineiro and J.D. Bruguera, “High-Speed Double-Precision Computation of Reciprocal,
Division, Square Root, and Inverse Square Root”, IEEE Trans. Computers, vol. 52, no. 12, pp. 1377-1388,
Dec. 2002.
[35] P.M. Farmwald, “High Bandwidth Evaluation of Elementary Functions”, Proc. Fifth IEEE Symp.
Computer Arithmetic, pp. 139- 142, 1981.
[36] J.S.S.B.K.T. Maharaja, “Vedic mathematics”, Motilal Banarsidass Publishers Pvt Ltd, Delhi, 2001.
[37] J. Bhattacharya, A. Gupta, and A. Singh, “A high performance binary to BCD converter for decimal
multiplication”, in Proc. of the IEEE, International Symp. on VLSI Design Automation and Test
(VLSIDAT), Hsin Chu, 2010, pp. 315-318.
[38] W.S. Briggs and D.W. Matula, “Method and Apparatus for Performing Division Using a Rectangular
Aspect Ratio Multiplier”, U. S. Patent No. 5 046 038, Sept. 1991.
[39] J. Cortadella and T. Lang, “High-Radix Division and Square Root with Speculation”, IEEE Trans.
Computers, vol. 43, no. 8, pp. 919-931, Aug. 1994.
[40] M.D. Ercegovac, “A Division with Simple Selection of Quotient Digits”, Proc. Sixth IEEE Symp. Computer
Arithmetic, pp. 94-98, Aahrus, Denmark, June 1983.
[41] M.D. Ercegovac and T. Lang, “Simple Radix-4 Division with Operands Scaling”, IEEE Trans. Computers,
vol. 39, no. 9, pp. 1204-1208, Sept. 1990.
[42] M.D. Ercegovac and T. Lang, “Module to Perform Multiplication, Division and Square Root in Systolic
Arrays for Matrix Computations”, J. Parallel and Distributed Computing, vol. 11, no. 3, pp. 212-221, Mar.
1991.
[43] M.D. Ercegovac and T. Lang, “Division and Square Root: Digit-Recurrence Algorithms and
Implementations”. Kluwer Academic, 1994.
[44] M.D. Ercegovac, T. Lang, and P. Montuschi, “Very-High Radix Division with Prescaling and Selection by
Rounding”, IEEE Trans. Computers, vol. 43, no. 8, pp. 909-918, Aug. 1994.
[45] J. Fandrianto, “Algorithm for High Speed Shared Radix 8 Division and Radix 8 Square-Root”, Proc. Ninth
IEEE Symp. Computer Arithmetic, pp. 68-75, Santa Monica, Calif., Sept. 1989.
[46] M.J. Flynn, “On Division by Functional Iteration”, IEEE Trans. Computers, vol. 19, no. 8, pp. 702-706,
Aug. 1970.
[47] J.B. Gosling and C.M.S. Blakeley, “Arithmetic Unit with Integral Division and Square Root”, IEEE Proc.,
Part E, vol. 134, pp. 17-23, Jan. 1987.
[48] J. Klir, “A Note on Svoboda's Algorithm for Division”, Information Processing Machines (Stroje na
Zpracovani Informaci), no. 9, pp. 35-39, 1963.
[49] E.V. Krishnamurthy, “On Range-Transformation Techniques for Division”, IEEE Trans. Computers, vol.
19, no. 2, pp. 157-160, Feb. 1970.
[50] S.E. McQuillan and J.V. McCanny, “VLSI Module for High Performance Multiply, Square Root and
Divide”, IEEE Proc., Part E, vol. 139, no. 6, pp. 505-510, June 1992.
[51] S.E. McQuillan, J.V. McCanny, and R. Hamill, “New Algorithms and VLSI Architectures for SRT Division
and Square Root”, Proc. 11th IEEE Symp. Computer Arithmetic, pp. 80-86, Windsor, Ontario, Canada,
July 1993.
[52] D.W. Matula, “Highly Parallel Divide and Square Root Algorithms for a New Generation Floating Point
Processors”, Proc. SCAN-89 Symp. Computer Arithmetic and Self-Validating Numerical Methods, Oct.
1989.
[53] P. Montuschi and M. Mezzalama, “Survey of Square Rooting Algorithms,” IEEE Proc., Part E, vol. 137,
no. 1, pp. 31-40, Jan. 1990.
[54] P. Montuschi and L. Ciminiera, “Reducing Iteration Time When Result Digit is Zero for Radix-2 SRT
Division and Square Root with Redundant Remainders”, IEEE Trans. Computers, vol. 42, no. 2, pp. 239-246,
Feb. 1993.
[55] S.F. Oberman and M.J. Flynn, “Design Issues in Division and Other Floating-Point Operations”, IEEE
Trans. Computers, vol. 46, no. 2, pp. 154-161, Feb. 1997.
[56] S.F. Oberman and M.J. Flynn, “Division Algorithms and Implementations,” IEEE Trans. Computers, vol.
46, no. 8, pp. 833-854, Aug. 1997.
[57] C.V. Ramamoorthy, J.R. Goodman, and K.H. Kim, “Some Properties of Iterative Square-Rooting
Methods Using High-Speed Multiplication”, IEEE Trans. Computers, vol. 21, no. 8, pp. 837-847, Aug.
1972.
[58] A. Svoboda, “An Algorithm for Division”, Information Processing Machines, vol. 9, pp. 25-32, 1963.
[59] C. Tung, “A Division Algorithm for Signed-Digit Arithmetic”, IEEE Trans. Computers, vol. 17, no. 9, pp.
887-889, Sept. 1970.
[60] D.C. Wong and M.J. Flynn, “Fast Division Using Accurate Quotient Approximations to Reduce the
Number of Iterations”, IEEE Trans. Computers, vol. 41, no. 8, pp. 981-995, Aug. 1992.
[61] J.H.P. Zurawski and J.B. Gosling, “Design of a High-Speed Square Root Multiply and Divide Unit”, IEEE
Trans. Computers, vol. 36, no. 1, pp. 13-23, Jan. 1987.
7. Publications of the author related to the proposed work
Journals:
[1] P. Saha, D. Kumar, P. Bhattacharyya, and A. Dandapat, “Vedic division methodology for high-speed very
large scale integration applications”, IET Journal of Engineering, 2014, pp. 1-9.
DOI: 10.1049/joe.2013.0213, Online ISSN 2051-3305.
[2] P. Saha, D. Kumar, P. Bhattacharyya, and A. Dandapat, “Design of 64-bit squarer based on Vedic
mathematics,” Journal of Circuits, Systems and Computers, vol. 23, no. 7, pp. xx, 2014. ISSN: 0218-1266,
DOI: 10.1142/S0218126614500923.
[3] P. Saha, D. Kumar, P. Bhattacharyya, and A. Dandapat, “Improved Division Algorithm using Vedic
Mathematics for VLSI Applications”, Microelectronics Journal. (communicated)
Conference papers:
[1] P. Saha, D. Kumar, P. Bhattacharyya, and A. Dandapat, “Reciprocal Unit Based on Vedic Mathematics for
Signal Processing Applications”, in Proc. IEEE International Symposium on Electronic System Design, 2013,
pp. 41-45. DOI: 10.1109/ISED.2013.15.