A Floating Point Divider for Complex Numbers in the NIOS II

Presented by John-Marc Desmarais
Authors: Philipp Digeser, Marco Tubolino, Martin Klemm, Daniel Shapiro and Miodrag Bolic
Email: {dshap092, mbolic}@site.uottawa.ca
CARG 2010
carg.site.uottawa.ca

Overview

• Floating point division
• Instruction Set Extensions (ISE)
• NIOS II processor
• Instruction hardware
• Software interface
• Experiment
• Conclusion

Floating Point Division

Unlike real multiplication or real division, mathematical operations for complex numbers are usually provided by slow software. Consider complex division:

[Equation: complex division, annotated "Slow"]

• Fast complex dividers are necessary to drive an increasing number of applications such as signal processing systems for image and audio manipulation, GPS, and multi-antenna systems.
• Example: STSDAS offers math libraries for image analysis, including stsdas.analysis.fourier.carith, which is used to multiply or divide two complex images [1].

[1] http://stsdas.stsci.edu/cgibin/gethelp.cgi?carith.hlp

Instruction Set Extensions

Instruction set extensions (ISE), as the name implies, involve the addition of custom instructions to a processor's instruction set. Many commercial processors allow for the addition of these internal custom instructions:

1. Tensilica Xtensa (VLIW)
2. Altera NIOS II
3. Xilinx Microblaze
4. MIPS CorExtend

In recent years there has been much research into the automatic identification of instruction set extensions. These automated efforts vary in their approach. Some look at the functional C level of the program, where hotspot functions are identified. Others look lower, at the basic constructs of the program represented as data and control flow graphs.

[Figure: ISE (Instruction Set Extensions) flow. A data-flow graph over x, y and z with +, / and >> operations, and the steps: Modify ISA; Add Custom Hardware; Modify Compiler, ASM & LD; Regenerate Custom Program]

• An ISE candidate has limited I/O access to the register file.

Possible remedies (Pozzi05):

1. Multiport Register File
2. Register File Replication
3. Shadow Registers
4. Multicycle Reads (Altera's NIOS II)
5. Dedicated Data Links (Microblaze)

Solution: We use multicycle reads/writes from/to the register bank in order to squeeze several operands into the two-input, one-output register file.

• The instruction width also poses an I/O barrier.

[Figure: 32-bit instruction word with fields opcode, rs, rt, rd, shamt and funct; bit boundaries at 31, 26/25, 21/20, 16/15, 11/10, 6/5 and 0]

NIOS II Processor

[Figure: Generic custom instruction datapath]
[Figure: Our custom logic block]

Instruction Hardware

Cycles

We can see in these figures that a sequence of three calls to the custom instruction results in a complex operation with four inputs and two outputs.

[Figures: Operation when n=0 above, n=1 at right]

Software Interface

The designed hardware for complex division can be used easily from inline assembly or from C/C++ code.

[Figure: code listing]

Experiment

We used a NIOS II processor and a PLL as the starting point for the design.

Conclusion

Applications can be accelerated with instruction set extensions, and complex division is one case where there is a tangible benefit.

• We designed a complex divider instruction set extension for the NIOS II.
• This instruction was able to accelerate the execution of code that uses complex division.
• In the future we would like to implement additional complex operations, and publish the core on OPENCORES.org.
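
The Instruction Hardware slides describe three calls to a two-input, one-output custom instruction that together carry four inputs and two outputs. The C sketch below is one way to picture that pattern in plain software. It is only a model under stated assumptions: the names cdiv_sw, cdiv_ci and cdiv_via_ci are hypothetical, and the assignment of operands to the calls n=0, n=1 and n=2 is a guess, since the slides do not spell out the protocol. The arithmetic is the textbook formula (a + bi)/(c + di) = ((ac + bd) + (bc - ad)i)/(c^2 + d^2), not the authors' hardware algorithm.

```c
#include <assert.h>

/* Plain software complex division: (a + bi) / (c + di). */
static void cdiv_sw(float a, float b, float c, float d, float *re, float *im)
{
    float den = c * c + d * d;      /* squared magnitude of the divisor */
    *re = (a * c + b * d) / den;
    *im = (b * c - a * d) / den;
}

/* State held between calls. In hardware this would live in registers
 * inside the custom logic block (assumption). */
static float ci_a, ci_b, ci_im;

/* Hypothetical model of the two-input, one-output custom instruction.
 * Assumed protocol, not taken from the slides:
 *   n = 0: latch the numerator (dataa = real part, datab = imaginary part)
 *   n = 1: take the denominator, divide, return the real part
 *   n = 2: return the imaginary part (inputs ignored)                  */
static float cdiv_ci(int n, float dataa, float datab)
{
    float re;

    assert(n >= 0 && n <= 2);
    switch (n) {
    case 0:
        ci_a = dataa;
        ci_b = datab;
        return 0.0f;
    case 1:
        cdiv_sw(ci_a, ci_b, dataa, datab, &re, &ci_im);
        return re;
    default:
        return ci_im;
    }
}

/* Three calls give one complex division: four inputs, two outputs. */
static void cdiv_via_ci(float a, float b, float c, float d, float *re, float *im)
{
    (void)cdiv_ci(0, a, b);
    *re = cdiv_ci(1, c, d);
    *im = cdiv_ci(2, 0.0f, 0.0f);
}
```

For example, cdiv_via_ci(1, 2, 3, 4, &re, &im) computes (1 + 2i)/(3 + 4i), which is 0.44 + 0.08i up to rounding. On a real NIOS II system the body of cdiv_ci would presumably be replaced by the toolchain's custom instruction macro or compiler builtin; which one applies depends on how the instruction was configured, so it is left out here.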