Instruction Set Extensions for Computation on Complex Floating Point Numbers
Authors: Philipp Digeser, Marco Tubolino, Martin Klemm, Daniel Shapiro, Axel Sikora and Miodrag Bolic
Email: {digeserp, tubolinm, klemmm, sikora}@dhbw-loerrach.de, {dshap092, mbolic}@site.uottawa.ca
IEEEI 2010
ISE for Computation on Complex Floating Point Numbers
Overview
• Prior Art
• Complex Floating Point Division
• Instruction Set Extensions (ISE)
• Instruction Hardware
• Software Interface
• Experiment
• Performance Evaluation
• Hardware Resource Utilization
• Future Work
• Conclusion
Prior Art
• In prior work [3] we described the possibility of accelerating
scientific observation using ISEs instead of
software libraries such as carith
• In this work we demonstrate this possibility
• This extension of our prior work can perform
several operations (complex addition, subtraction,
multiplication and division), which improves the
chances of our ISE being widely applicable.
Complex Floating Point Computations
• Unlike real multiplication or division, mathematical
operations for complex numbers are usually provided
by slow software. Consider complex division:
\[ E + jF = \frac{A + jB}{C + jD} = \frac{(A + jB)(C - jD)}{(C + jD)(C - jD)} = \frac{AC + BD}{C^2 + D^2} + j\,\frac{BC - AD}{C^2 + D^2} \]
• 3 Additions/Subtractions
• 6 Multiplications
• 2 Divisions
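The operation count above can be read off a direct software implementation of the formula. The following is a minimal C sketch, assuming single-precision operands; the `cfloat` type and the `cdiv_sw` name are hypothetical and are not taken from the slides or from carith.

```c
/* Hypothetical helper type for this sketch. */
typedef struct { float re; float im; } cfloat;

/* Divide x = A + jB by y = C + jD using the formula above:
 * 6 multiplications, 3 additions/subtractions, 2 divisions.
 * No check for C = D = 0 is made here. */
static cfloat cdiv_sw(cfloat x, cfloat y)
{
    float A = x.re, B = x.im, C = y.re, D = y.im;
    float denom = C * C + D * D;        /* 2 mul, 1 add */
    cfloat q;
    q.re = (A * C + B * D) / denom;     /* 2 mul, 1 add, 1 div */
    q.im = (B * C - A * D) / denom;     /* 2 mul, 1 sub, 1 div */
    return q;
}
```

For example, dividing 1 + j2 by 3 + j4 with this sketch yields 11/25 + j(2/25).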
Complex Floating Point Computations
• Fast complex computations are necessary
– Image and audio manipulation
– Multi-antenna
– Correlation
– Others
• Example: STSDAS offers math libraries for
image analysis, including
stsdas.analysis.fourier.carith, which is used to
multiply or divide two complex images [1].
Instruction Set Extension
• Instruction-Set Extensions,
as the name implies,
involve the addition of
custom instructions to a
processor’s instruction set
[Figure: Generic custom instruction datapath [2]]
Instruction Set Extension
• An ISE candidate has limited I/O
access to the register file.
• We use multicycle reads/writes
from/to the register bank in order
to squeeze several operands into
the two-input, one-output register
file [4]
• The computations can be
distributed to one adder, one
multiplier and one divider
• These units can be pipelined
• On division by zero or
overflow, flags are set
[Figure: Original custom logic block [3]]
Instruction Hardware
\[ E + jF = \frac{AC + BD}{C^2 + D^2} + j\,\frac{BC - AD}{C^2 + D^2} \]
[Figure: datapath operation when n=0 (above) and n=1 (at right)]
Software Interface
• The hardware designed for complex division can
be used easily from inline assembly or from C/C++
code, as shown below:
ALT_CI_COMPLEX_CORE_INST(0, in_A, in_C);            /* n = 0: pass A and C */
out_real = ALT_CI_COMPLEX_CORE_INST(1, in_B, in_D); /* n = 1: pass B and D, read the real part */
out_imag = ALT_CI_COMPLEX_CORE_INST(0, 0, 0);       /* n = 0: read the imaginary part */
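The three-call sequence can be wrapped in a single function so that callers do not need to remember the call order. The following is a sketch only: `cdiv_ise` is a hypothetical name, and the software stand-in for `ALT_CI_COMPLEX_CORE_INST` is an assumption made so the code can be tried on a host machine. On a Nios II target the macro would normally come from the generated system header instead, and the stand-in says nothing about how the real hardware behaves internally.

```c
/* Host-side stand-in (assumption): mimics the three-call sequence in
 * software when the real custom-instruction macro is not available. */
#ifndef ALT_CI_COMPLEX_CORE_INST
static float stub_a, stub_c, stub_imag;
static int stub_imag_pending = 0;

static float complex_core_stub(int n, float x, float y)
{
    if (n == 1) {                       /* x = B, y = D: finish the division */
        float denom = stub_c * stub_c + y * y;
        stub_imag = (x * stub_c - stub_a * y) / denom;
        stub_imag_pending = 1;
        return (stub_a * stub_c + x * y) / denom;   /* real part */
    }
    if (stub_imag_pending) {            /* n = 0 right after n = 1 */
        stub_imag_pending = 0;
        return stub_imag;               /* imaginary part */
    }
    stub_a = x;                         /* n = 0: latch A and C */
    stub_c = y;
    return 0.0f;
}
#define ALT_CI_COMPLEX_CORE_INST(n, x, y) complex_core_stub((n), (x), (y))
#endif

/* Hypothetical wrapper: (a_re + j a_im) / (b_re + j b_im). */
static void cdiv_ise(float a_re, float a_im, float b_re, float b_im,
                     float *out_re, float *out_im)
{
    ALT_CI_COMPLEX_CORE_INST(0, a_re, b_re);
    *out_re = ALT_CI_COMPLEX_CORE_INST(1, a_im, b_im);
    *out_im = ALT_CI_COMPLEX_CORE_INST(0, 0, 0);
}
```

Keeping the sequence in one function also makes it harder to interleave two divisions by accident, which matters because the operands are passed over several calls.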
Experiment
• h(u,v) is a blurred picture taken by a telescope
– Motion blur: caused by a long exposure time and movement of the camera,
e.g. Hubble
• g(u,v) is the image to be recovered
• f(u,v), the distortion, called a point spread function, can be
calculated from the known movement of the target
[Figure: blurred image h(u,v), point spread function f(u,v) and recovered image g(u,v)]
Experiment
• To restore the image, h(u,v) and f(u,v) must be transformed into the
frequency domain by applying an FFT, and the result transformed back using an IFFT
• This transformation leads to complex arrays in the frequency
domain that need to be divided element by element:
f(u,v) ∗ g(u,v) = h(u,v)
G(u,v) = H(u,v) / F(u,v)
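The division step G = H / F amounts to one complex division per frequency bin. The following is a minimal software sketch of that loop, assuming the FFT outputs are stored as flat arrays; the `cfloat` type, the `spectrum_divide` name and the policy of zeroing bins where F is zero are all assumptions for illustration, not part of the original design.

```c
#include <stddef.h>

typedef struct { float re; float im; } cfloat;   /* hypothetical sample type */

/* G[i] = H[i] / F[i] for every frequency bin.
 * Returns the number of bins skipped because |F[i]|^2 was zero. */
static size_t spectrum_divide(const cfloat *H, const cfloat *F,
                              cfloat *G, size_t count)
{
    size_t skipped = 0;
    for (size_t i = 0; i < count; ++i) {
        float denom = F[i].re * F[i].re + F[i].im * F[i].im;
        if (denom == 0.0f) {            /* assumed policy: leave the bin at zero */
            G[i].re = 0.0f;
            G[i].im = 0.0f;
            ++skipped;
            continue;
        }
        G[i].re = (H[i].re * F[i].re + H[i].im * F[i].im) / denom;
        G[i].im = (H[i].im * F[i].re - H[i].re * F[i].im) / denom;
    }
    return skipped;
}
```

The loop body is the part that a complex-division custom instruction would replace; the loop itself remains in software.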
[Figure: h(u,v), f(u,v) and g(u,v)]
Performance Evaluation
• Image size: 256×256 pixels
• Speedup:
\[ \text{Speedup} = \frac{\text{OriginalHW} - \text{Overhead}}{\text{AcceleratedHW} - \text{Overhead}} \]
Approach                          Execution Time (s)   Loop Overhead (s)   Speedup
SW division                       9.17673              0.02258
ISE accelerated division          0.77180              0.02258             12.2182
SW multiplication                 6.41827              0.02273
ISE accelerated multiplication    0.76075              0.02273             8.6651
SW addition                       2.50610              0.02259
ISE accelerated addition          0.74385              0.02259             3.44344
SW subtraction                    2.58661              0.02260
ISE accelerated subtraction       0.74477              0.02260             3.55442
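As a worked example of the speedup formula, the sketch below subtracts the loop overhead from both execution times before taking the ratio. The function and parameter names are hypothetical; the numbers in the usage note are the division measurements reported on this slide.

```c
/* Speedup with loop overhead removed from both measurements. */
static double speedup(double sw_seconds, double ise_seconds,
                      double overhead_seconds)
{
    return (sw_seconds - overhead_seconds) /
           (ise_seconds - overhead_seconds);
}
```

Calling `speedup(9.17673, 0.77180, 0.02258)` evaluates (9.15415 / 0.74922), which is approximately 12.218 and agrees with the reported division speedup.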
Hardware Resource Utilization
• The hardware cost is considerable
• The entire system requires 8864 logic
elements and 27 9-bit DSP units
• The complex core requires 2520 logic
elements and 23 9-bit DSP units
• Optimizing the ISE hardware to maximize
reuse was essential to limiting the hardware
size
Future Work
• Adding FFT and IFFT
• Accelerating other embedded complex
mathematics algorithms
• Correlation of pictures
– Instead of a slow time-domain correlation
– Use the frequency domain, which is heavy in complex multiplication
Conclusion
• The designed ISE can be used to accelerate
embedded complex mathematics operations
• Significant speedup (up to about 12×, measured for complex division)
Questions?
References
[1] Space Telescope Science Institute. (2010) carith. [Online]. Available: http://stsdas.stsci.edu/cgi-bin/gethelp.cgi?carith.hlp
[2] Altera Corporation. (2007) Nios II custom instruction user guide. [Online]. Available: http://www.altera.com/literature/tt/tt_nios2_multiprocessor_tutorial.pdf
[3] P. Digeser, M. Tubolino, M. Klemm, D. Shapiro, and M. Bolic, “Instruction set extension in the NIOS II: A floating point divider for complex numbers,” in CCECE, 2010.
[4] L. Pozzi and P. Ienne, “Exploiting pipelining to relax register-file port constraints of instruction-set extensions,” in CASES ’05: Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems. New York, NY, USA: ACM, 2005, pp. 2–10.