Fast Fourier Transform Implementation on FPGA Using Soft-Core Processor NIOS II

advertisement
International Journal of Engineering Trends and Technology (IJETT) – Volume 10 Number 8 - Apr 2014
Fast Fourier Transform Implementation on FPGA Using Soft-Core
Processor NIOS II
Poonam S. Isasare1, Mahesh T. Kolte2
1
2
Student of Department of Electronics and Telecommunication, University of Pune, Pune, Maharashtra, India
Professor & Head of Department of Electronics and Telecommunication, University of Pune, Pune, Maharashtra,
India
Abstract- FPGAs with soft-core processors offer the
opportunity for testing & implementation various trade-offs
between hardware and software implementations of the
functions to implement. With the Altera NIOS II the
processor can be customized through the addition of new
instructions. Custom specific functions can be implemented
as coprocessors. Altera has provided a C to Hardware
compiler that can be used to speed-up some C functions
within a C program using hardware acceleration block.
While working with C2H compiler, we need to write code in
C language and then as per our need we can accelerate that
block of code over hardware. Design tool automatically
integrates that block with NIOSII processor. This
methodology saves lot of time required for implementation
and validation of the complex design. In this work we
present a preliminary performance evaluation of the C2H
compiler on FFT algorithm. We compare the compiler
results with results with HDL language and calculate the
time efficiency. After code transformation, speedups between
6 and 10 have been obtained. For loops with a recurrence, a
speedup greater than 2 has been obtained; we show the basic
C transformations that provide the best C2H results. In this
work we use of the C2H Altera compiler for the automatic
VHDL synthesis of FFT algorithm.
Keyword- FFT, C2H, CYCLONE III, DMA, FPGA
I.
INTRODUCTION
x (k) =
∑
( )
for k= 0, 1,……………….., N-1
Where the twiddle factor
root of unity is defined as,
=
/
denotes the N point primitive
For k= 0, 1,……………….., N-1
In this work FFT algorithm is implemented on Soft Core
Processor which is Altera Implemented processor Nios II Soft
processor on Cyclone III FPGA. We implement FFT
algorithm in embedded C language and then we use hardware
acceleration by using Altera C2H compiler & DMA this will
maximize the throughput in terms of clock cycle.
The remaining paper is organized as follows Section II
presents the System design for FFT algorithm implementation.
Section III provides FFT IMPLEMENTATION technique.
The comparison results are discussed in IV. Concluding
remarks are given in Section V.
II.
SYSTEM DESIGN
The architecture of FFT algorithm implementation is
shown as below
FFT algorithm was proposed by Cooley and Turkey in
1965. FFT is the basis of Digital Signal Processing (DSP).
FFT is the bottleneck in some DSP applications. There are
some FFT algorithms have been developed, such as radix-2
algorithms, radix-2m algorithms, Split-based FFT algorithm,
Prime Factor algorithm, Winograd Fast Fourier Transform
Algorithm (WFTA), and Fast Hartley Transform algorithm
(FHT).
The Discrete Fourier Transform (DFT) is the most
straightforward mathematical method for determining the
frequency content X (k) of a time-domain sequence x(n) . The
N-point DFT is defined as:
ISSN: 2231-5381
http://www.ijettjournal.org
Fig 1: System design of FFT
Page 411
International Journal of Engineering Trends and Technology (IJETT) – Volume 10 Number 8 - Apr 2014
1.
FFT UNIT
The fast Fourier transform (FFT) is a highly efficient
method for calculating the discrete Fourier transform
(DFT). The DFT is used in signal processing applications
for a range of purposes, such as analyzing the frequency
components of signals and data compression. The DFT is
a computationally intensive function. A naïve (non-FFT)
implementation of an n-point DFT requires n2 complex
multiplications. These are some basic facts about the FFT
algorithm to be aware of:
 The FFT operates on complex data. It performs
calculations simultaneously on real and imaginary
components of the data. The algorithm implements
complex multiplication as four multiplications, one
addition and one subtraction.
 One of the fundamental operations in the FFT
algorithm is the butterfly calculation. The butterfly
calculation either breaks a larger DFT into smaller
DFTs, or recombines smaller DFTs into a larger. The
name butterfly comes from the shape of the dataflow
diagram describing the operation.
 The FFT function uses a technique called bit reversal
to rearrange the input points so that the outputs are in
the correct order.
 Conventional software FFT implementations obtain
some of their speed by pre-calculating sine and
cosine terms used in the butterfly calculations. These
sine and cosine terms are called twiddle factors.
2. NIOS II PROCESSOR
The Nios II processor is general purpose 32 bit RISC soft
core processors. The soft core processors are generally
implemented in VHDL, verilog etc. and it can downloaded in
any FPGA hardware, it can implement on many parallel
processor on FPGA. The Nios II processor can be used with a
various components to form a complete system. Nios II
processors have full 32-bit instruction set, data path, address
space and general purpose registers. It interfaces needed to
connect to other chips on the FPGA board . These components
are interconnected Avalon Switch Fabric. Memory blocks in
the Cyclone III device.
This memory block provides on chip memory to soft core
processor. A JTAG UART interface is used to connect to the
circuitry that provides a Universal Serial Bus to the host
computer through which FPGA board is connected. This
software is called the USB-Blaster. Another module, called
the JTAG Debug module, is provided to allow the host
computer to control the Nios II processor.
3.
SYSTEM INTERCONNECT AVALON SWITCH
FABRIC BUS
The Avalon bus is used to connect peripherals and
processor of a system. . It is asynchronous bus system
ISSN: 2231-5381
which master slave components in which processor can
initiate bus transfers which acts as a master and memory
which is a slave component only accepts transfers
initiated by the processors. No. of masters and slaves are
allowed on system interconnect Avalon switch Fabric
Bus. As per the priority master gets access to the slave.
The user defined master-slave connections and arbitration
priorities bus logic is generated automatically. Arbitration
is based on a slave-side arbitration scheme. If both
instruction and data master of the Nios processor connect
to a single master, for improved performance, the data
master should be assigned a higher arbitration priority.
Since Altera FPGAs do not support tri-state buffers for
implementation of general logic, multiplexers are used to
route signals between masters and slaves. Although
peripherals may reside on or off-chip, all bus logic is
implemented on-chip. The Avalon bus is not a shared bus
structure. Each master-slave pair has a dedicated
connection between them, so multiple masters can
perform bus transactions simultaneously, as long as they
are not accessing the same slave. The Avalon bus
provides several features to facilitate the use of simple
peripherals. Peripherals that produce interrupts only need
to implement a single interrupt request signal. The
Avalon bus logic automatically forwards the interrupt
request to the master, along with the r defined at design
time. Arbitration logic also handles interrupt priorities
when multiple peripherals request an interrupt from a
single master, so the interrupt with the highest priority is
forwarded first. Separate data, address, and control lines
are used, so the peripherals do not have to decode address
and data bus cycles. I/O peripherals are memory mapped.
Address mapping is defined at design time.
4. SDRAM CONTROLLER
The SDRAM controller core support for double data rate ,
and low-power DDR2 (LPDDR2) SDRAM. The SDRAM
controller provides high performance data accessed runtime programmability. This controller with Avalone
provides the AVLONE memory mapped interfaced. The
controller reorders data to reduce row conflicts and bus
turn-around time by grouping read and write transactions
together, allowing for efficient traffic patterns and
reduced latency.
5.
ALTERA NIOS II JTAG DEBUG MODULE
The Nios II a supports a JTAG debug module which
gives on-chip emulation features to control the
processor from a host PC. PC-based software
debugging tools communicate with the JTAG debug
module and provide facilities, such as the following
features:
 Downloading programs to memory
 Starting and stopping execution
 Setting breakpoints and watch points
 Analyzing registers and memory
http://www.ijettjournal.org
Page 412
International Journal of Engineering Trends and Technology (IJETT) – Volume 10 Number 8 - Apr 2014

Collecting real-time execution trace data
the CPU’s software integrated development environment
(IDE). By using DMA The Nios II C2H Compiler also
provides a unique solution with full support for pointers
and array accesses. This is possible due to the integration
with SOPC Builder, which gives the accelerated function
access to the same memory map that it had when running
in software. This is also necessary for easy transfer of
data between the accelerator and the CPU, as well as the
other peripherals in the system.
FFT Using DMA can improve the performance of some
hardware/software applications. Large number of memory
required for processor to execute the butterfly operation
which is a time consuming process.
Fig 2 JTAG debug module
III.
1.
2.
IV.
REQUIREMENT OF TOOL
HARDWARE:
 Altera Cyclone III FPGA board
SOFTWARE
 Altera Quartus II software
 Altera C2H compiler
 Altera Nios II embedded design suit
SYSTEM IMPLEMETATION TECHNIQUE
1. SOFTWARE IMPLEMENTATION
In this method with help of soft processor NiosII IDE we
had Implemented FFT Algorithm with the help of C2H
compiler and that will run on FPGA board we initialize
the counter for this algorithm and we will get number of
clock cycle for software approach.
2. HARDWARE ACCELERATION USING DMA
The Nios II C2H Compiler automates a significant
portion of the design flow by generating coprocessors that
offload and enhance performance of a microprocessor
running software written in pure ANSI C. It is tightly
integrated into the software build flow and SOPC Builder
system generation tool. The tool automatically integrates
the accelerator into the hardware and software projects,
providing a pure-software development environment for
managing hardware/software partitioning.
The Nios II C2H Compiler uses SOPC Builder to connect
the accelerator to the processor and any other peripherals
in the system. This gives the accelerator direct access to a
memory map identical to that of the CPU, allowing
seamless support for pointers and arrays when migrating
from software to hardware. The GUI for the compiler is
ISSN: 2231-5381
.3
SOPC builder system design
Fig. 4 Accelerator integration and interface
http://www.ijettjournal.org
Page 413
International Journal of Engineering Trends and Technology (IJETT) – Volume 10 Number 8 - Apr 2014
V.
REFERENCES
RESULT
Hello from Nios II!
--Performance Counter Report-Total Time: 8.308E-05 seconds (4154 clock-cycles)
+---------------+-----+-----------+---------------+-----------+
| Section
| % | Time (sec)| Time (clocks)|Occurrences|
+---------------+-----+-----------+---------------+-----------+
|Hardware + DMA | 99.8| 0.00008|
4144|
1|
+---------------+-----+-----------+---------------+-----------+
Real data index 0 = 36, Imaginary Data index 0 = 0
Real data index 1 = -40, Imaginary Data index 1 = 96
Real data index 2 = -40, Imaginary Data index 2 = 40
Real data index 3 = -40, Imaginary Data index 3 = 16
Real data index 4 = -40, Imaginary Data index 4 = 0
Real data index 5 = -40, Imaginary Data index 5 = -16
Real data index 6 = -40, Imaginary Data index 6 = -40
Real data index 7 = -40, Imaginary Data index 7 = -96
--Performance Counter Report-Total Time: 0.0001295 seconds (6475 clock-cycles)
+---------------+-----+-----------+---------------+-----------+
| Section
| % | Time (sec)| Time (clocks)|Occurrences|
+---------------+-----+-----------+---------------+-----------+
|Software Only | 99.5| 0.00013|
6445|
1|
+---------------+-----+-----------+---------------+-----------+
Real data index 0 = 36, Imaginary Data index 0 = 0
Real data index 1 = -40, Imaginary Data index 1 = 96
Real data index 2 = -40, Imaginary Data index 2 = 40
Real data index 3 = -40, Imaginary Data index 3 = 16
Real data index 4 = -40, Imaginary Data index 4 = 0
Real data index 5 = -40, Imaginary Data index 5 = -16
Real data index 6 = -40, Imaginary Data index 6 = -40
Real data index 7 = -40, Imaginary Data index 7 = -96
VI.
[1] Claudio Brunelli, Roberto and Airoldi, Jari Nurmi, "Implementation and
Benchmarking of FFT algorithms on multicore platforms" IEEE Tans.
Aug2010
[2] Wang Xu, Zhang Yan and Ding Shunying, "A high performance FFT
liabrary with single instruction multiple data (SIMD) ARCHITECTURE",
IEEE trans. Aug2011
[3] Cyclone literature
http://www.altera.com/literature/hb/cyclonev/cv_54008.pdf
[4] JTAG debug module
http://www.altera.com/devices/processor/nios2/benefits/ni2-jtag-debug.html
[5]Altera Corp, Quartus® II Version 6.0 Handbook, Volume 4: SOPC
Builder, Altera Corp., San Jose, CA, 2006.
[6] Altera Corporation, “Avalon Bus Specification, Reference Manual,”
[Online Document], 2003 July, [Cited 2004 February 11], Available HTTP:
http://www.altera.com/literature/manual/mnl_avalon_bus.pdf
[7] Altera Nios II handbook
www.altera.com/literature/hb/nios2/n2cpu_nii5v1.pdf
[8] Altera c2h compiler
www.altera.com/literature/ug/ug_nios2_c2h_compiler.pdf
[9] Accelerating Nios II with c2h compiler
www.altera.com/literature/tt/tt_nios2_c2h_accelerating_tutorial.pdf
[10] D. Etiemble, S. Bouaziz and L. Lacassagne, "Customizing
16-bit floating-point instructions on a NIOS II processor for
FPGA image and media processing", in IEEE Workshop on
Embedded Systems for Real Time Media Processing
(Estimedia), Jersey City, September 2005.
CONCLUSION
In this paper presents a The Fast Fourier Transform (FFT)
implementation, which was implemented using a Field
Programmable Gate Array (FPGA)-based Nios II soft-core
processor working in combination with custom hardware
accelerators generated through high-level synthesis. The
proposed system architecture, synthesized on Cyclone III
FPGA board, was developed through an iterative design space
exploration methodology using Altera’s C2H compiler. This
hardware and DMA approach simply reduced clock cycle
compare with software algorithm so we will get design
efficiency as well as maximum throughput and it also reduces
latency period.
ACKNOWLEDMENT
I wish to acknowledge all my colleagues from Electronics
& Telecommunication Department and other contributors.
ISSN: 2231-5381
http://www.ijettjournal.org
Page 414
Download