Final Report

An FPGA Application Development Platform
Alan Concannon
B.E Electronic & Computer Engineering Project Report
EE426
Supervisor: Dr. Fearghal Morgan
March 2007
I hereby declare that this thesis is my original work except where stated
Signature__________________________
Date: _____________________
Acknowledgements
_____________________________________________________________________
I, Alan Concannon, would like to thank my project supervisor Dr Fearghal Morgan
for his patience and supervision throughout the year.
I would also like to thank Mr Martin Burke, Mr Myles Meehan and Mr Shaun Porter for
their kind help throughout the year.
Abstract
_____________________________________________________________________
This project builds on previous work by final year students to further develop an
existing Digilent Spartan-3 FPGA based application development platform that
performs a range of DSP, image and data processing functions. The existing project
uses the Digilent D104 USB2 module, which connects to the Spartan-3 board to
provide USB functionality and achieve high data transfer speeds. This project will
use the Digilent Nexys board, which provides on-board USB functionality, as the
platform to run the application. The Nexys board will allow the overall project to be
more compact and robust. It is also the aim of this project to provide extra DSP
functionality in the form of an FIR filter that will complement the existing range of
DSP functions. The overall system will allow the user to open a GUI and display an
image that is stored on the PC. The user can then navigate the GUI to modify the
image by selecting from a number of image processing functions. The end product
will provide a neat and powerful application that demonstrates the capabilities and
advantages of implementing such designs in FPGA logic and shows the importance
of FPGAs in the future of DSP design.
Table of Contents
_____________________________________________________________________
Acknowledgements
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.0 Aim of Project
1.0.1 Work Completed During Project
1.1 Project Methodology
1.1.1 Hardware
1.1.1.1 Digilent Nexys Board [1]
1.1.2 Software
1.1.2.1 Digilent Adept Suite [2]
1.1.2.2 Xilinx ISE (Integrated Software Environment) 8.2.03i
1.1.2.3 Modelsim: Xilinx Edition [3]
1.1.2.4 Matlab [4]
1.1.2.5 Simulink® [5]
1.1.2.6 System Generator for DSP [6]
1.1.3 Tools
1.1.3.1 Very High-Speed Integrated Circuit (VHSIC) Hardware
Chapter 2 Memory Controller
2.0 Nexys Cellular™ RAM
2.1 RAM Bus Functional Model
2.1.1 Introduction
2.1.2 Functional Description
2.2 Memory Controller
2.2.1 Overview
2.2.2 memCtrlr
2.2.3 memCtrlrUnit
2.2.4 memCtrlrUnitAndRamBfm
2.2.4.1 Waveform Analysis
2.2.5 SDRAM Access
Chapter 3 DSP Fundamentals
3.0 Introduction
3.1 Sampling
3.2 Digital Filters
3.2.1 Digital Filter Impulse Response
3.2.2 Choosing a Filter
3.3 Finite Impulse Response (FIR) Filter
3.3.1 Linear Phase
3.3.2 Methods of Designing FIR Filters
Chapter 4 FIR Filter Implementation
4.0 Introduction
4.1 FIR Filter Structure
4.2 System Generator Design Flow
4.2.2 Generating HDL Code
Chapter 5 Conclusion and Future Work
5.0 Introduction
5.1 Future Work
Appendix A Code
References
List of Figures
_____________________________________________________________________
Figure 1-1 System Overview
Figure 1-2 DSP Block within FPGA
Figure 1-3 Digilent Nexys System Board
Figure 1-4 Digilent Adept Suite USB Administrator
Figure 1-5 Digilent Adept suite Export Software
Figure 2-1 Micron CellularRAM Functional Block Diagram
Figure 2-2 Asynchronous RAM Read Timing Diagram
Figure 2-3 Asynchronous RAM Write Timing Diagram
Figure 2-4 RAMBfm Top Level Picture
Figure 2-5 memCtrlrUnit Functional Partition
Figure 2-6 memCtrlrFSM Process Flowchart
Figure 2-7 memCtrlr Functional Partition
Figure 2-8 memCtrlrUnitAndRAMBfm Simulation Waveform
Figure 2-9 SDRAM Access Control Byte
Figure 3-1 Block diagram of DSP system [8]
Figure 3-2 Block diagram of a Simple Digital Filter
Figure 3-3 Choosing a Filter Flowchart
Figure 3-4 FIR Filter Park-McClellan Vs Hamming Window
Figure 4-1 FIR filter Structure
Figure 4-2 Simulink® Libraries - Xilinx Blockset View
Figure 4-3 An 8-Tap FIR Filter Design using Simulink®
Figure 4-4 FIR Filter Response using Wave Scope
Figure 4-5 System Generator Properties Editor
List of Tables
_____________________________________________________________________
Table 2-1 Asynchronous Mode Signal Description
Table 2-1 RAM BFM Timing Signal Descriptions
Table 2-2 memCtrlr Signal Changes
Table 2-3 memCtrlrUnit FSM Incremental Data Dictionary
Table 2-4 SDRAM Access
Chapter 1 Introduction
_____________________________________________________________________
1.0 Aim of Project
The main objective of this project is to further develop an existing Spartan 3 FPGA
(Field Programmable Gate Array) embedded system. The existing design allows the
user to interact with a GUI (Graphical User Interface) that can display image data
from a PC and carry out a number of image processing functions on these images.
The existing embedded system uses the Digilent D104 USB2 (Universal Serial Bus)
module, which connects to the Spartan 3 board via an external connector, to improve
data transfer speeds. Although very efficient, this design is not very practical: the
D104 module hangs on the edge of the Spartan 3 board and is easily damaged. In this
project the Digilent Nexys board is used instead, which provides on-board USB
functionality and a more robust and compact solution. This project has the
following features:

• Selection of image data that is stored on the host PC using a VB (Visual Basic)
GUI in order to allow image processing.
• Transfer of image data to and from the host PC via USB.
• Saving of the image to the Digilent Nexys on-board CellularRAM to allow DSP
(Digital Signal Processing) to be carried out.
• Added DSP functionality in the form of an FIR (Finite Impulse Response)
filter that will allow filtering of image data.
• A number of image processing functions performed on the data stored in the
Nexys CellularRAM, with the result displayed to the user via the GUI.
The completed project could be used as an improved alternative to the project studied
in Semester 1 by 4th year Electronic and Electronic & Computer Engineering students,
i.e. the 4th Year Digital Design and VHDL course provided by Dr. Fearghal Morgan.
That course uses the Spartan 3 board and the same basic layout as this project, but
without the USB functionality and the extra DSP functionality. This system may then
be used in the future to demonstrate the capabilities of the Digilent Nexys board and
the overall ability of FPGAs with USB capabilities to carry out high level and
meaningful tasks efficiently.
_____________________________________________________________________
Final Year Project Report
Dept. of Electronic Engineering, NUI Galway
Before implementing this project, it is strongly recommended that the reader first
completes the appliedVHDL semester 1 project provided by Dr. Fearghal Morgan
and becomes familiar with both Shane Agnew’s FYP (Real-Time Image
Warper using Digilent Spartan 3 FPGA) and Antoin O’hAllmhurain’s FYP (DSP
using Xilinx Spartan-3).
Figure 1-1 below gives a graphical description of the overall system.
Figure 1-1 System Overview
Figure 1-2 below gives a more in depth view of the DSP block. Once the existing
project is fully ported over to the Nexys and the extra DSP functionality is added, the
DSP block will look like Figure 1-2.
[Figure: the DSP Block inside the Nexys FPGA, containing the functions Delta Function, Invert Image, Flip Image, Warp Image, Morph Images (2) and FIR Filter Image]
Figure 1-2 DSP Block within FPGA
1.0.1 Work Completed During Project
The work completed during this project includes:

• The appliedVHDL course (EE427 Digital System Design & VHDL Semester 1
Course) that was introduced by Dr. Fearghal Morgan. The appliedVHDL course
provided the following:
o UART – For communication with the Visual Basic GUI, which provides the
user with an interface to carry out a delta function.
o displayCtrlr – Displays the data as it is being transferred from the host to
the board and vice versa.
o datCtrlr – Bundles and unbundles byte-wide data into 32-bit words
and sends it to the UART.
o dspBlk – Performs a delta function that subtracts one image from another.
o IOCSRBlk – Decodes the UART control/data byte sequence, provides CSR
read/write access, activates the DSP task and activates the datCtrlr module.
o MemCtrlr – DSP and IO read/write access to memory.
This appliedVHDL course demonstrated data, in the form of two bitmap images, being
received from and transmitted to the host system via the UART. The data is saved to
the on-board SRAM and a DSP delta function is carried out on both bitmap images.
This course is the basis of the existing project.
• Review of Shane Agnew’s FYP (Real-Time Image Warper).
• Review of Antoin O’hAllmhurain’s FYP (DSP using the Xilinx Spartan 3).
• Redesign of the Memory Controller and RAM Bus Functional Model.
• Porting of the existing project onto the Nexys FPGA: writing to and reading from
memory, and writing to and reading from CSRs.
• Review and completion of the DSP Primer workbook and class notes provided by
Bob Stewart, University of Strathclyde, Scotland, UK.
• Design of an FIR filter.
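The byte bundling performed by datCtrlr and the delta function performed by dspBlk can be illustrated with a short software sketch. This is not the project's VHDL: the function names are hypothetical, the byte order (first byte in the most significant position) is an assumption, and the subtraction is assumed to wrap modulo 256 for 8-bit pixel values.

```python
# Illustrative sketch only. Names, byte order and wrap-around behaviour are
# assumptions made for this example, not details taken from the VHDL design.

def bundle_bytes(data):
    """Pack a byte stream into 32-bit words (first byte most significant)."""
    if len(data) % 4 != 0:
        raise ValueError("byte count must be a multiple of 4")
    return [int.from_bytes(data[i:i + 4], "big") for i in range(0, len(data), 4)]


def unbundle_words(words):
    """Split 32-bit words back into the original byte stream."""
    return b"".join(w.to_bytes(4, "big") for w in words)


def delta(image_a, image_b):
    """Subtract image_b from image_a pixel by pixel, wrapping modulo 256."""
    if len(image_a) != len(image_b):
        raise ValueError("images must be the same size")
    return bytes((a - b) % 256 for a, b in zip(image_a, image_b))
```

Bundling followed by unbundling returns the original byte stream, which is the property the data path relies on when image data is moved between the byte-wide host link and 32-bit words.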
1.1 Project Methodology
1.1.1 Hardware
1.1.1.1 Digilent Nexys Board [1]
The Digilent Nexys circuit board is the main hardware component used in this
project. It is an integrated circuit development platform based on the Xilinx
Spartan 3 FPGA. The Nexys board provides a number of useful I/O devices and
numerous ports that make it an ideal platform for experiments with FPGA based
digital systems. The Nexys board is suitable for a range of designs from low-level
logic circuits to high-level digital systems and is fully compatible with all versions of
the Xilinx ISE tools. The Nexys board includes the following features:

• A USB2 port for FPGA configuration and high speed data transfers
• Multiple power configurations (USB, batteries or wall plug)
• 16MB of Micron PSDRAM and 4MB of Intel StrataFlash Flash ROM
• 50MHz oscillator
• Xilinx Platform Flash ROM for long term storage of FPGA configurations
• Connector for VGA (Video Graphics Array) hi-resolution graphics LCD
(Liquid Crystal Display) panel or 16x2 character LCD display
• 8 LEDs (Light Emitting Diodes), 4 seven segment displays, 4 pushbuttons and
8 slide switches
• 60 FPGA I/Os routed to on-board expansion connectors
• 1,000,000 gate Xilinx XC3S1000 FPGA with 500+MHz operation
In this project many features of the Nexys board are used. The USB port is used for
FPGA configuration and data transfer to and from the host computer. The Micron
PSDRAM is used to store the data that will have the DSP functions performed on it.
Although there are three system clock settings on the Nexys, this project will use the
50MHz clock setting.
Figure 1-3 Digilent Nexys System Board
1.1.2 Software
1.1.2.1 Digilent Adept Suite [2]
This software provides the user with a platform to program the FPGA that is
situated on the Nexys system board. The Digilent Adept Suite (DAS) allows JTAG
(Joint Test Action Group) configuration of Xilinx logic devices that have Ethernet and
USB capabilities. The Digilent Adept Suite consists of four pieces of software:
• Export
• Transport
• Ethernet Administrator
• USB Administrator
This project utilises both the USB Administrator and Export tools of the Adept Suite.
• USB Administrator
Digilent USB Administrator (DUA) is used to configure the ID string
contained in the firmware of a USB device. Each Digilent USB device has a
specific ID string that the DUA displays once the device is connected. The
Adept suite keeps track of communication modules in a list called the device
table. A device must be added to this list before any of the other Adept
software tools can be used with it. Once the Nexys is connected to the USB
port of the host PC and the USB Administrator program is opened, the device
ID and Serial Number are displayed as shown in Figure 1-4 below.
Figure 1-4 Digilent Adept Suite USB Administrator

• Export
This tool is used to program Xilinx FPGAs, PROMs (Programmable Read
Only Memories) and CPLDs (Complex Programmable Logic Devices) by
accessing the JTAG scan chain. Export supports two configuration file types,
.bit files and SVF (Serial Vector Format) files, that are both used to program
JTAG devices. SVF files can be used to program any JTAG device, whereas .bit
files are used to program FPGA devices. Once the Export software tool has
been opened and the Nexys board is powered on, the user can initialise the scan
chain. The FPGA and the ROM device will both be visible in the window; in
this case the devices to be programmed are the XC3S1000 FPGA and the
XCF04S ROM. The appropriate configuration file can now be assigned to
each device and the chain can be programmed.
Figure 1-5 shows the Export software during the programming stage.
Figure 1-5 Digilent Adept suite Export Software
1.1.2.2 Xilinx ISE (Integrated Software Environment) 8.2.03i
The Xilinx Integrated Software Environment (ISE) is a software suite that allows a
user to take a design from the design entry stage through to the Xilinx device
programming stage. This project uses ISE Project Navigator version 8.2.03i to
implement the design. The ISE Project Navigator manages and processes the design
through the following steps in the ISE design flow:

• Design Entry
This is the first step in the ISE design process. During this stage the user
creates the project source files based on the design objectives. The top-level
design file can be created using a Hardware Description Language (HDL) such as
Verilog, ABEL or VHDL; alternatively, a schematic may be used. A number of
formats may be used to create the lower level source files in the design. The
designer may be working with a synthesized EDIF (Electronic Design Interchange
Format) file that has been generated by a third party design entry tool, or with an
NGC/NGO file. If this is the case, design entry and synthesis can be skipped.

• Synthesis
Once design entry has been completed the designer may run the synthesis
tool. During this stage the HDL design (e.g. VHDL or Verilog) is
converted into netlist files that are used as inputs to the implementation
process. Once this step is completed a synthesis report is generated which can
be viewed. Technology and Register Transfer Level (RTL) schematics are also
created. The syntax is checked and, once it is verified, the next step may be
carried out.

• Implementation
When the implementation tool is run the logical design is converted into
a physical format (e.g. a bit file) that may be downloaded to the specified target
device. In this project the target device is the Spartan 3 FPGA on the Nexys board.
From within Project Navigator the implementation process may be run in one or
multiple steps, depending on whether the designer is targeting a Field
Programmable Gate Array (FPGA) or a Complex Programmable Logic Device
(CPLD). Multiple reports are generated during this stage that may be useful to
the designer.

• Verification
Verification may be carried out at multiple stages in the design flow.
Simulation software such as Modelsim can be used to test the timing and the
functionality of the whole design or of a particular part of the design. Simulation
allows the designer to create and verify complex functions quickly. During
simulation, the source code (e.g. VHDL or Verilog) is interpreted into
circuit functionality and the logical results of the described HDL are displayed to
determine whether the circuit operates correctly.

• Device Configuration
Once a programming file has been generated, the target device may be
configured. A programming file generation report is also available after this
stage is completed. During configuration, the Xilinx tool generates
configuration files, and the programming files may then be downloaded from a
host computer to a Xilinx device.
1.1.2.3 Modelsim: Xilinx Edition [3]
Simulation software such as the Modelsim version provided as part of Xilinx ISE
8.2.03i allows a design to be tested before it is downloaded to a target device.
Simulation may be carried out at multiple stages of the design flow, depending on the
designer’s preference. Errors and bugs are more likely to be detected early in
the design if simulation is carried out sooner rather than later. During simulation a test
bench is created which models the external signals of the “Unit Under Test”, and
stimulus is then applied to these signals. The designer may view a timing diagram of
the overall process that shows the response of all input and output signals.
1.1.2.4 Matlab [4]
MATLAB® is a high-performance language for technical computing. It integrates
computation, visualization, and programming in an easy-to-use environment where
problems and solutions are expressed in familiar mathematical notation. In this
project, it is planned to design an FIR (Finite Impulse Response) filter using Matlab in
conjunction with Simulink® and System Generator.
1.1.2.5 Simulink® [5]
Simulink® is a software package used to simulate, model and analyse dynamic
systems. It supports linear and nonlinear systems, modelled in continuous time,
sampled time, or a hybrid of the two. Systems can also be multi-rate, i.e., have
different parts that are sampled or updated at different rates. In this project a digital
FIR filter is built using the blocksets provided by Simulink®. The Xilinx blockset
contained in the Simulink® library allows the designed system to be adapted for
implementation on an FPGA by executing the System Generator software from within
the design.
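To make the term concrete, the operation that an FIR filter performs can be sketched in a few lines of software. The sketch below uses the standard direct-form difference equation y[n] = h[0]x[n] + h[1]x[n-1] + ... + h[N-1]x[n-N+1]. It is an illustration only and is not the Simulink® or System Generator design; the function name is hypothetical and samples before the start of the input are assumed to be zero.

```python
# Illustrative sketch of a direct-form FIR filter. The function name and the
# zero initial condition are assumptions made for this example.

def fir_filter(samples, coefficients):
    """Return y[n] = sum over k of h[k] * x[n - k], with x[n] = 0 for n < 0."""
    output = []
    for n in range(len(samples)):
        acc = 0
        for k, h in enumerate(coefficients):
            if n - k >= 0:               # skip taps that reach before the input
                acc += h * samples[n - k]
        output.append(acc)
    return output
```

Feeding a single unit impulse into the filter returns the coefficients themselves, which is why the coefficient set is also called the impulse response.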
1.1.2.6 System Generator for DSP [6]
System Generator for DSP is an industry standard high level tool used for designing
high performance DSP systems that use FPGAs. This tool allows the designer to
develop highly parallel systems with the industry’s most advanced FPGAs, providing
system modelling and automatic code generation from Simulink® and MATLAB (The
MathWorks, Inc.). System Generator is a key component of the Xilinx XtremeDSP™
solution.
1.1.3 Tools
1.1.3.1 Very High-Speed Integrated Circuit (VHSIC) Hardware
VHDL can be used to describe the concurrent and sequential behavior of a digital
system at many levels of abstraction ranging from the algorithmic level to the gate
level. VHDL is an IEEE standard. A VHDL file has a .vhd or .vhdl extension.
A system may be completely designed in software, tested and validated before it is
implemented in hardware. The design may be broken up into smaller parts and
described using a Hardware Description Language. VHDL is a technology
independent industry standard and is non-proprietary unlike schematic entry tools.
VHDL design entry is faster than schematic entry. Behavioural VHDL enables test
bench stimulus generation, complex bus functional model creation and file IO. The
VHDL design database provides formal design documentation.
Chapter 2 Memory Controller
_____________________________________________________________________
2.0 Nexys Cellular™ RAM
The Digilent Nexys board used in this project contains a different memory
device to the Spartan3 board, which is used in the appliedVHDL course. The
Spartan3 contains two separate Asynchronous CMOS (Complementary Metal Oxide
Semiconductor) 256k x 16 SRAM (Static Random Access Memory) chips with both
devices sharing common SRAM signals. Each device has a separate chip select (CE#)
and individual upper byte and lower byte controls to select the high or low byte in the
16-bit data word, UB and LB respectively. Refer to CMOS Static RAM
IS61LV25616AL specification [7].
The Nexys utilises the Micron MT45W8MW16BGX CellularRAM memory device,
which is a high-speed CMOS PSRAM device developed for low power, portable
applications. This device has a 128Mb DRAM (Dynamic Random Access
Memory) core organised as 8M x 16 bits and supports three modes of operation:
asynchronous mode, page mode and burst mode. This project utilises the
asynchronous mode of operation, which the device defaults to on power-up. This mode
uses the standard SRAM control bus (CE#, OE#, WE#, LB#/UB#). READ operations
are initiated by bringing CE#, OE#, LB# and UB# low while keeping WE# high.
Valid data will be driven out of the I/Os after the specified access time has elapsed.
WRITE operations occur when CE#, WE# and LB#/UB# are driven low. During
asynchronous write operations the OE# level is a “Don’t Care” because WE#
overrides OE#. The CLK input must be held static LOW.
See Figure 2-1 for a simplified view of the operation.
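The control signal combinations described above can be summarised in a small decoding sketch. It is an illustration of the asynchronous mode rules stated in this section, not part of the design: the function name and the returned labels are hypothetical, and the labels "STANDBY" and "NO_OP" for the remaining combinations are assumptions. All inputs are active-low levels, so 0 means asserted.

```python
# Illustrative sketch only. Function name and state labels are assumptions.

def decode_async_cycle(ce_n, oe_n, we_n, lb_n, ub_n):
    """Classify an asynchronous bus cycle from active-low control levels."""
    if ce_n == 1:
        return "STANDBY"      # chip not selected
    if lb_n == 1 and ub_n == 1:
        return "NO_OP"        # no byte lane enabled
    if we_n == 0:
        return "WRITE"        # WE# overrides OE#, so OE# is a don't care
    if oe_n == 0:
        return "READ"         # CE#, OE# and a byte enable low, WE# high
    return "NO_OP"            # selected, but neither reading nor writing
```

For example, driving CE#, OE#, LB# and UB# low with WE# high is classified as a read, while driving WE# low is classified as a write regardless of OE#.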
Figure 2-1 Micron CellularRAM Functional Block Diagram
The Micron CellularRAM device contains the standard SRAM control bus signals and
the following four additional signals: ADV#, CRE, WAIT and CLK. Table 2-1 provides
a description of each asynchronous mode signal used.
Signal     Type           Description
A [22:0]   Input          Address inputs during read/write operations
CLK        Input          Clock: can be static low or high during asynchronous operations
ADV        Input          Address Valid: can be held low during read/write operations
CRE        Input          Configuration Register Enable: when low, read/write operations access the memory
CE         Input          Chip Enable: memory device activated when low
OE         Input          Output Enable: enables output buffers when low
WE         Input          Write Enable: when low allows write operation
LB         Input          Lower Byte Enable DQ [7:0]
UB         Input          Upper Byte Enable DQ [15:8]
DQ         Input/Output   Data Input/Outputs
WAIT       Output         Asserted and ignored during read/write operations
Table 2-1 Asynchronous Mode Signal Descriptions
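As a quick consistency check, the 23-bit address bus A[22:0] and the 16-bit data bus DQ[15:0] can be related to the 128Mb core size and to the 16MB figure quoted for the Nexys PSDRAM. The short calculation below is a sketch for illustration; the constant names are hypothetical.

```python
# Illustrative arithmetic only. Constant names are chosen for this example.

ADDRESS_BITS = 23                      # A[22:0]
DATA_BITS = 16                         # DQ[15:0]

words = 2 ** ADDRESS_BITS              # addressable 16-bit locations
total_bits = words * DATA_BITS
total_megabits = total_bits // (1024 * 1024)
total_megabytes = total_bits // 8 // (1024 * 1024)

print(words, total_megabits, total_megabytes)
```

This prints 8388608 128 16, i.e. 8M locations of 16 bits give 128Mb, or 16MB.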
Figure 2-2 Asynchronous RAM Read Timing Diagram
Figure 2-3 Asynchronous RAM Write Timing Diagram
2.1 RAM Bus Functional Model
2.1.1 Introduction
The RAM Bus Functional Model used in this project is very similar to the model
created by Dr. Fearghal Morgan in the 4th year appliedVHDL course. It provides a
synthesizable 8,192k x 16 bit high speed Asynchronous SDRAM that interfaces with
the system by connecting directly to the Memory Control Unit. The timings provided
in this model are taken from the Micron CellularRam™ MT45W1MW16BBGB memory
chip [7].
2.1.2 Functional Description
The Nexys RAM device is modelled using an array of 16-bit registers that store data in
the same way as the memory. Separate read-from-memory and write-to-memory
processes are used, although a single process could also be used. The bi-directional
data bus is controlled and accessed using a tri-state buffer. During simulation the
amount of memory modelled is 16 x 16-bit locations, as this is all that is needed to
show how the system is working. Before synthesis and implementation the amount of
memory can be set back to the full amount of 8,192k x 16-bit locations. This is
achieved by setting the constant numWords in the RAMBfm.vhd file to 8192000.
All SRAM control signals are low asserted. During the write process the Bus
Functional Model has write access to the data bus only when the chip is selected,
i.e. CE (chip select) low asserted and WE (write enable) low asserted. OE (output
enable) must be deasserted during a write cycle. Assertion of both OE and WE
simultaneously is an illegal state. If the UB (upper byte) and LB (lower byte)
signals are both asserted during a write cycle, the 16-bit content of the data bus is
written to the memory location addressed by A after THWZE. If only UB or only LB is
asserted, only the selected half of the 16-bit content on the data bus is written to
the memory location addressed by A after THWZE.
During the read process access is granted to the data bus once CE and OE are both
asserted. WE must be deasserted. As with the write cycle, if UB (upper byte) and LB
(lower byte) signals are both asserted during a read cycle, the 16-bit content in the
memory location addressed by A is outputted to the data bus after Taa. If UB or LB is
asserted separately then only the selected half of the 16-bit content located in the
memory location addressed by A is outputted to the data bus after Taa.
De-assertion of both UB and LB is an illegal state. If this occurs the BFM
flags the system and a warning message is displayed. The default BFM output to the
data bus is high impedance.
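The byte-enable behaviour described above can be sketched as a small behavioural model. This is only an illustration in Python, not the RAMBfm.vhd code: the class and method names are hypothetical, control signals are passed as booleans meaning "asserted" (low on the real pins), timing is ignored, the half of the bus that would be high impedance is returned as zero, and an exception stands in for the warning message.

```python
class RamModel:
    """Behavioural sketch of a 16-bit RAM with upper/lower byte enables."""

    def __init__(self, num_words=16):
        # 16 locations are enough for simulation, as in the text.
        self.mem = [0] * num_words

    @staticmethod
    def _check_byte_enables(ub, lb):
        # De-asserting both UB and LB is the illegal state described above.
        if not (ub or lb):
            raise ValueError("UB and LB both deasserted: illegal state")

    def write(self, addr, data, ub=True, lb=True):
        """Write only the byte lanes whose enable is asserted."""
        self._check_byte_enables(ub, lb)
        word = self.mem[addr]
        if ub:
            word = (word & 0x00FF) | (data & 0xFF00)
        if lb:
            word = (word & 0xFF00) | (data & 0x00FF)
        self.mem[addr] = word

    def read(self, addr, ub=True, lb=True):
        """Return only the byte lanes whose enable is asserted."""
        self._check_byte_enables(ub, lb)
        mask = (0xFF00 if ub else 0) | (0x00FF if lb else 0)
        return self.mem[addr] & mask


ram = RamModel()
ram.write(3, 0xABCD)             # full 16-bit write
ram.write(3, 0x1234, ub=False)   # lower byte only
print(hex(ram.read(3)))
```

The second write changes only the lower byte, so location 3 ends up holding the upper byte of the first word and the lower byte of the second.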
Signal Name | Description             | Time
THWZE       | WE low to High-Z Output | 8ns
Taa         | Address Access Time     | 70ns
Table 2-1 RAM BFM Timing Signal Descriptions
From Table 2-1 it can be seen that a major drawback of using this memory device in
Asynchronous mode is the large address access time needed during a memory read
cycle. This slows the system significantly compared with the Spartan 3 design, which
has an address access time of 10ns. Using this device in Page mode or Burst mode would
considerably improve address access times during memory reads.
[Figure: the RAMBfm block (8K x 16 SDRAM, IC13) connected to the NEXYS FPGA through
IO(15:0), A(22:0), CE, OE, WE, UB, LB, CLK, ADV, CRE and sigWait]
Figure 2-4 RAMBfm Top Level Picture
2.2 Memory Controller
2.2.1 Overview
This Memory Controller has been modified to accommodate the re-designed RAM Bus
Functional Model, which includes the extra signals needed to drive the Nexys
memory device. The signals belonging to the second RAM component used in the
Spartan 3 design are now redundant and have been removed, as only one SDRAM device
is used in this project. Because the Address Access Time Taa is 70ns during a read
from memory and the system clock period is 20ns, three extra read states have been
added to compensate for the slow access during read cycles.
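The number of read states follows directly from the two timing figures. The short Python sketch below shows the arithmetic; the function name is hypothetical, and it assumes that the read data is registered on the first clock edge after the access time has fully elapsed.

```python
import math


def read_states_needed(access_time_ns, clock_period_ns):
    """Smallest whole number of clock periods that covers the access time."""
    return math.ceil(access_time_ns / clock_period_ns)


T_AA_NS = 70.0    # address access time, Table 2-1
T_CLK_NS = 20.0   # system clock period

states = read_states_needed(T_AA_NS, T_CLK_NS)
print(states)               # read states in total
print(states * T_CLK_NS)    # duration of one read in ns
```

With Taa = 70ns and a 20ns clock this gives four read states (one original state plus the three extra states), i.e. 80ns per read.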
2.2.2 memCtrlr
This module is taken from the 4th Year appliedVHDL Course but some alterations
have been made so that extra signals from the RAMBfm can be added and redundant
signals removed. Table 2-2 shows the signals used in the original memCtrlr and the
signals used in the new memCtrlr.
Signals used in Original memCtrlr    | Signals used in New memCtrlr
Dat2Ram(31:0)                        | dat2Ram(15:0)
ramAddSrc(17:0)                      | ramAddSrc(22:0)
dspDat2Ram(31:0)                     | dspDat2Ram(15:0)
IODat2Ram(31:0)                      | IODat2Ram(15:0)
IORamAdd(17:0)                       | IORamAdd(22:0)
dspRamAdd(17:0)                      | dspRamAdd(22:0)
RamIO(31:0)                          | RamIO(15:0)
RamAdd(17:0)                         | RamAdd(22:0)
datFromRam(31:0)                     | datFromRam(15:0)
CE1L, CE0L, UB1L, UB0L, LB1L, LB0L   | CEL, UBL, LBL
WEL, OEL                             | WEL, OEL
(none)                               | sigWait
(none)                               | ADV
(none)                               | CRE
(none)                               | MemClk
Table 2-2 memCtrlr Signal Changes
2.2.3 memCtrlrUnit
This unit was provided by Dr. Fearghal Morgan and was used unaltered by Shane Agnew
in his FYP. Some minor alterations to the design are required for this project: extra
signals have been added, which are explained in Table 2-3.
DFD 1.0 memCtrlrUnit process description
• Memory Controller Unit Finite State Machine controls SDRAM read/write access
via wel, cel, and oel.
• Enables tri-state buffer for RAM data bus (15:0)
• Registers data read from RAM
• RamAddSrc (22:0) connected straight through to ramAdd (22:0)
• Signal dspActive selects SDRAM address, data and read/write signals
Incremental Data Dictionary for memCtrlrUnit (level 1.0)
Signal Name | Type      | Description
memClk      | std_logic | Static low or high during asynchronous read/write operations
CRE         | std_logic | Configuration Register Enable, asserted low during asynchronous operations
ADV         | std_logic | Address valid. Indicates a valid address is present on the inputs. Addresses can be latched on the rising edge of ADV during asynchronous read and write operations
sigWait     | std_logic | Asserted and ignored during asynchronous and page mode operations. High impedance when CE is high
Table 2-3 memCtrlrUnit FSM Incremental Data Dictionary
The signals described in Table 2-3 are all low asserted in the memCtrlrUnit and
only take effect once the unit is instantiated in the memCtrlr and connected at the top
level. Figure 2-5 shows the functional partition of the memCtrlrUnit, which includes
the Finite State Machine. When a memory read/write occurs, the signal ramDone from the
output of the FSM is asserted after four clock cycles. During a RAM read, the signal
regDatFromRam is asserted after four clock cycles to compensate for the 70ns
Address Access Time. During a RAM write the enRamWrTri signal is asserted in each of
the four write states, i.e. for all four clock cycles of the write.
See Figure 2-6 for the memCtrlrUnit FSM flowchart.
[Figure: memCtrlrUnit functional partition (DFD 1.0.1). memCtrlrFSM 1.0 (inputs clk,
rst, ramWr, ramRd; outputs ceL, weL, oeL, ramDone, regDatFromRam, enRamWrTri) provides
SDRAM device control. A 16-bit register, enabled by regDatFromRam, captures
ramIO(15:0) as datFromRam(15:0), making registered SDRAM data available to other
elements of the system. A tri-state buffer, enabled by enRamWrTri, connects
Dat2Ram(15:0) to the bi-directional ramIO(15:0) bus on writes and disconnects it
during reads. The ramAddSrc(22:0) bus passes straight to the RAM as ramAdd(22:0);
memCtrlrUnit assumes ramAdd is stable when a RAM read or write is requested. The
SDRAM byte control signals ubL and lbL are always asserted, giving 16-bit access.
© Alan Concannon, 2007]
Figure 2-5 memCtrlrUnit Functional Partition
memCtrlrUnitFSM (DFD 1.0.1) Flowchart
Inputs: clk, rst, ramWr, ramRd
Output default values: ceL '1', weL '1', oeL '1', ramDone '0', enRamWrTri '0',
regDatFromRam '0'
State            | Outputs that differ from their default value
Idle             | none (ramWr moves to Write1, ramRd moves to Read1)
Write1 to Write3 | ceL '0', weL '0', enRamWrTri '1'
Write4           | ceL '0', weL '0', enRamWrTri '1', ramDone '1'
Read1 to Read3   | ceL '0', oeL '0'
Read4            | ceL '0', oeL '0', ramDone '1', regDatFromRam '1'
Note: Flowchart signals are default value unless otherwise stated
© Alan Concannon, 2007
Figure 2-6 memCtrlrFSM Process Flowchart
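The state sequence of Figure 2-6 can be mimicked with a small software model. The Python sketch below is illustrative only and is not the VHDL implementation. It assumes that ramWr is tested before ramRd in the Idle state, that the fourth state returns to Idle, and that Read4 asserts ceL and oeL in line with the read description in the text. All function names are hypothetical.

```python
DEFAULTS = {"ceL": 1, "weL": 1, "oeL": 1,
            "ramDone": 0, "enRamWrTri": 0, "regDatFromRam": 0}


def next_state(state, ram_wr, ram_rd):
    """Next-state decode: Idle -> Write1..Write4 or Read1..Read4 -> Idle."""
    if state == "Idle":
        if ram_wr:
            return "Write1"
        if ram_rd:
            return "Read1"
        return "Idle"
    kind, step = state[:-1], int(state[-1])
    return f"{kind}{step + 1}" if step < 4 else "Idle"


def outputs(state):
    """Output decode: every signal keeps its default unless stated."""
    out = dict(DEFAULTS)
    if state.startswith("Write"):
        out.update(ceL=0, weL=0, enRamWrTri=1)
        if state == "Write4":
            out["ramDone"] = 1
    elif state.startswith("Read"):
        out.update(ceL=0, oeL=0)
        if state == "Read4":
            out.update(ramDone=1, regDatFromRam=1)
    return out


# One read request: ramRd is held high only while the FSM is idle.
state = "Idle"
for _ in range(5):
    state = next_state(state, ram_wr=False, ram_rd=(state == "Idle"))
    print(state, outputs(state))
```

Stepping the model shows ramDone and regDatFromRam asserted only in the fourth read state, which matches the four-cycle access described above.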
[Figure: memCtrlr functional partition (DFD 1.0), built around memCtrlrUnit (1.0.1).
The signal dspActive selects between the IO and DSP sources for the read/write
requests (ramWr, ramRd), for the write data (selRamDat: IODat2Ram(15:0) or
dspDat2Ram(15:0) to dat2Ram(15:0)) and for the address (selRamAdd: IORamAdd(22:0) or
dspRamAdd(22:0) to ramAddSrc(22:0)). Other signals shown: clk, rst, ramIO(15:0),
ramADD(22:0), datFromRam(15:0), ramDone, ceL, ubL, lbL, weL, oeL, ADV, CRE, memClk
and sigWait. © Alan Concannon, 2007]
Figure 2-7 memCtrlr Functional Partition
2.2.4 memCtrlrUnitAndRamBfm
This model includes the synthesisable memCtrlrUnit component along with the 16-bit
SDRAM Bus functional model (BFM). This is the top-level structure for the
memCtrlr design. The memCtrlrUnit and the RAMBfm model are both instantiated at
this level and all internal signals are mapped. A test-bench program has been written
which applies stimuli to this model. To simulate the design, both the unit under test
(UUT) and the stimulus provided by the test bench are needed. The input stimulus is
16 address values and data values inputted to the unit under test from the file
memWrite.txt. These values are read back from the simulated memory and written to
memRead.txt where the values can be checked.
2.2.4.1 Waveform Analysis
Figure 2-8 shows the output of the memCtrlrUnitAndRAMBfm test bench
program. As the waveform shows, it takes 4 system clock cycles to execute a RAM
read and the same number for a RAM write, i.e. 80ns. The initial write state for a
RAM write is entered on the next rising edge of the system clock after the ramWr
signal goes high. WeL and ceL are both asserted on the same clock edge and
enRamWrTri is asserted. RamAdd is assumed to be stable. The ramDone signal is
asserted on the 4th clock cycle for one clock cycle. Once the data has been written to
memory, ceL, weL and ramDone are all deasserted.
During a RAM read, the first read state is entered on the rising clock edge after the
ramRd signal is asserted. CeL and oeL are both asserted on the same clock edge.
The tri-state buffer signal enRamWrTri is deasserted. RamAdd is assumed to be stable.
Both ramDone and regDatFromRam are asserted on the 4th clock cycle, at which point the
data is valid and is read back from memory. Due to the large RAM access time
(Taa = 70ns) during reads the system is significantly slower when in Asynchronous
mode.
Figure 2-8 memCtrlrUnitAndRAMBfm Simulation Waveform
2.2.5 SDRAM Access
This section details the operations carried out to access the Nexys SDRAM
device and shows the advantages of this design over the original appliedVHDL
project design. During a memory read/write operation the SDRAM address that is to
be accessed must first be defined in the Control Status Registers (CSRs), i.e. at CSR
address (2:0). In the original appliedVHDL project the SRAM address had to be set up
externally for each data word being sent. To define the SRAM address in the CSRs,
3 control bytes and 3 data bytes have to be sent, i.e. 3 CSR writes (6 bytes). For an
SRAM write, 1 control byte must be sent with the 4 data bytes, i.e. 5 bytes in total.
Therefore, the total number of bytes to be sent for each SRAM access is 11 bytes.
Bit(s) | Field
7      | Unused
6:4    | CSR Address (2:0)
3      | DSP Task
2      | RAM Task
1      | CSR Task
0      | RW
Figure 2-9 SDRAM Access Control Byte
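The bit layout of Figure 2-9 can be illustrated with a pair of helper functions that pack and unpack a control byte. This is a hedged Python sketch of the field positions only: the function names are hypothetical, and the meaning of each flag value (for example whether RW = 1 denotes a read or a write) is not stated in this section and is left to the caller.

```python
def pack_control_byte(csr_addr=0, dsp_task=False, ram_task=False,
                      csr_task=False, rw=False):
    """Build the control byte: bit 7 unused, bits 6:4 CSR address,
    bit 3 DSP Task, bit 2 RAM Task, bit 1 CSR Task, bit 0 RW."""
    if not 0 <= csr_addr <= 7:
        raise ValueError("CSR address must fit in 3 bits")
    return ((csr_addr << 4) | (int(dsp_task) << 3) | (int(ram_task) << 2)
            | (int(csr_task) << 1) | int(rw))


def unpack_control_byte(byte):
    """Split a control byte back into its fields."""
    return {
        "csr_addr": (byte >> 4) & 0x7,
        "dsp_task": bool(byte & 0x08),
        "ram_task": bool(byte & 0x04),
        "csr_task": bool(byte & 0x02),
        "rw": bool(byte & 0x01),
    }


control = pack_control_byte(csr_addr=5, csr_task=True, rw=True)
print(hex(control))
print(unpack_control_byte(control))
```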
In the current design, the start and end address locations to be accessed in SDRAM
are set up prior to the first data transfer. This allows the system to count up from the
first address being accessed to the last. To define the SDRAM start address takes 3
CSR writes, i.e. 3 control bytes and 3 data bytes (6 bytes in total). To define the
SDRAM end address takes 2 CSR writes, i.e. 2 control bytes and 2 data bytes (4 bytes in
total). For an SDRAM write, 1 control byte must be sent with the 4 data bytes, i.e. 5
bytes in total. Therefore, the total number of bytes to be sent for an SDRAM access in
the current design is 10 bytes initially for address setup and 5 bytes for a write
operation. For each subsequent SDRAM write thereafter only 5 bytes must be sent, i.e.
1 control byte and 4 data bytes. The process is similar for SDRAM reads. See Table 2-4
for a summary of the above description.
SRAM Access (Original) Vs SDRAM Access (Current)
# SRAM Writes | # Bytes Transferred (Original Design) | # Bytes Transferred (Current Design)
1             | 11                                    | 15
2             | 22                                    | 20
3             | 33                                    | 25
4             | 44                                    | 30
5             | 55                                    | 35
6             | 66                                    | 40
7             | 77                                    | 45
8             | 88                                    | 50
9             | 99                                    | 55
10            | 110                                   | 60
N             | 11 x (N)                              | 10 + 5(N)
Table 2-4 SDRAM Access
Table 2-4 shows the difference in the number of bytes that need to be transferred from
the host for up to 10 writes to memory. The full extent of the savings cannot be
appreciated until a significant amount of data is to be transferred. For example, to
transfer a 640 x 480 byte image, i.e. 307,200 bytes in total (76,800 longwords), the
original design has to transfer 844,800 bytes (11 x 76,800 longwords). The current
design transfers 384,010 bytes, i.e. 10 + (5 x 76,800 longwords), for the same size
image. Therefore the larger the image to transfer is, the more substantial the savings.
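The two formulas in the last row of Table 2-4 can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function names are hypothetical.

```python
def bytes_original(n_writes):
    """Original design: 6 address bytes + 5 write bytes for every write."""
    return 11 * n_writes


def bytes_current(n_writes):
    """Current design: 10 setup bytes once, then 5 bytes per write."""
    return 10 + 5 * n_writes


# 640 x 480 byte image sent as 4-byte longwords.
longwords = (640 * 480) // 4
print(longwords)                    # 76800
print(bytes_original(longwords))    # 844800
print(bytes_current(longwords))     # 384010
```

For a single write the current design costs more (15 bytes against 11) because of the one-off address setup; from the second write onwards it is always cheaper.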
Chapter 3 DSP Fundamentals
_____________________________________________________________________
3.0 Introduction
Digital Signal Processing (DSP) is largely concerned with signal analysis and
processing, system analysis and system design using digital techniques rather than
traditional analogue techniques. Signals and systems are represented in their digital
form meaning they can easily be manipulated using computer-based methods. DSP
has major benefits in digital system design namely portability, reusability, superior
performance and flexibility. As this project is concerned with implementing a digital
FIR filter into the DSP block of the existing design, a number of important aspects of
DSP will be discussed in this chapter. Figure 3-1 shows a block diagram of a typical
DSP system. This project will be largely concerned with the DSP system part of the
block diagram.
Figure 3-1 Block diagram of DSP system [10]
3.1 Sampling
Digital Signals are only strictly defined for specific instants of time and are equal to
zero at all other times. The instants of time at which digital signals are specified are
equal to integer multiples of the sampling period. Sampling is the process of breaking
an analogue signal up into discrete components. The sampling frequency at which this
occurs must be at least twice the value of the maximum analogue frequency in order
to prevent aliasing or loss of content. The resulting sampled signal is still technically a
voltage and this voltage value is usually converted into a binary number with a certain
number of bits to produce a quantised version of the original sample. In order for the
sampled signal to be processed digitally, quantisation must first occur. Essentially
what has actually happened to the original signal is it has been converted into a string
of integer numbers that can be easily manipulated digitally. Choosing an adequate
sampling frequency is an important part of the design process. The maximum
analogue frequency is often called the “Nyquist Frequency” and the minimum
sampling rate that is twice the Nyquist Frequency is called the “Nyquist Rate”. If the
sampling frequency is less than the Nyquist Rate then there is ambiguity as to which
frequencies the sampled signal actually contains.
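The ambiguity described above can be made concrete by computing the frequency at which an under-sampled sinusoid appears. The Python sketch below is an illustration for a single real sinusoid; the function name and the example frequencies are assumptions chosen for the example and are not taken from the project.

```python
def alias_frequency(f_signal, f_sample):
    """Apparent frequency of a real sinusoid of f_signal after sampling
    at f_sample (both in Hz)."""
    f = f_signal % f_sample
    # Components above half the sampling rate fold back below it.
    return min(f, f_sample - f)


FS = 8000  # example sampling frequency in Hz

print(alias_frequency(3000, FS))  # 3000: below fs/2, unchanged
print(alias_frequency(5000, FS))  # 3000: indistinguishable from a 3000 Hz tone
print(alias_frequency(9000, FS))  # 1000
```

A 5000 Hz tone sampled at 8000 Hz therefore produces the same samples as a 3000 Hz tone, which is why the sampling frequency has to be chosen with the highest analogue frequency in mind.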
3.2 Digital Filters
In Digital Signal Processing, Digital Filters are often referred to as Linear Time
Invariant (LTI) discrete-time systems and vice versa. Such Filters can be constructed
from 3 fundamental mathematical operations.
• Addition (or subtraction)
• Multiplication (normally of a signal by a constant)
• Time Delay, i.e. delaying a digital signal by one or more sample periods
Figure 3-2 shows a graphical means of describing a digital filter whereby the
behaviour of the filter is described by using the mathematical operations mentioned
above.
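[Figure: the input X(n) passes through a Time Delay T (Z-1) to give X(n-1), which a
Multiply block scales by A to give AX(n-1); a Summation block produces the output Y(n)]
Figure 3-2 Block diagram of a Simple Digital Filter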
Digital filters can be classified into two categories:
• Non-Recursive, where the output only depends on the current and previous
inputs, e.g. FIR filters
• Recursive, where the output depends not only on the current and previous
inputs but also on the previous output, meaning there is feedback from the
output back to the input, e.g. IIR filters
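The difference between the two categories can be seen by feeding a unit impulse into a first-order filter of each kind. The Python sketch below is illustrative: the two difference equations and the coefficient 0.5 are assumptions chosen for the example.

```python
def non_recursive(x, a):
    """y(n) = x(n) + a*x(n-1): output depends on inputs only."""
    y, prev_x = [], 0.0
    for sample in x:
        y.append(sample + a * prev_x)
        prev_x = sample
    return y


def recursive(x, a):
    """y(n) = x(n) + a*y(n-1): the previous output is fed back."""
    y, prev_y = [], 0.0
    for sample in x:
        out = sample + a * prev_y
        y.append(out)
        prev_y = out
    return y


impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
print(non_recursive(impulse, 0.5))  # [1.0, 0.5, 0.0, 0.0, 0.0]
print(recursive(impulse, 0.5))      # [1.0, 0.5, 0.25, 0.125, 0.0625]
```

The non-recursive response dies out after two samples, whereas the recursive response keeps decaying without ever reaching exactly zero, which is the origin of the names finite and infinite impulse response.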
3.2.1 Digital Filter Impulse Response
The Impulse Response of a digital filter, h(n), is the response of the filter to an input
consisting of the unit impulse function, δ(n). If the impulse response of a system is
known, it is possible to calculate the system response for any input sequence x(n).
By definition, the unit impulse is applied to a system at sample index n=0; therefore, it
is expected that the impulse response is non-zero only for values of n greater than or
equal to zero, i.e. h(n) is zero for n<0. Such an impulse response is said to be causal,
as is expected, since otherwise the system would be producing a response before an input
has been applied. It is known from the time-invariance property of a Linear Time
Invariant System that the response of a system to a delayed unit impulse δ(n-k) will be
a delayed version of the impulse response, i.e. h(n-k). It is also known from the
linearity property that the response of a system to a weighted sum of inputs will be a
weighted sum of the responses of the system to each of the individual inputs. Since any
input x(n) can be written as a weighted sum of delayed unit impulses, the response of
a system to an arbitrary input x(n) can be written as follows:
$$y(n) = \sum_{k=-\infty}^{\infty} x(k)\, h(n-k)$$
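The argument above leads to the convolution sum, in which each input sample x(k) contributes a copy of the impulse response delayed by k samples and weighted by x(k). A minimal Python sketch for finite-length sequences follows; the function name is hypothetical and both sequences are assumed to start at index 0.

```python
def convolve(x, h):
    """y(n) = sum over k of x(k) * h(n-k) for finite sequences x and h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(x)):
            if 0 <= n - k < len(h):
                y[n] += x[k] * h[n - k]
    return y


h = [1.0, 1.0]                      # example impulse response
print(convolve([1.0, 0.0, 0.0], h)) # unit impulse in: h comes out
print(convolve([1.0, 2.0, 3.0], h)) # [1.0, 3.0, 5.0, 3.0]
```

Applying a unit impulse returns the impulse response itself, followed by zeros, as expected from the definition.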
3.2.2 Choosing a Filter
All digital filters have their advantages and disadvantages, e.g.: narrow transition
band, low ripple in the pass band, good all-round performance, overshoot and ringing
in the step response etc. The following flow chart can be used as a guide when
choosing the type of filter required for an application.
[Figure: flowchart with the decision nodes Linear Phase, Narrow Transition Band,
Ripple OK?, Narrowest Possible Transition Region, Ripple in PassBand, Ripple in
StopBand and Multiband Filter Specs, leading to the outcomes FIR Filter, High Order
Butterworth, Low Order Butterworth, Chebyshev, Inverse Chebyshev and Elliptic]
Figure 3-3 Choosing a Filter Flowchart
3.3 Finite Impulse Response (FIR) Filter
FIR filters are one of two primary types of filters used in DSP, the other type being
Infinite Impulse Response (IIR) filters. The impulse response of an FIR filter is
"finite" because there is no feedback in the filter; if you put in an impulse (that is, a
single "1" sample followed by many "0" samples), zeroes will eventually come out
after the "1" sample has made its way in the delay line past all the coefficients.
Compared to IIR filters, FIR filters offer the following advantages:
• They can easily be designed to be "Linear Phase" (see next section).
Linear-Phase filters delay the input signal, but don’t distort its phase.
• They are simple to implement. On most DSP microprocessors, looping a
single instruction can do the FIR calculation.
• FIR filters are suited to multi-rate applications, i.e. reducing the sampling
rate (decimation) or increasing the sampling rate (interpolation), or both.
Whether decimating or interpolating, the use of FIR filters allows some of
the calculations to be omitted, thus providing an important computational
efficiency. In contrast, if IIR filters are used, each output must be
individually calculated, even if that output will be discarded (so that the
feedback will be incorporated into the filter).
• FIR filters have desirable numeric properties. In practice, all DSP filters
must be implemented using "finite-precision" arithmetic, that is, a limited
number of bits. The use of finite-precision arithmetic in IIR filters can
cause significant problems due to the use of feedback, but FIR filters have
no feedback, so they can usually be implemented using fewer bits, and the
designer has fewer practical problems to solve related to non-ideal
arithmetic.
• FIR filters can be implemented using fractional arithmetic. Unlike IIR filters, it is
always possible to implement a FIR filter using coefficients with magnitude of
less than 1.0, which simplifies implementation. (The overall gain of the FIR filter
can be adjusted at its output, if desired.)
A disadvantage of using FIR filters is that they require more co-efficients than an IIR
filter in order to implement the same frequency response, therefore needing more
memory and more hardware resources to carry out mathematical operations.
Some terms used in describing FIR filters are as follows:
• Impulse Response - The "impulse response" of an FIR filter is the set of
FIR coefficients. If an impulse, which consists of a "1" sample followed by
many "0" samples, is put into a FIR filter, the output of the filter will be
the set of coefficients, as the 1 sample moves past each coefficient in turn
to form the output.
• Tap - A FIR tap is a coefficient/delay pair. The number of FIR taps, often
designated as "N", is an indication of the amount of memory required to
implement the filter, the number of calculations required, and the amount
of "filtering" the filter can do. More taps means more stop-band
attenuation, less ripple, narrower filters, etc.
• Multiply-Accumulate (MAC) - In a FIR context, a MAC is the operation of
multiplying a coefficient by the corresponding delayed data sample and
accumulating the result. FIR filters usually require one MAC per tap. Most
DSP microprocessors implement the MAC operation in a single instruction
cycle.
• Transition Band - The band of frequencies between pass-band and stop-band
edges. The narrower the transition band, the more taps are required to
implement the filter.
• Delay Line - The set of memory elements that implement the "Z-1" delay
elements of the FIR calculation.
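The terms above (tap, delay line and MAC) fit together as in the following Python sketch of a direct-form FIR filter. It is an illustration only: the function name and the four example coefficients are hypothetical and are not the weights used in this project.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR: shift each new sample into the delay line, then
    perform one multiply-accumulate (MAC) per tap."""
    delay_line = [0] * len(coeffs)
    output = []
    for sample in samples:
        # Shift the delay line by one position and insert the new sample.
        delay_line = [sample] + delay_line[:-1]
        acc = 0
        for coeff, delayed in zip(coeffs, delay_line):
            acc += coeff * delayed      # one MAC per tap
        output.append(acc)
    return output


coeffs = [-10, 20, 50, 80]           # hypothetical 4-tap example
impulse = [1, 0, 0, 0, 0, 0]
print(fir_filter(impulse, coeffs))   # [-10, 20, 50, 80, 0, 0]
```

The impulse produces the coefficients in order, followed by zeros once the "1" sample has left the delay line, which is the behaviour described under Impulse Response above.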
3.3.1 Linear Phase
When a linear phase filter is desired, an FIR filter is used. Linear phase refers to the
fact that the phase response of the filter is a straight-line function of frequency. This
means that the delay through the filter will be the same at all frequencies. As a result,
the filter does not cause phase/delay distortion, which can be a major advantage over
IIR or analogue filters in certain applications, for example, in digital modems. FIR
filters are usually designed to have linear phase but this does not have to be the case.
An FIR filter is linear phase if its co-efficients are symmetrical around the centre
co-efficient, i.e. the first co-efficient is the same as the last, the second is the same
as the second last etc. (Co-efficients that are anti-symmetrical around the centre also
give linear phase.)
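The symmetry condition described above is easy to test in software. The Python sketch below checks only for symmetrical co-efficients; the function name, the tolerance and the example co-efficient sets are assumptions made for illustration.

```python
def is_symmetric(coeffs, tol=1e-12):
    """True if the first co-efficient equals the last, the second equals
    the second last, and so on."""
    n = len(coeffs)
    return all(abs(coeffs[i] - coeffs[n - 1 - i]) <= tol
               for i in range(n // 2))


print(is_symmetric([-10, 20, 50, 80, 50, 20, -10]))  # True
print(is_symmetric([1, 2, 3, 4]))                    # False
```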
3.3.2 Methods of Designing FIR Filters
The three most popular FIR filter design methods are:
• Parks-McClellan: The Parks-McClellan method is probably the most
widely used FIR filter design method. It is an iterative algorithm that
accepts filter specifications in terms of pass-band and stop-band
frequencies, pass-band ripple, and stop-band attenuation. The fact that all
important filter parameters can be specified directly is what makes this
method so popular.
• Windowing: In the windowing method (e.g. Hamming Window Method),
an initial impulse response is derived by taking the Inverse Discrete
Fourier Transform (IDFT) of the desired frequency response. The impulse
response is then refined by applying a data window to it.
• Direct Calculation: The impulse responses of certain types of FIR filters (e.g.
Raised Cosine and Windowed Sinc) can be calculated directly from formulas.
There are many filter design programs available to those who want to
design digital filters. FIR filter design programs come in three broad
categories:
• Filter Design Applications
• Math Programs e.g. Matlab
• Source code
The example code below is written in Matlab and demonstrates the power of
this high level language in digital filter design. The following code was written
in conjunction with Dr. E. Jones's EE409 Digital Signal Processing Course and
is a useful aid in understanding how to design digital FIR filters using
software tools.
FIR Filter Design - Parks-McClellan versus the Windowing Method
Matlab code generated to implement both filters
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Parks-McClellan versus Hamming Window method
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Sampling frequency (Hz)
fs=8000;
%Cut-off frequency (Hz)
fc=2000;
%Transition band width (Hz)
tb=300;
%Design filter using window method: order 50 gives 51 co-efficients
%(fir1 applies a Hamming window by default)
b=fir1(50,fc/(fs/2));
%Frequency axis calculation (freqz returns 512 points by default)
df=(fs/2)/512;
fr_axis=df*(0:512-1);
%Design filter using Parks-McClellan method (remez function; later
%Matlab releases name this function firpm)
a=remez(50,[0 fc/(fs/2) (fc+tb)/(fs/2) 1],[1 1 0 0]);
%Calculate frequency response of filters with decibel scale
h1=20*log10(abs(freqz(a)));
h2=20*log10(abs(freqz(b)));
figure;
hold on;
%Plot both frequency responses on a dB versus frequency scale
plot(fr_axis,h1);
plot(fr_axis,h2,'color','g');
title('Magnitude Response: Parks-McClellan versus Hamming Window method');
hold off;
grid on;
Figure 3-4 FIR Filter Parks-McClellan Vs Hamming Window
Chapter 4 FIR Filter Implementation
_____________________________________________________________________
4.0 Introduction
In this section of the project the actual design of the FIR filter is demonstrated and the
tools used to implement it in FPGA architecture are explained. A simple 8-tap FIR
(Finite Impulse Response) filter is designed using Simulink® and Xilinx System
Generator.
4.1 FIR Filter Structure
The filter shown in Figure 4-1 is an N-tap FIR filter that contains N delay elements
used to store the current data sample and the previous N-1 data samples. These N data
samples are multiplied by N co-efficients and the N product terms are added to
produce the filter output. A MAC (Multiply-Accumulate) unit is used to multiply the
sample/co-efficient pairs and to accumulate the products. In total, N MAC cycles
are required for an N-tap filter.
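[Figure: Data In feeds a chain of Z-1 delay elements giving the samples X0, X1, X2 ...
XN-1; each sample is multiplied (X) by its filter weight W0, W1, W2 ... WN-1 and the
products are summed (+) to give Data Out.
Key: Z-1 = Delay, + = Adder, W = Filter Weights, X = Multiply]
Figure 4-1 FIR filter Structure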
The tapped delay line in Figure 4-1 can be implemented in FPGA logic using
flip-flops. If the filter in Figure 4-1 is using 4-bit data then 4 flip-flops are required to
delay/store one sample of the 4-bit data. The larger the number of bits, the greater
the number of flip-flops required. This means more logic blocks will be used, i.e. more
area of the FPGA is required. For an 8-tap filter the number of register stages required
is 8, each stage being 8 bits wide. Therefore, for the register/memory portion of the
design, 64 flip-flops are needed. At each clock cycle, each co-efficient is multiplied
by the 8-bit value in the appropriate register. Co-efficients may be stored as constants.
Selecting the co-efficients as “powers of two” simplifies multiplication by allowing a
shift operation. However, in practice the operations on co-efficients may not be as
straightforward.
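The register count and the shift-based multiplication mentioned above can be sketched in a few lines of Python. This only illustrates the arithmetic and is not a hardware description; the function names are hypothetical.

```python
def delay_line_flip_flops(taps, data_width_bits):
    """One flip-flop per bit for each register stage of the delay line."""
    return taps * data_width_bits


def multiply_by_power_of_two(sample, exponent):
    """Multiplying by 2**exponent is a left shift by exponent bits."""
    return sample << exponent


print(delay_line_flip_flops(8, 8))      # 64 flip-flops for 8 taps of 8 bits
print(multiply_by_power_of_two(5, 3))   # 5 * 8 = 40
```

In hardware a fixed left shift needs no logic at all, only wiring, which is why power-of-two co-efficients are attractive where the required frequency response allows them.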
4.2 System Generator Design Flow
System Generator is a DSP design tool from Xilinx that allows the use of the
MathWorks® model based design environment from Simulink® for FPGA designs.
Previous experience with Xilinx FPGAs is not necessary when using System
Generator. Designs are captured in the DSP friendly Simulink® modelling
environment using Xilinx specific blocksets. All downstream FPGA implementation
steps including synthesis and place & route are automatically performed to generate
an FPGA programming file. Over ninety DSP building blocks are provided in the
Xilinx DSP blockset that includes common DSP building blocks such as Adders,
Multipliers and registers. Also included are a set of complex DSP building blocks
such as forward error correction blocks, FFTs, filters and memories. These blocks
leverage the Xilinx IP core generators to deliver optimized results for the selected
device. Figure 4-2 shows the blocksets that are available to a DSP designer when
designing a DSP system using Simulink®. Design elements from the Xilinx blockset
are connected together in a Simulink® model. The system can then be simulated in the
Simulink® environment. System Generator makes it easier to get from the design
stage to actual hardware implementation by integrating Matlab Simulink®,
Modelsim™ [3] and the Xilinx ISE project environment.
Figure 4-2 Simulink® Libraries - Xilinx Blockset View
4.2.1 FIR Filter Generation
System Generator includes a FIR compiler block that targets the dedicated DSP48
hardware resources of the Virtex 4 and Virtex 5 devices to create highly optimized
implementations that can run in excess of 500 MHz. Configuration options allow the
generation of direct, poly-phase decimation, poly-phase interpolation and
over-sampled implementations. Standard Matlab functions such as FIR2 or the MathWorks
FDATool can be used to create co-efficients for the Xilinx FIR compiler. Figure 4-3
shows the 8-tap filter designed for this project using the Xilinx blockset in the
Simulink® library.
[Figure: Simulink® model of the FIR filter. A Discrete Impulse source feeds a Gateway In
block, followed by a chain of Delay blocks (Delay to Delay6, Fix_2_0). The taps are
multiplied by Constant blocks (Fix_8_0; values shown include -10, 20, 50 and 80) in
Mult blocks (Mult to Mult6), and the products are summed by AddSub blocks (AddSub to
AddSub5, Fix_9_0). The result passes through Gateway Out to a Scope. A WaveScope block
and the System Generator token are also shown.]
Figure 4-3 An 8-Tap FIR Filter Design using Simulink®
The filter weights in Figure 4-3 are not chosen to give a specific response for
the FIR filter but rather to show something observable. Figure 4-3 allows the designer
to visually relate the design to the FPGA implementation. The Gateway In/Out blocks
are used as an interface between the Xilinx blocksets and other Simulink® blocksets.
Gateway In/Out blocks define the boundary of the FPGA from the Simulink®
simulation model. The Gateway In block converts the floating point input into a
fixed-point number. The discrete impulse input and the scope are both standard
Simulink® blocks (a source and a sink respectively). The delays, constants, multipliers
and adder/subs are all System Generator blocks that are realizable in hardware. Every
System Generator diagram is required to have at least one System Generator token placed
in the diagram. This block is not connected to anything but serves to drive the FPGA
implementation process. System Generator will generate an error if this block is
absent or incorrectly configured.
Figure 4-4 shows the result of the MAC operations on the discrete impulse input.
Figure 4-4 FIR Filter Response using Wave Scope
4.2.2 Generating HDL Code
Once a design is complete (in the case of Figure 4-3, a simple 8-tap FIR filter),
double-clicking the System Generator token within the design brings up the
Properties editor, and pressing the “Generate” button generates the hardware
implementation files. Setting the compilation target to
HDL Netlist instructs System Generator to generate RTL code and then stop. An
HDL test bench and script files for ModelSim may also be created from the Simulink®
simulation so that the designer can verify the generated design and compare it with the
Simulink® simulation.
Figure 4-5 System Generator Properties Editor
Chapter 5 Conclusion and Future Work
_____________________________________________________________________
5.0 Introduction
During the course of this project I learned a substantial amount, not only about
hardware and software design but also about project management. The project has
given me a better appreciation of how a design should be approached and
how to cope with problems, implement solutions and keep to deadlines. Throughout
the year, much of the theory learned in the EE409 Digital Signal Processing and
EE427 Digital System & VHDL Design courses proved useful, and it was encouraging
to see that theory applied to an actual design. Gaining experience
in writing and understanding VHDL code has been satisfying.
The ability of FPGAs to implement complex DSP systems and deliver high
performance suggests that this is the direction in which future DSP design is heading.
The power and flexibility of System Generator, Simulink® and the Xilinx ISE environment
make them a powerful platform for any FPGA/DSP system designer to work from.
5.1 Future Work
Fully integrating Shane Agnew’s final year project “Real-Time Image Warper” onto
the Nexys board is the first step in further developing this project. The next step is
to implement extra DSP functionality, such as applying multiple filters or edge
detection to an image. Once completed, the final project could be used as a
learning aid for future final year students in conjunction with Dr. Fearghal Morgan’s
EE427 Digital System Design & VHDL course.
Appendix A Code
Due to the length of code written for use in this project, all code is provided on the
accompanying CD at the rear of this report.
References
_____________________________________________________________________
[1] Digilent Nexys Board user guide
https://www.digilentinc.com/Data/Products/NEXYS/Nexys_rm.pdf
[2] Digilent Adept Reference Manual
http://www.digilentinc.com/Software/Adept.cfm?Nav1=Software&Nav2=Adept
[3] Modelsim: Xilinx Edition
http://www.xilinx.com/ise/verification/mxe_details.html
[4] Matlab
http://www.mathworks.com/products/matlab/
[5] Simulink
http://www.mathworks.com/products/simulink/
[6] System Generator for Digital Signal Processing
http://www.xilinx.com/ise/optional_prod/system_generator.htm
[7] Spartan3 SRAM memory device
www.issi.com/pdf/61LV25616AL.pdf
[8] Nexys CellularRAM™ memory device
www.issi.com/pdf/61LV25616AL.pdf
[9] VHDL References
3rd Year Digital Systems II course notes, Dr. Fearghal Morgan
http://www.ee.nuigalway.ie/subjects/ee316/
4th Year Digital Systems Design & VHDL course website, Dr. Fearghal Morgan
http://www.ee.nuigalway.ie/subjects/ee427
[10] EE409 Digital Signal Processing Course Notes, Dr. E. Jones
http://www.nuigalway.ie/subjects/EE409/notes
[11] Micron RAM products
http://www.micron.com/products/psram/
[12] DSP Primer Part 1 and Part 2, presented by:
Bob Stewart, University of Strathclyde, Scotland, UK
Steve Alexander, University of Strathclyde, Scotland, UK
Jamie Bowman, Steepest Ascent Ltd