An FPGA Application Development Platform

Alan Concannon
B.E. Electronic & Computer Engineering
Project Report EE426
Supervisor: Dr. Fearghal Morgan
March 2007

I hereby declare that this thesis is my original work except where stated.

Signature: __________________________ Date: _____________________

Acknowledgements
_____________________________________________________________________
I, Alan Concannon, would like to thank my project supervisor, Dr Fearghal Morgan, for his patience and supervision throughout the year. I would also like to thank Mr Martin Burke, Mr Myles Meehan and Mr Shaun Porter for their kind help throughout the year.

Abstract
_____________________________________________________________________
This project builds on previous work by final year students. It further develops an existing Digilent Spartan-3 FPGA based application development platform that performs a range of DSP, image and data processing functions. The existing project uses the Digilent D104 USB2 module, which connects to the Spartan-3 board to provide USB functionality and achieve high data transfer speeds. This project uses the Digilent Nexys board, which provides on-board USB functionality, as the platform to run the application. The Nexys board makes the overall project more compact and robust. The project also aims to provide extra DSP functionality in the form of an FIR filter that complements the existing range of DSP functions. The overall system allows the user to open a GUI and display an image that is stored on the PC. The user can then navigate the GUI to modify the image by selecting from a number of image processing functions.
The end product will provide a neat and powerful application which will demonstrate the capabilities and advantages of implementing such a design in FPGA logic and show the importance of FPGAs in the future of DSP design.

Table of Contents
_____________________________________________________________________
Acknowledgements
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.0 Aim of Project
    1.0.1 Work Completed During Project
  1.1 Project Methodology
    1.1.1 Hardware
      1.1.1.1 Digilent Nexys Board [1]
    1.1.2 Software
      1.1.2.1 Digilent Adept Suite [2]
      1.1.2.2 Xilinx ISE (Integrated Software Environment) 8.2.03i
      1.1.2.3 Modelsim: Xilinx Edition [3]
      1.1.2.4 Matlab [4]
      1.1.2.5 Simulink® [5]
      1.1.2.6 System Generator for DSP [6]
    1.1.3 Tools
      1.1.3.1 Very High-Speed Integrated Circuit (VHSIC) Hardware
Chapter 2 Memory Controller
  2.0 Nexys Cellular™ RAM
  2.1 RAM Bus Functional Model
    2.1.1 Introduction
    2.1.2 Functional Description
  2.2 Memory Controller
    2.2.1 Overview
    2.2.2 memCtrlr
    2.2.3 memCtrlrUnit
    2.2.4 memCtrlrUnitAndRamBfm
      2.2.4.1 Waveform Analysis
    2.2.5 SDRAM Access
Chapter 3 DSP Fundamentals
  3.0 Introduction
  3.1 Sampling
  3.2 Digital Filters
    3.2.1 Digital Filter Impulse Response
    3.2.2 Choosing a Filter
  3.3 Finite Impulse Response (FIR) Filter
    3.3.1 Linear Phase
    3.3.2 Methods of Designing FIR Filters
Chapter 4 FIR Filter Implementation
  4.0 Introduction
  4.1 FIR Filter Structure
  4.2 System Generator Design Flow
    4.2.2 Generating HDL Code
Chapter 5 Conclusion and Future Work
  5.0 Introduction
  5.1 Future Work
Appendix A Code
References

List of Figures
_____________________________________________________________________
Figure 1-1 System Overview
Figure 1-2 DSP Block within FPGA
Figure 1-3 Digilent Nexys System Board
Figure 1-4 Digilent Adept Suite USB Administrator
Figure 1-5 Digilent Adept Suite Export Software
Figure 2-1 Micron CellularRAM Functional Block Diagram
Figure 2-2 Asynchronous RAM Read Timing Diagram
Figure 2-3 Asynchronous RAM Write Timing Diagram
Figure 2-4 RAMBfm Top Level Picture
Figure 2-5 memCtrlrUnit Functional Partition
Figure 2-6 memCtrlrFSM Process Flowchart
Figure 2-7 memCtrlr Functional Partition
Figure 2-8 memCtrlrUnitAndRAMBfm Simulation Waveform
Figure 2-9 SDRAM Access Control Byte
Figure 3-1 Block diagram of DSP system [8]
Figure 3-2 Block diagram of a Simple Digital Filter
Figure 3-3 Choosing a Filter Flowchart
Figure 3-4 FIR Filter Park-McClellan Vs Hamming Window
Figure 4-1 FIR Filter Structure
Figure 4-2 Simulink® Libraries - Xilinx Blockset View
Figure 4-3 An 8-Tap FIR Filter Design using Simulink®
Figure 4-4 FIR Filter Response using Wave Scope
Figure 4-5 System Generator Properties Editor

List of Tables
_____________________________________________________________________
Table 2-1 Asynchronous Mode Signal Description
Table 2-1 RAM BFM Timing Signal Descriptions
Table 2-2 memCtrlr Signal Changes
Table 2-3 memCtrlrUnit FSM Incremental Data Dictionary
Table 2-4 SDRAM Access

Chapter 1 Introduction
_____________________________________________________________________
1.0 Aim of Project
The main objective of this project is to further develop an existing Spartan 3 FPGA (Field Programmable Gate Array) embedded system.
The existing design allows the user to interact with a GUI (Graphical User Interface) that displays image data from a PC and lets the user carry out a number of image processing functions on these images. The existing embedded system uses the Digilent D104 USB2 (Universal Serial Bus) module, which connects to the Spartan 3 board via an external connector, to improve data transfer speeds. Although very efficient, this design is not very practical: the D104 module hangs off the edge of the Spartan 3 board and is easily damaged. This project uses the Digilent Nexys board, which introduces on-board USB functionality and gives a more robust and compact solution.

This project has the following features:
• Selection of image data stored on the host PC, using a VB (Visual Basic) GUI, to allow image processing.
• Transfer of image data to and from the host PC via USB.
• Storage of the image in the Digilent Nexys on-board Cellular RAM so that DSP (Digital Signal Processing) can be carried out.
• Added DSP functionality in the form of an FIR (Finite Impulse Response) filter that allows filtering of image data.
• A number of image processing functions performed on the data stored in the Nexys Cellular RAM, with the result displayed to the user via the GUI.

The completed project could be used as an improved alternative to the project studied in Semester 1 by 4th year Electronic and Electronic & Computer Engineering students, i.e. the 4th Year Digital Design and VHDL Course provided by Dr. Fearghal Morgan. That course uses the Spartan 3 board and the same basic layout as this project, but without the USB functionality and the extra DSP functionality. The system may then be used in the future to demonstrate the capabilities of the Digilent Nexys board and the overall ability of FPGAs with USB capabilities to carry out high-level, meaningful tasks efficiently.

_____________________________________________________________________
Final Year Project Report Dept. of Electronic Engineering, NUI Galway

Anyone wishing to implement this project is strongly recommended to first complete the appliedVHDL Semester 1 project provided by Dr. Fearghal Morgan and to become familiar with both Shane Agnew’s FYP (Real-Time Image Warper using Digilent Spartan 3 FPGA) and Antoin O’hAllmhurain’s FYP (DSP using Xilinx Spartan-3). Figure 1-1 below gives a graphical description of the overall system.

Figure 1-1 System Overview

Figure 1-2 below gives a more in-depth view of the DSP block. Once the existing project is fully ported to the Nexys and the extra DSP functionality is added, the DSP block will look like Figure 1-2.

[Figure: DSP block within the Nexys FPGA, containing Delta Function, Invert Image, Flip Image, Warp Image, Morph Images (2) and FIR Filter Image]
Figure 1-2 DSP Block within FPGA

1.0.1 Work Completed During Project
The work completed during this project includes:
• The appliedVHDL course (EE427 Digital System Design & VHDL Semester 1 Course) introduced by Dr. Fearghal Morgan. The appliedVHDL course provided the following:
  o UART: communication with the Visual Basic GUI, which provides the user with an interface to carry out a delta function.
  o displayCtrlr: displays the data as it is transferred from the host to the board and vice versa.
  o datCtrlr: bundles and unbundles byte-wide data into 32-bit words and sends it to the UART.
  o dspBlk: performs a delta function that subtracts one image from another.
  o IOCSRBlk: decodes the UART control/data byte sequence, provides CSR read/write access, activates the DSP task and activates the datCtrlr module.
  o MemCtrlr: DSP and IO read/write access to memory.
  The appliedVHDL course demonstrated data, in the form of two bitmap images, being received from and transmitted to the host system via the UART. The data is saved to the on-board SRAM and a DSP delta function is carried out on both bitmap images. This course is the basis of the existing project.
• Review of Shane Agnew’s FYP (Real-Time Image Warper).
• Review of Antoin O’hAllmhurain’s FYP (DSP using the Xilinx Spartan 3).
• Redesign of the memory controller and RAM Bus Functional Model.
• Porting of the existing project onto the Nexys FPGA: writing to and reading from memory, and writing to and reading from CSRs.
• Review and completion of the DSP Primer workbook and class notes provided by Bob Stewart, University of Strathclyde, Scotland, UK.
• Design of the FIR filter.

1.1 Project Methodology
1.1.1 Hardware
1.1.1.1 Digilent Nexys Board [1]
The Digilent Nexys circuit board is the main hardware component used in this project. It is an integrated circuit development platform based on the Xilinx Spartan 3 FPGA. The Nexys board provides a number of useful I/O devices and numerous ports that make it an ideal platform for experiments with FPGA based digital systems. It is suitable for a range of designs, from low-level logic circuits to high-level digital systems, and is fully compatible with all versions of the Xilinx ISE tools. The Nexys board includes the following features:
• A USB2 port for FPGA configuration and high-speed data transfers
• Multiple power configurations (USB, batteries or wall plug)
• 16MB of Micron PSDRAM and 4MB of Intel StrataFlash Flash ROM
• 50 MHz oscillator
• Xilinx Platform Flash ROM for long-term storage of FPGA configurations
• Connector for a VGA (Video Graphics Array) hi-resolution graphics LCD (Liquid Crystal Display) panel or a 16x2 character LCD display
• 8 LEDs (Light Emitting Diodes), 4 seven-segment displays, 4 pushbuttons and 8 slide switches
• 60 FPGA I/Os routed to on-board expansion connectors
• 1,000,000 gate Xilinx XC3S1000 FPGA with 500+ MHz operation

Many features of the Nexys board are used in this project. The USB port is used for FPGA configuration and for data transfer to and from the host computer. The Micron PSDRAM stores the data on which the DSP functions are performed. Although there are three system clock settings on the Nexys, this project uses the 50 MHz clock setting.

Figure 1-3 Digilent Nexys System Board

1.1.2 Software
1.1.2.1 Digilent Adept Suite [2]
This software provides the user with a platform to program the FPGA on the Nexys system board. The Digilent Adept Suite (DAS) allows JTAG (Joint Test Action Group) configuration of Xilinx logic devices on boards that have Ethernet and USB capabilities. The Digilent Adept Suite consists of four pieces of software:
• Export
• Transport
• Ethernet Administrator
• USB Administrator

This project utilises both the USB Administrator and Export tools of the Adept Suite.

USB Administrator
Digilent USB Administrator (DUA) is used to configure the ID string contained in the firmware of a USB device. Each Digilent USB device has a specific ID string that the DUA displays once the device is connected. The Adept Suite keeps track of communication modules in a list called the device table. A device must be added to this list before any of the other Adept software tools can be used with it. Once the Nexys is connected to the USB port of the host PC and the USB Administrator program is opened, the device ID and serial number are displayed as shown in Figure 1-4 below.
Figure 1-4 Digilent Adept Suite USB Administrator

Export
This tool is used to program Xilinx FPGAs, PROMs (Programmable Read Only Memories) and CPLDs (Complex Programmable Logic Devices) by accessing the JTAG scan chain. Export supports two configuration file types, .bit files and SVF (Serial Vector Format) files, both of which are used to program JTAG devices. SVF files can program any JTAG device, whereas .bit files are used to program FPGA devices. Once the Export tool has been opened and the Nexys board is powered on, the user can initialise the scan chain. The FPGA and the ROM device will both be visible in the window; in this case the devices to be programmed are the XC3S1000 FPGA and the XCF04S ROM. The appropriate configuration file can now be assigned to each device and the chain can be programmed.

Figure 1-5 shows the Export software during the programming stage.

Figure 1-5 Digilent Adept Suite Export Software

1.1.2.2 Xilinx ISE (Integrated Software Environment) 8.2.03i
The Xilinx Integrated Software Environment (ISE) is a software suite that allows a user to take a design from the design entry stage through to the Xilinx device programming stage. The ISE Project Navigator manages and processes the design through the following steps in the ISE design flow. This project uses ISE Project Navigator version 8.2.03i to implement the design.

Design Entry
This is the first step in the ISE design process. During this stage the user creates the project source files based on the design objectives. Hardware Description Languages (HDLs) such as Verilog, ABEL or VHDL can be used to create the top-level design file; alternatively, a schematic may be used. A number of formats may be used to create the lower-level source files in the design. If the designer is working with a synthesized EDIF (Electronic Data Interchange Format) file generated by a third-party design entry tool, or with an NGC/NGO file, design entry and synthesis can be skipped.

Synthesis
Once design entry has been completed the designer may run the synthesis tool. During this stage the HDL design (e.g. VHDL or Verilog) is converted into netlist files that are used as inputs to the implementation process. The syntax is checked and, once this step is completed, a synthesis report is generated which can be viewed. A technology schematic and an RTL (Register Transfer Level) schematic are also created. Once the design has been verified, the next step may be run.

Implementation
The implementation tool converts the logical design into a physical file format (e.g. a bit file) that may be downloaded to the specified target device. In this project the target device is the Spartan 3 FPGA on the Nexys board. From within Project Navigator the implementation process may be run in one step or in multiple steps, depending on whether the designer is targeting a Field Programmable Gate Array (FPGA) or a Complex Programmable Logic Device (CPLD). Multiple reports are generated during this stage that may be useful to the designer.

Verification
Verification may be carried out at multiple stages in the design flow. Simulation software such as Modelsim can be used to test the timing and the functionality of the whole design or of a particular part of the design. Simulation allows the designer to create and verify complex functions quickly. During simulation, the source code (e.g. VHDL or Verilog) is interpreted into circuit functionality, and the logical results of the described HDL are displayed so that the correct operation of the circuit can be checked.

Device Configuration
Once a programming file has been generated, the target device may be configured. A programming file generation report is also available after this stage is completed. During configuration, the Xilinx tools generate configuration files, and the programming files can be downloaded from a host computer to a Xilinx device.

1.1.2.3 Modelsim: Xilinx Edition [3]
Simulation software such as the Modelsim version provided as part of Xilinx ISE 8.2.03i allows a design to be tested before it is downloaded to a target device. Simulation may be carried out at multiple stages of the design flow, depending on the designer's preference. Errors and bugs are more likely to be detected early in the design if simulation is carried out sooner rather than later. During simulation a test bench is created which models the external signals of the “Unit Under Test”, and stimulus is then applied to these signals. The designer may view a timing diagram of the overall process that shows the response of all input and output signals.

1.1.2.4 Matlab [4]
MATLAB® is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation. In this project, it is planned to design an FIR (Finite Impulse Response) filter using Matlab in conjunction with Simulink® and System Generator.

1.1.2.5 Simulink® [5]
Simulink® is a software package used to simulate, model and analyse dynamic systems. It supports linear and nonlinear systems, modelled in continuous time, sampled time, or a hybrid of the two. Systems can also be multi-rate, i.e. have different parts that are sampled or updated at different rates. In this project a digital FIR filter is built using the blocksets provided by Simulink®. The Xilinx blockset contained in the Simulink® library allows the designed system to be adapted for implementation on an FPGA by executing the System Generator software from within the design.

1.1.2.6 System Generator for DSP [6]
System Generator for DSP is an industry-standard high-level tool used for designing high-performance DSP systems that use FPGAs. This tool allows the designer to develop highly parallel systems with the industry’s most advanced FPGAs, providing system modelling and automatic code generation from Simulink® and MATLAB (The MathWorks, Inc.). System Generator is a key component of the Xilinx XtremeDSP™ solution.

1.1.3 Tools
1.1.3.1 Very High-Speed Integrated Circuit (VHSIC) Hardware
VHDL can be used to describe the concurrent and sequential behaviour of a digital system at many levels of abstraction, ranging from the algorithmic level to the gate level. VHDL is an IEEE standard. A VHDL file has a .vhd or .vhdl extension. A system may be completely designed in software, tested and validated before it is implemented in hardware. The design may be broken up into smaller parts and described using a Hardware Description Language. VHDL is a technology-independent industry standard and, unlike schematic entry tools, is non-proprietary. VHDL design entry is faster than schematic entry. Behavioural VHDL enables test bench stimulus generation, complex bus functional model creation and file IO. The VHDL design database provides formal design documentation.
Chapter 2 Memory Controller
_____________________________________________________________________
2.0 Nexys Cellular™ RAM
The Digilent Nexys board used in this project contains a different memory device from the Spartan 3 board used in the appliedVHDL course. The Spartan 3 board contains two separate asynchronous CMOS (Complementary Metal Oxide Semiconductor) 256k x 16 SRAM (Static Random Access Memory) chips that share common SRAM signals. Each device has a separate chip select (CE#) and individual upper byte and lower byte controls, UB and LB respectively, to select the high or low byte of the 16-bit data word. Refer to the CMOS Static RAM IS61LV25616AL specification [7].

The Nexys utilises the Micron MT45W8MW16BGX CellularRAM memory device, a high-speed CMOS PSRAM device developed for low-power, portable applications. This device has a 128Mb DRAM (Dynamic Random Access Memory) core organised as 8M x 16 bits and has three modes of operation: asynchronous mode, page mode and burst mode. This project uses asynchronous mode, which the device defaults to on power-up. This mode uses the standard SRAM control bus (CE#, OE#, WE#, LB#/UB#). READ operations are initiated by bringing CE#, OE#, LB# and UB# low while keeping WE# high. Valid data is driven out of the I/Os after the specified access time has elapsed. WRITE operations occur when CE#, WE# and LB#/UB# are driven low. During asynchronous write operations the OE# level is a “don’t care” and WE# overrides OE#. The CLK input must be held static LOW. See Figure 2-1 for a simplified view of the operation.
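The READ and WRITE conditions described above can be summarised as a small decoding function. The following is only a behavioural sketch in Python, not the VHDL used in the project; the function name, the returned labels and the handling of the case where neither byte enable is low are assumptions made for illustration.

```python
def decode_async_access(ce_n, oe_n, we_n, lb_n, ub_n):
    """Classify one asynchronous access from active-low pin levels (0 = asserted)."""
    if ce_n:
        # CE# high: the device is not selected
        return "standby"
    if lb_n and ub_n:
        # Assumption for this sketch: with no byte lane enabled nothing is transferred
        return "no_byte_lane"
    if not we_n:
        # WE# low takes priority; OE# is a don't-care during writes
        return "write"
    if not oe_n:
        # CE# and OE# low with WE# high
        return "read"
    # Selected, but neither reading nor writing
    return "output_disabled"


print(decode_async_access(ce_n=0, oe_n=0, we_n=1, lb_n=0, ub_n=0))  # read
print(decode_async_access(ce_n=0, oe_n=1, we_n=0, lb_n=0, ub_n=0))  # write
```

Checking WE# before OE# reflects the statement that WE# overrides OE# during a write.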
Figure 2-1 Micron CellularRAM Functional Block Diagram

The Micron CellularRAM device contains the standard SRAM control bus signals and the following four signals: ADV#, CRE, WAIT and CLK. Table 2-1 provides a detailed description of each asynchronous mode signal used.

Signal | Type | Description
A[22:0] | Input | Address inputs during read/write operations
CLK | Input | Clock: can be static low or high during asynchronous operations
ADV | Input | Address Valid: can be held low during read/write operations
CRE | Input | Configuration Register Enable: when low, read/write operations access the memory
CE | Input | Chip Enable: memory device activated when low
OE | Input | Output Enable: enables output buffers when low
WE | Input | Write Enable: when low, allows write operation
LB | Input | Lower Byte Enable, DQ[7:0]
UB | Input | Upper Byte Enable, DQ[15:8]
DQ | Input/Output | Data inputs/outputs
WAIT | Output | Asserted and ignored during read/write operations

Table 2-1 Asynchronous Mode Signal Descriptions

Figure 2-2 Asynchronous RAM Read Timing Diagram

Figure 2-3 Asynchronous RAM Write Timing Diagram

2.1 RAM Bus Functional Model
2.1.1 Introduction
The RAM Bus Functional Model used in this project is very similar to the model created by Dr. Fearghal Morgan in the 4th year appliedVHDL course. It provides a synthesizable 8,192k x 16 bit high-speed asynchronous SDRAM that interfaces with the system by connecting directly to the Memory Control Unit. The timings provided in this model are taken from the Micron CellularRam™ MT45W1MW16BBGB memory chip [7].

2.1.2 Functional Description
The Nexys RAM device is modelled using an array of 16-bit registers to store data in the way memory does.
Separate read-from-memory and write-to-memory processes are used, but a single process could also be used. The bi-directional data bus is controlled and accessed using a tri-state buffer. During simulation the amount of memory modelled is 16 x 16-bit locations, as this is all that is needed to show how the system is working. Before synthesis and implementation the amount of memory can be set back to the full amount of 8,192k x 16-bit locations. This is achieved by setting the constant numWords in the RAMBfm.vhd file to 8192000.

All SRAM control signals are asserted low. During the write process the Bus Functional Model has write access to the data bus only when the chip is selected, i.e. when CE (chip select) and WE (write enable) are both asserted low. OE (output enable) must be deasserted during a write cycle. Assertion of both OE and WE simultaneously is an illegal state. If the UB (upper byte) and LB (lower byte) signals are both asserted during a write cycle, the 16-bit content of the data bus is written to the memory location addressed by A after THWZE. If only one of UB and LB is asserted, only the selected half of the 16-bit content of the data bus is written to the memory location addressed by A after THWZE.

During the read process, access to the data bus is granted once CE and OE are both asserted. WE must be deasserted. As with the write cycle, if the UB and LB signals are both asserted during a read cycle, the 16-bit content of the memory location addressed by A is output to the data bus after Taa. If only one of UB and LB is asserted, only the selected half of the 16-bit content of the memory location addressed by A is output to the data bus after Taa. De-assertion of both UB and LB is an illegal state.
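The byte-lane behaviour described above can be sketched as a small behavioural model. This is written in Python purely for illustration and is not the project's RAMBfm.vhd: the class and method names are hypothetical, the THWZE and Taa delays are not modelled, and a byte lane that would be left undriven on a read is returned as zero here as a simplification.

```python
class RamBfmSketch:
    """Behavioural sketch of a 16-bit-wide RAM with upper/lower byte enables."""

    def __init__(self, num_words=16):
        # 16 locations mirror the reduced memory size used during simulation
        self.mem = [0] * num_words

    @staticmethod
    def _lane_mask(ub, lb):
        if not (ub or lb):
            # Both byte enables de-asserted is described as an illegal state
            raise ValueError("UB and LB both de-asserted: illegal state")
        return (0xFF00 if ub else 0) | (0x00FF if lb else 0)

    def write(self, addr, data, ub=True, lb=True):
        mask = self._lane_mask(ub, lb)
        # Only the selected byte lane(s) of the stored word are replaced
        self.mem[addr] = (self.mem[addr] & ~mask & 0xFFFF) | (data & mask)

    def read(self, addr, ub=True, lb=True):
        # Unselected lane is returned as 0 (a simplification of high impedance)
        return self.mem[addr] & self._lane_mask(ub, lb)


ram = RamBfmSketch()
ram.write(3, 0xABCD)               # full 16-bit write
ram.write(3, 0x1234, ub=False)     # lower byte only
print(hex(ram.read(3)))            # 0xab34
print(hex(ram.read(3, lb=False)))  # 0xab00
```

In this sketch the illegal state, with both byte enables de-asserted, raises an exception.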
If this occurs, the BFM flags the condition and a warning message is displayed. The default BFM output to the data bus is high impedance.

Signal Name  Description              Time
THWZE        WE low to High-Z output  8 ns
Taa          Address access time      70 ns

Table 2-1 RAM BFM Timing Signal Descriptions

From the timing values above it can be seen that a major drawback of using this memory device in asynchronous mode is the long address access time needed during a memory read cycle. This slows the system significantly compared with the SRAM used on the Spartan-3 board, which has an address access time of 10 ns. Using the device in Page mode or Burst mode would considerably improve address access times during memory reads.

Figure 2-4 RAMBfm Top Level Picture (the RAMBfm, modelling SDRAM device IC13, connected to the Nexys FPGA through IO(15:0), A(22:0), CE, OE, WE, UB, LB, ADV, CRE, CLK and sigWait)

2.2 Memory Controller

2.2.1 Overview

The Memory Controller has been modified to accommodate the re-designed RAM Bus Functional Model, which includes the extra signals needed to drive the Nexys memory device. The signals belonging to the second RAM component of the Spartan-3 design are now redundant and have been removed, as only one SDRAM device is used in this project. Because the address access time Taa is 70 ns during a read from memory and the system clock period is 20 ns, a single clock cycle is too short to complete a read. Three extra read states have therefore been added, giving four read states (80 ns) in total.

2.2.2 memCtrlr

This module is taken from the 4th year appliedVHDL course, altered so that the extra signals from the RAMBfm are added and the redundant signals are removed. Table 2-2 shows the signals used in the original memCtrlr and the signals used in the new memCtrlr.
Signals used in Original memCtrlr     Signals used in New memCtrlr
Dat2Ram(31:0)                         dat2Ram(15:0)
ramAddSrc(17:0)                       ramAddSrc(22:0)
dspDat2Ram(31:0)                      dspDat2Ram(15:0)
IODat2Ram(31:0)                       IODat2Ram(15:0)
IORamAdd(17:0)                        IORamAdd(22:0)
dspRamAdd(17:0)                       dspRamAdd(22:0)
RamIO(31:0)                           RamIO(15:0)
RamAdd(17:0)                          RamAdd(22:0)
datFromRam(31:0)                      datFromRam(15:0)
CE1L, CE0L, UB1L, UB0L, LB1L, LB0L    CEL, UBL, LBL
WEL, OEL                              WEL, OEL
(none)                                sigWait
(none)                                ADV
(none)                                CRE
(none)                                MemClk

Table 2-2 memCtrlr Signal Changes

2.2.3 memCtrlrUnit

This unit was provided by Dr. Fearghal Morgan and was used unaltered by Shane Agnew in his FYP. For this project some minor alterations to the design are required: the extra signals that have been added are explained in Table 2-3.

DFD 1.0 memCtrlrUnit process description:
Memory Controller Unit Finite State Machine controls SDRAM read/write access via weL, ceL and oeL
Enables the tri-state buffer for the RAM data bus (15:0)
Registers data read from RAM
ramAddSrc(22:0) is connected straight through to ramAdd(22:0)
Signal dspActive selects the SDRAM address, data and read/write signals

Incremental Data Dictionary for memCtrlrUnit (level 1.0)

Signal Name  Type       Description
memClk       std_logic  Static low or high during asynchronous read/write operations
CRE          std_logic  Configuration Register Enable: asserted low during asynchronous operations
ADV          std_logic  Address Valid: indicates that a valid address is present on the inputs. Addresses can be latched on the rising edge of ADV during asynchronous read and write operations
sigWait      std_logic  Asserted and ignored during asynchronous and page mode operations. High impedance when CE is high

Table 2-3 memCtrlrUnit FSM Incremental Data Dictionary

The signals described in Table 2-3 are all asserted low in the memCtrlrUnit and only take effect once the unit is instantiated in the memCtrlr and connected at the top level. Figure 2-5 shows the functional partition of the memCtrlrUnit, which includes the Finite State Machine (FSM). When a memory read or write occurs, the FSM output ramDone is asserted after four clock cycles. During a RAM read, the signal regDatFromRam is asserted after four clock cycles to compensate for the 70 ns address access time. During a RAM write, the enRamWrTri signal is asserted in each of the four write states, i.e. for four consecutive clock cycles. See Figure 2-6 for the memCtrlrUnit FSM flowchart.

Figure 2-5 memCtrlrUnit Functional Partition (DFD 1.0.1, © Alan Concannon, 2007). The figure shows the memCtrlrFSM (inputs clk, rst, ramWr, ramRd; outputs ceL, weL, oeL, ramDone, regDatFromRam, enRamWrTri), which provides SDRAM device control; a 16-bit register that makes data read from SDRAM available to other elements of the system as datFromRam(15:0); a tri-state buffer that connects dat2Ram(15:0) to the bi-directional ramIO(15:0) bus on writes and disconnects it during reads; and the ramAddSrc(22:0) bus passed straight to the RAM as ramAdd(22:0). The memCtrlrUnit assumes ramAdd is stable when a RAM read or write is requested. The SDRAM byte control signals ubL and lbL are always asserted, giving 16-bit accesses.

Figure 2-6 memCtrlrFSM Process Flowchart (memCtrlrUnitFSM, DFD 1.0.1, © Alan Concannon, 2007). Inputs: clk, rst, ramWr, ramRd. Default output values: ceL '1', weL '1', oeL '1', ramDone '0', enRamWrTri '0', regDatFromRam '0'. From Idle, ramWr starts the sequence Write1 to Write4, in which ceL and weL are '0' and enRamWrTri is '1', with ramDone '1' in Write4. From Idle, ramRd starts the sequence Read1 to Read4, in which ceL and oeL are '0', with ramDone '1' and regDatFromRam '1' in Read4. Flowchart signals take their default value unless otherwise stated.

Figure 2-7 memCtrlr Functional Partition (DFD 1.0, © Alan Concannon, 2007). The memCtrlr wraps memCtrlrUnit (1.0.1) and, under control of dspActive, selects between the IO and DSP sources for the RAM address (selRamAdd: IORamAdd(22:0) or dspRamAdd(22:0)), the write data (selRamDat: IODat2Ram(15:0) or dspDat2Ram(15:0)) and the read/write requests (IORamRd/IORamWr or dspRamRd/dspRamWr). The signals ramIO(15:0), ramAdd(22:0), datFromRam(15:0), ramDone, ceL, ubL, lbL, weL, oeL, ADV, CRE, memClk and sigWait appear at the memCtrlr boundary.

2.2.4 memCtrlrUnitAndRamBfm

This model includes the synthesisable memCtrlrUnit component along with the 16-bit SDRAM Bus Functional Model (BFM). This is the top-level structure for the memCtrlr design. The memCtrlrUnit and the RAMBfm model are both instantiated at this level and all internal signals are mapped. A test-bench program has been written which applies stimuli to this model.
To simulate the design, both the unit under test (UUT) and the stimulus provided by the test bench are needed. The input stimulus consists of 16 address and data values, which are applied to the unit under test from the file memWrite.txt. These values are then read back from the simulated memory and written to memRead.txt, where they can be checked.

2.2.4.1 Waveform Analysis

Figure 2-8 shows the output of the memCtrlrUnitAndRAMBfm test bench program. As the waveform shows, it takes 4 system clock cycles, i.e. 80 ns, to execute a RAM read and the same for a RAM write.

The initial write state for a RAM write is entered on the next rising edge of the system clock after the ramWr signal goes high. weL and ceL are both asserted on the same clock edge and enRamWrTri is asserted. ramAdd is assumed to be stable. The ramDone signal is asserted on the 4th clock cycle for one clock cycle. Once the data has been written to memory, ceL, weL and ramDone are all deasserted.

During a RAM read, the first read state is entered on the rising clock edge after the ramRd signal is asserted. ceL and oeL are both asserted on the same clock edge. The tri-state buffer enable enRamWrTri is deasserted. ramAdd is assumed to be stable. Both ramDone and regDatFromRam are asserted on the 4th clock cycle, at which point the data is valid and is read back from memory. Due to the long RAM access time (Taa = 70 ns) during reads, the system is significantly slower when in asynchronous mode.

Figure 2-8 memCtrlrUnitAndRAMBfm Simulation Waveform

2.2.5 SDRAM Access

This section details the operations carried out in order to access the Nexys SDRAM device, and shows the advantages of this design over the original appliedVHDL project design.
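As a rough illustration of the comparison made in this section, the Python sketch below counts the bytes the host must send under each scheme, using the per-access costs derived in this section and summarised in Table 2-4 (11 bytes per write in the original design; 10 bytes of address setup followed by 5 bytes per write in the current design). The function names are hypothetical and are not part of the project code.

```python
# Host-to-FPGA byte counts for memory writes (illustrative sketch only).
# Each CSR write costs 1 control byte + 1 data byte; each memory write
# costs 1 control byte + 4 data bytes, as described in Section 2.2.5.

CSR_WRITE_BYTES = 2      # 1 control byte + 1 data byte
MEM_WRITE_BYTES = 5      # 1 control byte + 4 data bytes


def bytes_original(num_writes: int) -> int:
    """Original design: the address is set with 3 CSR writes before every write."""
    address_setup = 3 * CSR_WRITE_BYTES
    return num_writes * (address_setup + MEM_WRITE_BYTES)


def bytes_current(num_writes: int) -> int:
    """Current design: start (3 CSR writes) and end (2 CSR writes) addresses
    are set once, then each write costs 5 bytes."""
    address_setup = (3 + 2) * CSR_WRITE_BYTES
    return address_setup + num_writes * MEM_WRITE_BYTES


if __name__ == "__main__":
    longwords = (640 * 480) // 4          # 640 x 480 byte image as 4-byte words
    print("original:", bytes_original(longwords))
    print("current: ", bytes_current(longwords))
```

Under these assumptions the current design sends fewer bytes from the second write onwards, and the gap grows linearly with the number of writes.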
During a memory read/write operation the SDRAM address that is to be accessed must first be defined in the Control Status Registers (CSRs), selected by CSR address (2:0). In the original appliedVHDL project the SRAM address had to be set up externally for each data word sent. Defining the SRAM address in the CSRs takes 3 CSR writes, i.e. 3 control bytes and 3 data bytes (6 bytes in total). An SRAM write then requires 1 control byte and 4 data bytes, i.e. 5 bytes in total. The total number of bytes to be sent for each SRAM access is therefore 11 bytes.

Bit   Function
7     Unused
6:4   CSR Address (2:0)
3     DSP Task
2     RAM Task
1     CSR Task
0     RW

Figure 2-9 SDRAM Access Control Byte

In the current design, the start and end address locations to be accessed in SDRAM are set up prior to the first data transfer. This allows the system to count up from the first address being accessed to the last. Defining the SDRAM start address takes 3 CSR writes, i.e. 3 control bytes and 3 data bytes (6 bytes in total). Defining the SDRAM end address takes 2 CSR writes, i.e. 2 control bytes and 2 data bytes (4 bytes in total). An SDRAM write requires 1 control byte and 4 data bytes, i.e. 5 bytes in total. The first SDRAM access in the current design therefore costs 10 bytes for address setup plus 5 bytes for the write operation, and each subsequent SDRAM write costs only 5 bytes, i.e. 1 control byte and 4 data bytes. The process is similar for SDRAM reads. See Table 2-4 for a summary of the above description.

SRAM Access (Original) vs SDRAM Access (Current): number of bytes transferred

# Writes   Original Design   Current Design
1          11                15
2          22                20
3          33                25
4          44                30
5          55                35
6          66                40
7          77                45
8          88                50
9          99                55
10         110               60
N          11 x N            10 + 5 x N

Table 2-4 SDRAM Access

Table 2-4 shows the difference in the number of bytes that need to be transferred from the host for up to 10 writes to memory. The full extent of the savings cannot be appreciated until a significant amount of data is transferred. For example, to transfer a 640 x 480 byte image, i.e. 307,200 bytes in total (76,800 longwords), the original design has to transfer 844,800 bytes (11 x 76,800 longwords). The current design transfers 384,010 bytes, i.e. 10 + (5 x 76,800 longwords), for the same size image. The larger the image to be transferred, the more substantial the savings.

Chapter 3 DSP Fundamentals
_____________________________________________________________________

3.0 Introduction

Digital Signal Processing (DSP) is largely concerned with signal analysis and processing, system analysis and system design using digital techniques rather than traditional analogue techniques. Signals and systems are represented in digital form, meaning they can easily be manipulated using computer-based methods. DSP has major benefits in digital system design, namely portability, reusability, superior performance and flexibility. As this project is concerned with implementing a digital FIR filter in the DSP block of the existing design, a number of important aspects of DSP are discussed in this chapter. Figure 3-1 shows a block diagram of a typical DSP system. This project is largely concerned with the DSP system part of the block diagram.
Figure 3-1 Block diagram of DSP system [10]

3.1 Sampling

Digital signals are only strictly defined at specific instants of time and are taken to be zero at all other times. The instants of time at which digital signals are specified are integer multiples of the sampling period. Sampling is the process of breaking an analogue signal up into discrete components. The sampling frequency must be at least twice the maximum analogue frequency in order to prevent aliasing or loss of content. The resulting sampled signal is still technically a voltage, and this voltage value is usually converted into a binary number with a certain number of bits to produce a quantised version of the original sample. In order for the sampled signal to be processed digitally, quantisation must first occur. In effect, the original signal has been converted into a string of integer numbers that can easily be manipulated digitally.

Choosing an adequate sampling frequency is an important part of the design process. The minimum sampling rate, equal to twice the maximum analogue frequency, is called the "Nyquist Rate"; half the sampling frequency is called the "Nyquist Frequency". If the sampling frequency is less than the Nyquist Rate then there is ambiguity as to which frequencies the sampled signal actually contains.

3.2 Digital Filters

In Digital Signal Processing, digital filters are often referred to as Linear Time Invariant (LTI) discrete-time systems and vice versa. Such filters can be constructed from 3 fundamental mathematical operations.
Addition (or subtraction)
Multiplication (normally of a signal by a constant)
Time delay, i.e. delaying a digital signal by one or more sample periods

Figure 3-2 shows a graphical means of describing a digital filter, whereby the behaviour of the filter is described using the mathematical operations mentioned above.

Figure 3-2 Block diagram of a Simple Digital Filter (blocks: summation, time delay Z^-1 of period T, and multiply by A; signals: X(n), X(n-1), AX(n-1) and Y(n))

Digital filters can be classified into two categories:
Non-recursive, where the output depends only on the current and previous inputs, e.g. FIR filters
Recursive, where the output depends not only on the current and previous inputs but also on previous outputs, meaning there is feedback from the output back to the input, e.g. IIR filters

3.2.1 Digital Filter Impulse Response

The impulse response of a digital filter, h(n), is the response of the filter to an input consisting of the unit impulse function, δ(n). If the impulse response of a system is known, it is possible to calculate the system response for any input sequence x(n). By definition, the unit impulse is applied to a system at sample index n=0; it is therefore expected that the impulse response is non-zero only for values of n greater than or equal to zero, i.e. h(n) is zero for n<0. Such an impulse response is said to be causal, as is expected: otherwise the system would be producing a response before an input has been applied.

It is known from the time-invariance property of a Linear Time Invariant system that the response of a system to a delayed unit impulse δ(n-k) will be a delayed version of the impulse response, i.e. h(n-k). It is also known from the linearity property that the response of a system to a weighted sum of inputs will be a weighted sum of the responses of the system to each of the individual inputs.
Therefore, the response y(n) of a system to an arbitrary input x(n) can be written as follows:

\[ y(n) = \sum_{k=-\infty}^{\infty} x(k)\, h(n-k) \]

3.2.2 Choosing a Filter

All digital filters have their advantages and disadvantages, e.g. narrow transition band, low ripple in the pass band, good all-round performance, overshoot and ringing in the step response, etc. The following flow chart can be used as a guide when choosing the type of filter required for an application.

Figure 3-3 Choosing a Filter Flowchart (decision points: linear phase, narrow transition band, ripple OK, narrowest possible transition region, ripple in pass band, ripple in stop band, multiband filter specs; outcomes: FIR, low order Butterworth, high order Butterworth, Chebyshev, inverse Chebyshev, elliptic)

3.3 Finite Impulse Response (FIR) Filter

FIR filters are one of two primary types of filter used in DSP, the other type being Infinite Impulse Response (IIR) filters. The impulse response of an FIR filter is "finite" because there is no feedback in the filter: if an impulse is applied (that is, a single "1" sample followed by many "0" samples), zeroes will eventually come out once the "1" sample has made its way along the delay line past all the coefficients.

Compared to IIR filters, FIR filters offer the following advantages:

They can easily be designed to be "linear phase" (see next section). Linear-phase filters delay the input signal but do not distort its phase.

They are simple to implement. On most DSP microprocessors, looping a single instruction can do the FIR calculation.

FIR filters are suited to multi-rate applications, i.e. reducing the sampling rate (decimation), increasing the sampling rate (interpolation), or both.
Whether decimating or interpolating, the use of FIR filters allows some of the calculations to be omitted, thus providing an important computational efficiency. In contrast, if IIR filters are used, each output must be individually calculated, even if that output will be discarded, so that the feedback is incorporated into the filter.

FIR filters have desirable numeric properties. In practice, all DSP filters must be implemented using "finite-precision" arithmetic, that is, a limited number of bits. The use of finite-precision arithmetic in IIR filters can cause significant problems due to the use of feedback, but FIR filters have no feedback, so they can usually be implemented using fewer bits, and the designer has fewer practical problems to solve related to non-ideal arithmetic.

FIR filters can be implemented using fractional arithmetic. Unlike IIR filters, it is always possible to implement an FIR filter using coefficients with a magnitude of less than 1.0, which simplifies implementation. (The overall gain of the FIR filter can be adjusted at its output, if desired.)

A disadvantage of FIR filters is that they require more co-efficients than an IIR filter to implement the same frequency response, and therefore need more memory and more hardware resources to carry out the mathematical operations.

Some terms used in describing FIR filters are as follows:

Impulse Response - The "impulse response" of an FIR filter is the set of FIR coefficients. If an impulse, consisting of a "1" sample followed by many "0" samples, is put into an FIR filter, the output of the filter will be the set of coefficients, as the "1" sample moves past each coefficient in turn to form the output.

Tap - An FIR tap is a coefficient/delay pair.
The number of FIR taps, often designated as "N", is an indication of the amount of memory required to implement the filter, the number of calculations required, and the amount of "filtering" the filter can do. More taps means more stop-band attenuation, less ripple, narrower filters, etc.

Multiply-Accumulate (MAC) - In an FIR context, a MAC is the operation of multiplying a coefficient by the corresponding delayed data sample and accumulating the result. FIR filters usually require one MAC per tap. Most DSP microprocessors implement the MAC operation in a single instruction cycle.

Transition Band - The band of frequencies between the pass-band and stop-band edges. The narrower the transition band, the more taps are required to implement the filter.

Delay Line - The set of memory elements that implement the "Z^-1" delay elements of the FIR calculation.

3.3.1 Linear Phase

When a linear phase filter is desired, an FIR filter is used. Linear phase refers to the fact that the phase response of the filter is a straight-line function of frequency. This means that the delay through the filter is the same at all frequencies. As a result, the filter does not cause phase/delay distortion, which can be a major advantage over IIR or analogue filters in certain applications, for example in digital modems. FIR filters are usually designed to have linear phase, but this does not have to be the case. An FIR filter has linear phase if and only if its co-efficients are symmetrical (or antisymmetrical) around the centre co-efficient, i.e. the first co-efficient matches the last, the second matches the second last, etc.

3.3.2 Methods of Designing FIR Filters

The three most popular FIR filter design methods are:

Parks-McClellan: The Parks-McClellan method is probably the most widely used FIR filter design method.
It is an iterative algorithm that accepts filter specifications in terms of pass-band and stop-band frequencies, pass-band ripple and stop-band attenuation. The fact that all important filter parameters can be specified directly is what makes this method so popular.

Windowing: In the windowing method (e.g. the Hamming window method), an initial impulse response is derived by taking the Inverse Discrete Fourier Transform (IDFT) of the desired frequency response. The impulse response is then refined by applying a data window to it.

Direct Calculation: The impulse responses of certain types of FIR filters (e.g. Raised Cosine and Windowed Sinc) can be calculated directly from formulas.

There are many filter design programs available to those who want to design digital filters. FIR filter design programs come in three broad categories:

Filter design applications
Math programs, e.g. Matlab
Source code

The example code below is written in Matlab and demonstrates the power of this high level language in digital filter design. The code was written in conjunction with Dr. E. Jones' EE409 Digital Signal Processing course and is a useful aid in understanding how to design digital FIR filters using software tools.
FIR Filter Design - Parks-McClellan versus the Windowing Method

Matlab code generated to implement both filters:

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Parks-McClellan versus Hamming Window method          %
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Sampling frequency (Hz)
    fs = 8000;
    % Cut-off frequency (Hz)
    fc = 2000;
    % Transition band (Hz)
    tb = 300;
    % Design filter using the window method (fir1 applies a Hamming window
    % by default); order 50 gives 51 coefficients
    b = fir1(50, fc/(fs/2));
    % Frequency axis calculation (freqz returns 512 points by default)
    df = (fs/2)/512;
    fr_axis = df*(0:512-1);
    % Design filter using the Parks-McClellan method (remez function,
    % named firpm in later Signal Processing Toolbox releases)
    a = remez(50, [0 fc/(fs/2) (fc+tb)/(fs/2) 1], [1 1 0 0]);
    % Calculate frequency response of both filters on a decibel scale
    h1 = 20*log10(abs(freqz(a)));
    h2 = 20*log10(abs(freqz(b)));
    figure;
    hold on;
    % Plot both frequency responses on a dB versus frequency scale
    plot(fr_axis, h1);
    plot(fr_axis, h2, 'color', 'g');
    title('Magnitude Response: Parks-McClellan versus Hamming Window method');
    hold off;
    grid on;

Figure 3-4 FIR Filter Parks-McClellan vs Hamming Window

Chapter 4 FIR Filter Implementation
_____________________________________________________________________

4.0 Introduction

This chapter demonstrates the design of the FIR filter and explains the tools used to implement it in the FPGA architecture. A simple 8-tap FIR (Finite Impulse Response) filter is designed using Simulink® and Xilinx System Generator.

4.1 FIR Filter Structure

The filter shown in Figure 4-1 is an N-tap FIR filter that contains N delay elements used to store the current data sample and the previous N-1 data samples. These N data samples are multiplied by N co-efficients and the N product terms are added to produce the filter output.
A MAC (Multiply-Accumulate) unit is used to multiply the sample/co-efficient pairs and to accumulate the products. In total, N MAC cycles are required for an N-tap filter.

Figure 4-1 FIR filter Structure (Data In feeds a chain of Z^-1 delay elements; the tap outputs X0 to XN-1 are multiplied by the filter weights W0 to WN-1 and the products are summed to give Data Out. Key: Z^-1 = delay, + = adder, W = filter weights, X = multiply)

The tapped delay line in Figure 4-1 can be implemented in FPGA logic using flip-flops. If the filter in Figure 4-1 uses 4-bit data, then 4 flip-flops are required to delay/store one sample of the 4-bit data. The larger the number of bits, the larger the number of flip-flops required. This means more logic blocks are used, i.e. more area of the FPGA is required. For an 8-tap filter with 8-bit data, 8 register stages are required, each stage being 8 bits wide. The register/memory portion of the design therefore needs 64 flip-flops.

At each clock cycle, each co-efficient is multiplied by the 8-bit value in the appropriate register. Co-efficients may be stored as constants. Selecting co-efficients that are powers of two simplifies multiplication, since each multiplication reduces to a shift operation. In practice, however, the operations on co-efficients may not be as straightforward.

4.2 System Generator Design Flow

System Generator is a DSP design tool from Xilinx that allows the MathWorks® model-based design environment Simulink® to be used for FPGA designs. Previous experience with Xilinx FPGAs is not necessary when using System Generator. Designs are captured in the DSP-friendly Simulink® modelling environment using Xilinx-specific blocksets. All downstream FPGA implementation steps, including synthesis and place & route, are performed automatically to generate an FPGA programming file.
Over ninety DSP building blocks are provided in the Xilinx DSP blockset, including common building blocks such as adders, multipliers and registers. Also included is a set of complex DSP building blocks such as forward error correction blocks, FFTs, filters and memories. These blocks leverage the Xilinx IP core generators to deliver optimized results for the selected device.

Figure 4-2 shows the blocksets that are available to a DSP designer when designing a DSP system using Simulink®. Design elements from the Xilinx blockset are connected together in a Simulink® model. The system can then be simulated in the Simulink® environment. System Generator makes it easier to get from the design stage to actual hardware implementation by integrating Matlab Simulink®, Modelsim™ [3] and the Xilinx ISE project environment.

Figure 4-2 Simulink® Libraries - Xilinx Blockset View

4.2.1 FIR Filter Generation

System Generator includes a FIR compiler block that targets the dedicated DSP48 hardware resources of the Virtex-4 and Virtex-5 devices to create highly optimized implementations that can run in excess of 500 MHz. Configuration options allow the generation of direct, poly-phase decimation, poly-phase interpolation and oversampled implementations. Standard Matlab functions such as fir2, or the MathWorks FDATool, can be used to create co-efficients for the Xilinx FIR compiler. Figure 4-3 shows the 8-tap filter designed for this project using the Xilinx blockset in the Simulink® library.

Figure 4-3 An 8-Tap FIR Filter Design using Simulink® (a Discrete Impulse source drives a Gateway In block; a chain of delay blocks, Delay to Delay6, feeds multiplier blocks, Mult to Mult6, whose second inputs are constant weights, Constant1 to Constant7, with values including -10, 20, 50 and 80; the products are summed by adder blocks, AddSub to AddSub5, and passed through a Gateway Out block to a Scope. A System Generator token and a WaveScope block are also present. Signal types shown: Fix_2_0 on the delay line, Fix_8_0 on the constants and products, Fix_9_0 on the adder outputs.)

The filter weights in Figure 4-3 are not chosen to give a specific response for the FIR filter but rather to show something observable. Figure 4-3 allows the designer to visually relate the design to the FPGA implementation. The Gateway In/Out blocks are used as an interface between the Xilinx blocksets and other Simulink® blocksets; they define the boundary between the FPGA and the Simulink® simulation model. The Gateway In block converts the floating-point input into a fixed-point number. The discrete impulse input and the scope are standard Simulink® blocks (a source and a sink respectively). The delays, constants, multipliers and adder/subtractors are all System Generator blocks that are realizable in hardware.

Every System Generator diagram is required to have at least one System Generator token placed in the diagram. This block is not connected to anything but serves to drive the FPGA implementation process. System Generator will generate an error if this block is absent or incorrectly configured. Figure 4-4 shows the result of the MAC operations on the discrete impulse input.
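The behaviour described above, a tapped delay line with one multiply-accumulate per tap driven by a discrete impulse, can be sketched in a few lines of Python. This is a minimal software model, not the System Generator design: the function name is hypothetical, the seven weights and their ordering are an assumption loosely based on the constant values visible in Figure 4-3, and plain integers are used instead of the fixed-point types of the real design.

```python
def fir_filter(samples, weights):
    """Direct-form FIR: shift each sample into a delay line, then perform
    one multiply-accumulate (MAC) per tap to form the output sample."""
    delay_line = [0] * len(weights)
    output = []
    for x in samples:
        # Newest sample enters at tap 0; the oldest sample falls off the end.
        delay_line = [x] + delay_line[:-1]
        acc = 0
        for w, d in zip(weights, delay_line):
            acc += w * d                      # one MAC per tap
        output.append(acc)
    return output


# Assumed, illustrative weights (not taken from the project files).
weights = [-10, 20, 50, 80, 50, 20, -10]

# Discrete impulse: a single 1 followed by zeros.
impulse = [1] + [0] * 9

response = fir_filter(impulse, weights)
print(response)
```

Because the input is a unit impulse, the output reproduces the weights one per sample and then returns to zero, which is the "impulse response equals the set of coefficients" property described in Section 3.3.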
Figure 4-4 FIR Filter Response using Wave Scope

4.2.2 Generating HDL Code

Once a design is complete (in the case of Figure 4-3, a simple 8-tap FIR filter), double-clicking the System Generator token within the design brings up the Properties editor, and pressing the "Generate" button generates the hardware implementation files. Selecting HDL Netlist as the compilation target instructs System Generator to generate RTL code and then stop. An HDL test bench and script files for Modelsim may also be created from the Simulink® simulation, so that the designer can verify the design and compare it with the Simulink® simulation.

Figure 4-5 System Generator Properties Editor

Chapter 5 Conclusion and Future Work
_____________________________________________________________________

5.0 Introduction

During the course of this project I learned a substantial amount, not only about hardware and software design but also about project management. The project has given me a better appreciation of how a design should be approached and how to cope with problems, implement solutions and keep to deadlines. Throughout the year much of the theory learned in both the EE409 Digital Signal Processing course and the EE427 Digital System & VHDL Design course has proved useful, and it is encouraging to see some of that theory being applied to an actual design. Gaining some experience in writing and understanding VHDL code has been satisfying. The ability of FPGAs to implement complex DSP systems and deliver high performance shows that this is the direction in which future DSP design is heading.
Together, System Generator, Simulink® and the Xilinx ISE environment provide a powerful and flexible platform for any FPGA/DSP system designer to work from.

5.1 Future Work

Fully integrating Shane Agnew’s final year project “Real-Time Image Warper” onto the Nexys board is the first step in further developing this project. The next step is to implement extra DSP functionality, such as applying multiple filters or edge detection to an image. Once completed, the final project could be used as a learning aid for future final year students in conjunction with Dr. Fearghal Morgan’s EE427 Digital System Design & VHDL course.

Appendix A Code

Due to the length of the code written for this project, all code is provided on the accompanying CD at the rear of this report.

References
_____________________________________________________________________

[1] Digilent Nexys Board user guide https://www.digilentinc.com/Data/Products/NEXYS/Nexys_rm.pdf
[2] Digilent Adept Reference Manual http://www.digilentinc.com/Software/Adept.cfm?Nav1=Software&Nav2=Adept
[3] Modelsim: Xilinx Edition http://www.xilinx.com/ise/verification/mxe_details.html
[4] Matlab http://www.mathworks.com/products/matlab/
[5] Simulink http://www.mathworks.com/products/simulink/
[6] System Generator for Digital Signal Processing http://www.xilinx.com/ise/optional_prod/system_generator.htm
[7] Spartan3 SRAM memory device www.issi.com/pdf/61LV25616AL.pdf
[8] Nexys CellularRAM™ memory device www.issi.com/pdf/61LV25616AL.pdf
[9] VHDL References
3rd Year Digital Systems II course notes, Dr. Fearghal Morgan http://www.ee.nuigalway.ie/subjects/ee316/
4th Year Digital Systems Design & VHDL course website, Dr.
Fearghal Morgan http://www.ee.nuigalway.ie/subjects/ee427
[10] EE409 Digital Signal Processing Course Notes, Dr. E. Jones http://www.nuigalway.ie/subjects/EE409/notes
[11] Micron RAM products http://www.micron.com/products/psram/
[12] DSP Primer Part 1 and Part 2, presented by Bob Stewart (University of Strathclyde, Scotland, UK), Steve Alexander (University of Strathclyde, Scotland, UK) and Jamie Bowman (Steepest Ascent Ltd)