Creating a 12 x 8 MAC Using VHDL and the Xilinx CORE Generator
For Academic Use Only

Introduction

In this lab, you will create a 12-bit x 8-bit MAC (multiplier-accumulator) using a combination of VHDL and the Xilinx CORE Generator. You will create a multiplier unit in VHDL and an accumulator using the CORE Generator, and then connect them together in the top-level design. Generating the accumulator as an IP core familiarizes you with the Xilinx CORE Generator and the Xilinx implementation tools. This lab is completed using the Xilinx ISE 6 software.

You will use a typical VHDL flow to black-box (instantiate) the core in a top-level VHDL file, run a functional HDL simulation, synthesize your design with XST, and take the synthesized design through the Xilinx implementation tools. You will then verify the functionality of the design on-chip using ChipScope Pro.

Note: For this lab, you do not need to know VHDL because the top-level VHDL file is provided. There is a completed example in c:\xup\dsp_flow\labs\lab2\lab1_soln.
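The arithmetic behind the design can be sketched in a few lines of Python. This is a behavioral model only, not the lab's VHDL or the generated core: the names `to_signed` and `Mac12x8` are hypothetical, the widths (12-bit and 8-bit signed inputs, 20-bit product, 27-bit accumulator) are taken from the Design Description, and register latency is ignored. It also assumes that the accumulator wraps on overflow, as plain two's-complement addition does.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


class Mac12x8:
    """Timing-free model: signed 12 x 8 multiply feeding a 27-bit accumulator."""

    A_BITS, B_BITS, PRODUCT_BITS, ACC_BITS = 12, 8, 20, 27

    def __init__(self):
        self.q = 0

    def clear(self):
        """Model the accumulator clear."""
        self.q = 0

    def step(self, a, b, enable=True):
        """Multiply a by b and, if enabled, add the product to the accumulator."""
        a = to_signed(a, self.A_BITS)
        b = to_signed(b, self.B_BITS)
        product = to_signed(a * b, self.PRODUCT_BITS)
        if enable:
            # Assumption: the accumulator wraps modulo 2**27 on overflow.
            self.q = to_signed(self.q + product, self.ACC_BITS)
        return self.q


# Largest product magnitude: (-2048) * (-128) = 2**18, which fits in 20 signed bits.
WORST_PRODUCT = (1 << 11) * (1 << 7)
# Number of worst-case products the accumulator can take before it would wrap.
SAFE_ACCUMULATIONS = ((1 << 26) - 1) // WORST_PRODUCT

mac = Mac12x8()
print([mac.step(n, n) for n in (1, 2, 3)])  # [1, 5, 14]
print(SAFE_ACCUMULATIONS)                   # 255
```

Feeding the model 1x1, 2x2, and 3x3 gives running sums of 1, 5, and 14, the same pattern the on-chip verification step looks for. In this model, the seven extra accumulator bits beyond the 20-bit product leave room for at least 255 worst-case products before the result wraps.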
Objectives

After completing this lab, you will be able to:
  Generate a CORE Generator macro
  Simulate a piece of VHDL containing a CORE Generator macro
  Synthesize the VHDL and black-box instantiations using XST
  Implement a synthesized design through the Xilinx implementation tools

Design Description

Use VHDL and the CORE Generator to create a 12 x 8 MAC with the following characteristics:
  Multiplier input data widths of 12 bits and 8 bits, signed
  Multiplier output width of 20 bits
  Accumulator output width of 27 bits

Procedure

This lab comprises nine primary steps: you will start the Project Navigator and open the project; create a 12 x 8 multiplier unit using VHDL; generate an accumulator core using the CORE Generator; add the CORE Generator macro into the provided VHDL code; synthesize the design using XST; insert the ILA and ICON cores into the MAC design; implement the MAC design; use the ChipScope Pro Analyzer to configure the FPGA and specify match units and trigger conditions; and then perform an on-chip verification.

Below each general instruction for a given procedure, you will find accompanying step-by-step directions and illustrated figures providing more detail for performing the general instruction. If you feel confident about a specific instruction, feel free to skip the step-by-step directions and move on to the next general instruction in the procedure.

Note: If you are unable to complete the lab at this time, you can download the lab files for this module from the Xilinx University Program site at http://university.xilinx.com

Start the Project Navigator and Open the Project
Step 1

Launch the ISE Project Navigator and open the mac_cgen project.
Open the Xilinx ISE 6 software: go to Start Menu > Programs > Xilinx ISE 6 > Project Navigator
Open the mac_cgen project:
  In the Project Navigator, select File > Open Project
  Browse to c:\xup\dsp_flow\labs\lab1 using the pull-down arrow
  Open the mac_cgen folder and select the mac_cgen.npl project file
  Click OK

Generate the VHDL Code for the Multiplier
Step 2

Open the mac_cgen.vhd file and modify it to perform the 12 x 8 multiply operation. Refer to the block diagram in Figure 21-1 to understand the provided code. The comments in the code will guide you through this step. Spend 15 minutes working on your VHDL code; then move on and use the solution provided in the lab1_soln directory.

Open the mac_cgen.vhd file: in the Sources in Project window, double-click mac_cgen.vhd
Read through the VHDL file and add code to the following sections: “Generating the Multiplier”
Select mac_cgen.vhd in the Sources in Project window
In the Processes for Current Source window, expand Synthesis
Double-click the Check Syntax option to perform a syntax check
Fix any reported errors

Generate an Accumulator Using the CORE Generator
Step 3

Generate an accumulator by invoking the CORE Generator through the project. Make sure that the input data is signed and the output width is 27 bits.

Create a new source: select Project > New Source, or right-click and choose New Source
Figure 12-1. Adding a New Source to an ISE Project
Select IP (CoreGen & Architecture Wizard), type accum in the File Name field, and click Next
Figure 12-2. Adding a CORE Generator to Your ISE Project.
The Select Core Type dialog box is displayed. Expand Math Functions and then Accumulators
Figure 12-4. Selecting Multiply Accumulators function.
Select Accumulator, click the Next button, and then click Finish
Fill in the following options in the Accumulator GUI and click Generate to create the accumulator:
  Component Name: accum
  Operation: Add
  Port B Input Options: Port B Width 20; signed
  Output Options: Width 27, Registered
  Register Options: Clock Enable and Asynchronous Clear
  Create RPM: checked
Select Display Core Footprint (bottom right of the GUI)
Figure 12-6. Accumulator Options.
You will see a pop-up window indicating that the accum core was generated successfully. Click OK to invoke the Core Viewer
The shape of the generated core should look like the following:
Figure 12-9. Core Viewer of the Multiplier Accumulator.

Question 1. Fill in the following information from the Core Viewer window:
  Number of CLBs wide:
  Number of CLBs tall:
  Number of slices:

Close the Core Viewer and the CORE Generator by clicking the DISMISS button

Note: For a detailed explanation of the output files, please see the documentation at Help > Online Documentation > CORE Generator Guide, Chapter 3, Using the CORE Generator.
The section listing inputs and outputs thoroughly describes the input and output files.

Note: An accum.xco file will be added to your project in the mac_cgen hierarchy.

Adding the CORE Generator Macro into VHDL Code
Step 4

Using the ISE Language Template, instantiate the accumulator macro, accum, into the supplied top-level VHDL file mac_cgen.vhd.

Double-click the VHDL file mac_cgen.vhd in the Sources in Project window
Open the Language Template by clicking its icon, or select Edit > Language Template
Expand the Coregen VHDL folder and select the accum template
A template similar to the one shown below appears:
Figure 12-10. Selecting the accum template.
Using the template, add the component declaration between the architecture and begin statements as indicated in the mac_cgen.vhd file
Using the template, add the instance of accum in the mac_cgen.vhd file
Change the instance name to U2
Connect the ports of accum to the appropriate signals
Check the syntax and correct any errors before proceeding to the next step

Synthesize the Design Using XST
Step 5

Synthesize the mac_cgen.vhd design using the Xilinx Synthesis Technology (XST) tool with default options.

Remove the my_mac.xco file from the project
Select the mac_cgen.vhd file in the Sources in Project window
Run synthesis: right-click Synthesis in the Processes for Current Source window and select the Run option
If there are any errors, view the synthesis report: expand Synthesis, right-click View Synthesis Report, and choose the View option
Fix any errors and re-synthesize; otherwise, continue to the next step

Implement the MAC Design
Step 6

Implement your mac_cgen.vhd design using the Xilinx implementation tools and view the Post-Place & Route Static Timing Report.
Make sure that the settings are as follows:
  Device Family: Spartan3
  Device: xc3s200
  Speed Grade: 4
  Package: FT256
Right-click Implement Design and choose the Run option, or double-click Implement Design

Question 2. Which netlist files do the Xilinx implementation tools use for the accum black box?

View the placed design in the FPGA Editor by selecting View/Edit Routed Design (FPGA Editor) under Place and Route
Figure 12-11. Opening the FPGA Editor.
Close the FPGA Editor when you are finished
Use the place and route report and the Text-based Post-Place & Route Static Timing Report files:

Question 3. Fill in the information requested below.
  Number of Slices:
  Number of Block Multipliers:
  Number of Block RAMs:
  Number of BUFGMUXs:
  Number of external IOBs:
  Maximum clock frequency:

Create New ChipScope Pro Project
Step 7

Create a new ChipScope Pro project through the Project Navigator.

Select Project > New Source in the Project Navigator to open the New Source dialog, select ChipScope Definition and Connection from the list, enter mac_cs as the file name, and click <Next>.
Figure 12-12. Add New ChipScope Source
Select mac_cgen as the source. Click <Next> and then <Finish>. A ChipScope Pro source will be added to the Sources in Project window.
Figure 12-13. ChipScope Definition and Connection

ILA Core Parameters and Connections
Step 8

Insert an ICON and ILA core into the design netlist using the ChipScope Pro Core Inserter. Connect the output of the accumulator to the trigger and input data ports of the ILA core.
Double-click the mac_cs.cdc file in the Sources in Project window to open the Core Inserter project.
Figure 12-14. ChipScope Pro Core Inserter
Projects saved in the Core Inserter hold all relevant information about source files, destination files, core parameters, and core settings.
Click <Next>. Leaving the Disable JTAG Clock BUFG Insertion option unchecked, click New ILA Unit. Notice in the left-hand window how an instance of the ILA core, U0:ILA, is added to the system.
Figure 12-15. Insert the New ILA Unit
Note: Disabling the JTAG clock BUFG insertion causes the ISE tools to route the JTAG clock using normal routing resources instead of global clock routing resources. This option should be selected only if global routing resources are scarce.
Click <Next> to set up the trigger parameters.

Each ILA or ILA/ATC core can have up to 16 separate trigger ports that can be set up independently. Each trigger port is a bus made up of individual signals (bits), and its width can range from 1 to 256 bits. Each trigger port can be connected to 1 to 16 match units. A match unit is a comparator that is connected to a trigger port and is used to detect events on that trigger port. The results of one or more match units are combined to form the overall trigger condition event that controls the capturing of data. The comparisons, or match functions, that a trigger port's match units can perform depend on the type of match unit. The ILA and ILA/ATC cores support six types of match units.
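The relationship between trigger ports, match units, and the overall trigger condition can be illustrated with a small Python sketch. It is a conceptual model under stated assumptions, not ChipScope Pro's implementation or API: `MatchUnit`, `all_match`, and `SequencerTrigger` are hypothetical names, only an equality comparison is modeled, and the sequencer is assumed to advance by at most one level per sample.

```python
class MatchUnit:
    """Comparator attached to one trigger port (only '==' is modeled here)."""

    def __init__(self, port, value):
        self.port = port      # index of the trigger port this unit watches
        self.value = value    # value the port must equal for a match

    def matches(self, sample):
        return sample[self.port] == self.value


def all_match(units, sample):
    """Boolean-style condition: every match unit matches in the same sample."""
    return all(unit.matches(sample) for unit in units)


class SequencerTrigger:
    """Sequence-style condition: the match units must match in the listed order."""

    def __init__(self, levels):
        self.levels = levels
        self.level = 0        # number of levels satisfied so far

    def clock(self, sample):
        """Process one sample; return True once the whole sequence has been seen."""
        if self.level < len(self.levels) and self.levels[self.level].matches(sample):
            self.level += 1   # assumption: at most one level advances per sample
        return self.level == len(self.levels)


# Two one-bit trigger ports; each sample is (port 0 value, port 1 value).
m0 = MatchUnit(port=0, value=1)
m1 = MatchUnit(port=1, value=1)
trigger = SequencerTrigger([m1, m0])   # port 1 must match first, then port 0

samples = [(0, 0), (1, 0), (0, 1), (0, 0), (1, 0)]
print([trigger.clock(s) for s in samples])
# [False, False, False, False, True]
```

The second sample matches `m0` but is ignored because `m1` has not matched yet, which is the difference between a sequenced condition and a simple Boolean combination of the same match units.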
Set the ILA trigger parameters as follows and then click <Next>:

Trigger Input and Match Unit Settings
  Number of trigger ports: 2
  TRIG0:
    Trigger width: 1
    # Match Units: 1
    Counter Width: disabled
    Match type: extended
  TRIG1:
    Trigger width: 1
    # Match Units: 1
    Counter Width: disabled
    Match type: extended
Trigger Condition Settings
  Enable Trigger Sequencer: checked
  Max Number of Sequencer Levels: 2
Storage Qualification Condition Settings
  Enable Storage Qualification: unchecked
Figure 12-16. Trigger Parameters

The maximum number of data sample words that the ILA core can store in the sample buffer is called the data depth. The data depth determines the number of data width bits contributed by each block RAM unit used by the ILA unit. The maximum number of data sample words that can be captured depends on the number and size of the block RAMs, which vary according to device family and density.

Set the following options and click <Next>:
  Data Depth: 512
  Sample On: Rising clock edge
  Data Same as Trigger Port: unchecked
  Data Width: 47 (12 bits for input A, 8 bits for input B, and 27 bits for the accumulator output)
Figure 12-17. Capture Parameters

The Net Connections tab allows you to choose the signals to connect to the ILA or ILA/ATC core. If the trigger is separate from the data, then the clock, trigger, and data must all be specified. Connections that have not been made appear in red.
Figure 12-18. Net Connections
Click the Modify Connections tab
Figure 12-19. Net Connections
This dialog provides an easy interface for choosing nets to connect to the ILA, ILA/ATC, or ATC2 cores. The hierarchical structure of the design can be traversed using the Structure/Nets pane.
All of the design's nets in the selected level of the hierarchy appear in the table in the lower left pane. The Clock Signals and Trigger/Data Signals tabs show the net connections between the design and the ILA core.
With the Clock Signals tab under Net Selections selected, highlight the entry for clk_int and click the Make Connections button to connect the clock signal in the design to the clock port of the ILA core.
Figure 12-20. Connect the clock
Click the Trigger/Data Signals tab and make the following connections in each of the subtabs:
  TP0:CH:0   nd_reg
  TP1:CH:0   clr_IBUF
Figure 12-21. Trigger Signal Connection
Click the Data Signals tab, make the following connections, and then click <OK>:
  CH:0 to CH:11    a_reg<0> to a_reg<11>
  CH:12 to CH:19   b_reg<0> to b_reg<7>
  CH:20 to CH:46   Q_0_OBUF to Q_26_OBUF
Figure 12-22. Data Signal Connections
You will notice that the Clock and Trigger ports under Net Connections are highlighted in black, indicating valid connections. Click Return to Project Navigator and save the file.

Implement the MAC Design
Step 9

Implement your mac_cgen.vhd design using the Xilinx implementation tools to generate a bitstream for downloading to the FPGA.

Right-click Generate Programming File and choose the Rerun All option.
Figure 12-23. Generate the Programming File
Note: This runs the design through Place and Route. You will notice green check marks (or warning exclamation marks) next to the processes that have finished successfully. It also runs Post-Place & Route Static Timing and generates the static timing report.
Use the place and route report and the Text-based Post-Place & Route Static Timing Report files:

Question 4. Fill in the information requested below.
  Number of Slices:
  Number of Block Multipliers:
  Number of Block RAMs:
  Number of BUFGMUXs:
  Number of external IOBs:
  Maximum clock frequency:

Set Up ChipScope Pro Analyzer Options
Step 10

The ChipScope Pro Analyzer tool interfaces directly to the ICON, ILA, ILA/ATC, IBA/OPB, IBA/PLB, VIO, and ATC2 cores. You can configure your device, choose triggers, set up the console, and view the results of the capture on the fly. The data views and triggers can be manipulated in many ways, providing an easy and intuitive interface for determining the functionality of the design.
Using the Analyzer, you will configure the FPGA, specify the match units, and then set up the trigger conditions.

Open the ChipScope Pro Analyzer by going to Start > Programs > ChipScope Pro 6.3 > ChipScope Pro Analyzer
Connect the download cable to the PC parallel port and to the JTAG connection of the Spartan-3 board, and then power up the board.
Click the Open Cable/Search JTAG Chain button
Figure 12-24. Establish JTAG Connection
The Spartan-3 board contains two devices in the JTAG chain: the Spartan-3 XC3S200 and a Platform Flash PROM XCF00S. iMPACT will detect these devices and list the device names along with the Instruction Register (IR) lengths and device ID codes.
Figure 12-25. iMPACT Detects Devices in JTAG Chain
Click <OK>. Right-click the Spartan-3 device, indicated as DEV: 0 MyDevice0 (XC3S200), and select Configure.
Figure 12-26. Download Program File to FPGA
Click Select New File, browse to the project directory, and select the bitstream file mac_cgen.bit.
The ChipScope Pro Analyzer interface consists of four parts:
  Project Tree, in the upper part of the split pane on the left side of the window
  Signal Browser, in the lower part of the split pane on the left side of the window
  Message pane, at the bottom of the window
  Main window area
Figure 12-27. ChipScope Pro Analyzer

Each ChipScope Pro ILA, ILA/ATC, and IBA core has its own Trigger Setup window, which provides a graphical interface for setting up triggers. The trigger mechanism inside each ChipScope Pro core can be modified at run time without recompiling the design. There are three components to the trigger mechanism:
  Match Functions: define the match or comparison value of each match unit
  Trigger Conditions: define the overall trigger condition based on a binary equation or sequence of one or more match functions
  Capture Settings: define how many samples to capture, how many capture windows to use, and the position of the trigger in those windows

In this design, you will set up the triggers to capture 256 samples of both inputs to the multiplier and the output of the accumulator.

Specify the match units as follows:
  Radix (both trigger ports): binary
  M0:TriggerPort0: Function ==; Value 1
  M1:TriggerPort1: Function ==; Value 1
Figure 12-28. Set Up the Match Units

You will now set up the trigger condition equation to capture samples after the following conditions occur in order:
  1. clear is asserted
  2. enable signal nd_reg is asserted
Click the field under Trigger Condition Equation, select the Sequencer tab, specify the following options to generate the equation M1 -> M0, and then click <OK>:
  Number of Levels: 2
  Level 1: M1
  Level 2: M0
Figure 12-29. Trigger Condition Equation

Perform On-Chip Hardware Verification
Step 11

In the next steps, you will combine the signals into buses in the waveform viewer to make it easier to view the results of on-chip debug. You will then perform on-chip verification and use the waveform viewer to verify the operation of the MAC design.

Perform the following actions to create buses that represent the A and B multiplier inputs and the Q_int accumulator output:
  Select signals DataPort[0] through DataPort[11] so that they are highlighted
  Right-click the highlighted signals and select Add to Bus > New Bus
  Right-click the newly created bus, BUS_0, and rename it to A
  Select signals DataPort[12] through DataPort[19] so that they are highlighted
  Right-click the highlighted signals and select Add to Bus > New Bus
  Right-click the newly created bus, BUS_1, and rename it to B
  Select signals DataPort[20] through DataPort[46] so that they are highlighted
  Right-click the highlighted signals and select Add to Bus > New Bus
  Right-click the newly created bus, BUS_0, and rename it to Q_int
Figure 12-30. Create Buses
Right-click each of the buses and set the radix to signed decimal.
Click the Apply Settings and Arm Trigger button
Figure 12-31. Apply Settings and Arm Trigger
Push the switch SW0 on the Spartan-3 board to the “on” position. This switch enables the design.
Press and release button BTN0 to set off the trigger and capture data samples in the ILA buffer. Pressing the button resets the following registers:
  A and B registered inputs of the multiplier
  Registered enable nd_reg
  Accumulator
Releasing the reset button connects nd_reg to the nd input pin, which is connected to switch SW0.
Figure 12-32. Verification Results

Once triggered, the ILA core captures the results in block RAM, and the ICON core routes the results back to the PC via the JTAG connection. The results are shown in the waveform view. Notice the multiply-accumulate operations:
  1x1 = 1
  (1x1) + (2x2) = 5
  (1x1) + (2x2) + (3x3) = 14
  etc.

Conclusion

In this lab, you learned the basic design flow for incorporating CORE Generator macros into VHDL code. You generated a CORE Generator macro, simulated a design that contains CORE Generator macros, and synthesized that design using XST. You ran the synthesized design through the Xilinx implementation tools and viewed how the core is implemented using the FPGA Editor. During the last steps, you inserted the ILA and ICON cores into the design and performed an on-chip verification using the ChipScope Pro Analyzer.

Answers

The Core Viewer Result:
Figure 12-10. Core Viewer Results.

1. Fill in the following information from the Core Viewer window:
  Number of CLBs wide: 1
  Number of CLBs tall: 7
  Number of slices: 14

2. Which netlist files do the Xilinx implementation tools use for the accum black box?
  The accum.edn (EDIF) netlist file, which is generated by the CORE Generator

3. Fill in the information requested below.
  Number of Slices: 25
  Number of Block Multipliers: 1
  Number of Block RAMs: 0
  Number of BUFGMUXs: 1
  Number of external IOBs: 30
  Maximum clock frequency: ~180 MHz

4. Fill in the information requested below.
  Number of Slices: 304
  Number of Block Multipliers: 1
  Number of Block RAMs: 2
  Number of BUFGMUXs: 2
  Number of external IOBs: 30
  Maximum clock frequency: ~112 MHz
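
The bus grouping used in the on-chip verification step (DataPort[0] to DataPort[11] as A, DataPort[12] to DataPort[19] as B, and DataPort[20] to DataPort[46] as Q_int, each displayed as signed decimal) amounts to slicing one 47-bit sample word into three two's-complement fields. A minimal Python sketch of that slicing follows. The function names are hypothetical, and the sketch assumes that DataPort[0] is bit 0 of an integer holding the sample, which is a modeling choice rather than a ChipScope Pro file format.

```python
A_BITS, B_BITS, Q_BITS = 12, 8, 27
DATA_WIDTH = A_BITS + B_BITS + Q_BITS   # 47, the ILA data width used in the lab


def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def pack_sample(a, b, q):
    """Build a 47-bit sample word: A in bits 0-11, B in bits 12-19, Q in bits 20-46."""
    word = a & ((1 << A_BITS) - 1)
    word |= (b & ((1 << B_BITS) - 1)) << A_BITS
    word |= (q & ((1 << Q_BITS) - 1)) << (A_BITS + B_BITS)
    return word


def unpack_sample(word):
    """Split a sample word back into signed A, B, and Q values."""
    a = to_signed(word, A_BITS)
    b = to_signed(word >> A_BITS, B_BITS)
    q = to_signed(word >> (A_BITS + B_BITS), Q_BITS)
    return a, b, q


word = pack_sample(3, 3, 14)
print(unpack_sample(word))                        # (3, 3, 14)
print(unpack_sample(pack_sample(-5, 7, -35)))     # (-5, 7, -35)
```

Setting the radix to signed decimal in the waveform viewer plays the role of `to_signed` here: without it, a negative accumulator value would be displayed as a large unsigned number.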