ECE 699: Lecture 8
Integrated Logic Analyzer & Profiling

Required Reading
LogiCORE Integrated Logic Analyzer v5.0, PG172
LogiCORE IP Virtual Input/Output v3.0, PG159
Vivado Design Suite User Guide, Programming and Debugging, UG908
• Chapter 5: Debugging Logic Design in Hardware
• Chapter 6: Viewing ILA Probe Data in the Waveform Viewer
Xilinx Advanced Embedded System Design on Zynq
• Lab 2: Debugging Using Hardware Analyzer (on Piazza)

Recommended Videos
Xilinx Inc., Programming and Debugging Design in Hardware
https://www.youtube.com/watch?v=i8axs4hw2f4
M.S. Sadri, ZYNQ Training (presentations and videos)
• Lesson 12 – AXI Memory Mapped Interfaces and Hardware Debugging, Part 1

Required Reading
The ZYNQ Book
• Chapter 4.3.2 Execution Profiling
• Chapter 11.2 Profiling
EDK Profiling User Guide, A Guide to Profiling in EDK, UG448
GNU gprof
https://sourceware.org/binutils/docs/gprof/index.html
Xilinx Advanced Embedded System Design on Zynq
• Profiling and Performance Improvement (on Piazza)
• Lab 6: Profiling and Performance Tuning (on Piazza)

Traditional Logic Analyzer
Source: Agilent Technologies

Traditional Logic Analyzer
Source: technology guerilla

Traditional Logic Analyzer (1)
• An electronic instrument that captures and displays multiple signals from a digital system
• Can be triggered on a complicated sequence of digital events, then capture a large amount of digital data from the system under test (SUT)
• Once the probes are connected, the user programs the analyzer with the names of each signal, and can group several signals together for easier manipulation
• A capture mode is chosen: either "timing" mode, where the input signals are sampled at regular intervals based on an internal or external clock source, or "state" mode, where one or more of the signals are defined as "clocks" and data are taken on the rising or falling edges of these clocks
Source: Wikipedia

Traditional Logic Analyzer (2)
• A trigger condition is set. It can range from simple (such as triggering on a rising or falling edge of a single signal) to very complex
• The user sets the analyzer to "run" mode, either triggering once or triggering repeatedly
• Once the data are captured, they can be displayed several ways, from the simple (showing waveforms or state listings) to the complex (showing decoded Ethernet protocol traffic)
Source: Wikipedia

Traditional Logic Analyzers Useless for Systems-on-Chip (SoC)
Source: The Zynq Book

Integrated Logic Analyzer
Source: Integrated Logic Analyzer v5.0, LogiCORE IP Product Guide

Integrated Logic Analyzer
• IP core
• The core parameters specify the number of probes, the width for each probe input, and the trace sample depth
• After the design is loaded into the FPGA, one uses the Vivado® logic analyzer software to set up a trigger event for the ILA measurement
• After the trigger occurs, the sample buffer is filled and uploaded into the Vivado logic analyzer
• Signals, attached to the probe inputs, are sampled at design speeds and stored using on-chip block RAM (BRAM)
• Communication with the ILA core is conducted using an auto-instantiated debug core hub that connects to the JTAG interface of the FPGA
• The user can view this data using the waveform window
Source: Integrated Logic Analyzer v5.0, LogiCORE IP Product Guide

ILA Setup – Native Mode
Source: Integrated Logic Analyzer v5.0, LogiCORE IP Product Guide

ILA Setup – AXI Mode
Source: Integrated Logic Analyzer v5.0, LogiCORE IP Product Guide

ILA Dashboard
Source: Vivado Design Suite User Guide, Programming and Debugging

ILA Trigger Modes
• BASIC_ONLY: the AND or OR of logic values obtained by applying (possibly sophisticated) comparisons to selected probe values
• ADVANCED_ONLY: internal trigger signal specified by a
user-defined state machine
• TRIG_IN_ONLY: the rising edge of the TRIG_IN pin of the ILA core
• BASIC_OR_TRIG_IN: combination of BASIC_ONLY and TRIG_IN_ONLY
• ADVANCED_OR_TRIG_IN: combination of ADVANCED_ONLY and TRIG_IN_ONLY
Source: Vivado Design Suite User Guide, Programming and Debugging

ILA Basic Trigger Setup
Source: Vivado Design Suite User Guide, Programming and Debugging

ILA Basic Trigger Setup
Operators: ==, !=, <, <=, >, >=
Radix: [B] Binary, [H] Hexadecimal, [O] Octal, [A] ASCII, [U] Unsigned Decimal, [S] Signed Decimal
Value:
• [B] Binary: 0, 1, X (don't care), R (rising edge), F (falling edge), B (either edge), N (no transition)
• [H] Hexadecimal: 0-9, A-F, X (don't care for all 4 bits)
• [O] Octal: 0-7, X (don't care for all 3 bits)
• [A] ASCII: any ASCII string
• [U] Unsigned Decimal: any non-negative integer value
• [S] Signed Decimal: any integer value
Source: Vivado Design Suite User Guide, Programming and Debugging

Setting Basic Trigger Condition
Source: Vivado Design Suite User Guide, Programming and Debugging

ILA Dashboard
Source: Vivado Design Suite User Guide, Programming and Debugging

Trigger Out Modes
• DISABLED: disables the TRIG_OUT port
• TRIGGER_ONLY: enables the result of the basic/advanced trigger condition to propagate to the TRIG_OUT port
• TRIG_IN_ONLY: propagates the TRIG_IN port to the TRIG_OUT port
• TRIGGER_OR_TRIG_IN: enables the result of a logical OR-ing of the basic/advanced trigger condition and TRIG_IN port to propagate to the TRIG_OUT port
Source: Vivado Design Suite User Guide, Programming and Debugging

Capture Mode Setting
• The ILA core can capture data samples when the core status is Pre-Trigger, Waiting for Trigger, or Post-Trigger
• The Capture mode control is used to select what condition is evaluated before each sample is captured:
  • ALWAYS: store a data sample during a given clock cycle regardless of any capture conditions
  • BASIC: store a data sample during a given clock cycle only if the capture condition evaluates "true"
• The BASIC capture mode is used to describe a capture condition that is a global Boolean equation of participating debug probe comparators
Source: Vivado Design Suite User Guide, Programming and Debugging

Class Exercise 1
Block Diagram
Source: Xilinx Advanced Embedded System Design on Zynq course materials

Class Exercise 1
Block Design
Source: Xilinx Advanced Embedded System Design on Zynq course materials

Class Exercise 1
Setting up the trigger
Source: Xilinx Advanced Embedded System Design on Zynq course materials

Class Exercise 1
Mark Debug
Source: Xilinx Advanced Embedded System Design on Zynq course materials

Class Exercise 1
Setting up the Corresponding ILA
Source: Xilinx Advanced Embedded System Design on Zynq course materials

Class Exercise 1
Monitoring AXI Transactions
Source: Xilinx Advanced Embedded System Design on Zynq course materials

Class Exercise 1
Block Design
Source: Xilinx Advanced Embedded System Design on Zynq course materials

Class Exercise 1
Math IP
Source: Xilinx Advanced Embedded System Design on Zynq course materials

Virtual Input Output
Source: LogiCORE IP Virtual Input/Output

VIO Probes
Source: Xilinx Advanced Embedded System Design on Zynq course materials

Profiling

Processor Activity Before and After Hardware Acceleration
Source: The Zynq Book

Types of Profiling
Static
• Without executing the software program
• Analysis of source code or object code
Dynamic
• Intrusive process whereby the execution of a program on a processor is interrupted to gather information
Source: The Zynq Book

Dynamic Profiling
Source: The Zynq Book

Output of Dynamic Profiling
Source: The Zynq Book

Class Exercise 2
Block Diagram
Source: Xilinx Advanced Embedded System Design on Zynq course materials

Class Exercise 2
Block Design
Source: Xilinx Advanced Embedded System Design on Zynq course materials

Class Exercise 2
SDK Settings
Source: Xilinx Advanced Embedded System Design on Zynq course materials

Class Exercise 2
Main.c: Choice between Two
Implementations

    int main(void)
    {
        short signal, output;
        int i;

        for (i=0; i<SAMPLES; i++) {
            // impulse input: full-scale value on the first sample, zero afterwards
            if(i==0) signal = 0x8000;
            else signal = 0;
    #ifdef SW_PROFILE
            fir_software(&output, signal);          // software-only filter
    #else
            filter_hw_accel_input(&output, signal); // hardware-accelerated filter
    #endif
        }
        return 0;
    }

Source: Xilinx Advanced Embedded System Design on Zynq course materials

Class Exercise 2
fir_software()

    #include "fir.h"

    void fir_software (data_t *y, data_t x) {
        const coef_t c[N+1]={
        #include "fir_coef.dat"
        };
        static data_t shift_reg[N];
        acc_t acc;
        int i;

Source: Xilinx Advanced Embedded System Design on Zynq course materials

Class Exercise 2
fir_software() – cont.

        acc=(acc_t)shift_reg[N-1]*(acc_t)c[N];
        // stop at i=1 so that shift_reg[i-1] never reads index -1
        loop: for (i=N-1;i>0;i--) {
            acc+=(acc_t)shift_reg[i-1]*(acc_t)c[i];
            shift_reg[i]=shift_reg[i-1];
        }
        acc+=(acc_t)x*(acc_t)c[0];
        shift_reg[0]=x;
        *y = acc>>15;
    }

Source: Xilinx Advanced Embedded System Design on Zynq course materials

Class Exercise 2
filter_hw_accel_input()

    void filter_hw_accel_input(short * Sample_L_out, short Sample_L_in)
    {
        Xil_Out32(XPAR_FIR_LEFT_BASEADDR+XFIR_FIR_IO_ADDR_X_DATA, Sample_L_in); // send left channel sample
        Xil_Out32(XPAR_FIR_LEFT_BASEADDR+XFIR_FIR_IO_ADDR_AP_CTRL, 0x1); // pulse ap_start left channel
        Xil_Out32(XPAR_FIR_LEFT_BASEADDR+XFIR_FIR_IO_ADDR_AP_CTRL, 0x0);
        // busy-wait until the output control register reads non-zero
        while(!Xil_In32(XPAR_FIR_LEFT_BASEADDR+XFIR_FIR_IO_ADDR_Y_CTRL))
            ;
        *Sample_L_out = Xil_In32(XPAR_FIR_LEFT_BASEADDR+XFIR_FIR_IO_ADDR_Y_DATA);
    }

Source: Xilinx Advanced Embedded System Design on Zynq course materials

Class Exercise 2
Profiling Options
Source: Xilinx Advanced Embedded System Design on Zynq course materials

Class Exercise 2
Invoking gprof on gmon.out
Source: Xilinx Advanced Embedded System Design on Zynq course materials

Class Exercise 2
Software only: Samples per Function

Class Exercise 2
Software/Hardware: Samples per Function

Class Exercise 2
Software only: Function Call Graph

Class Exercise 2
Software/Hardware: Function Call Graph

Class Exercise 2
Software/Hardware: Samples per
Function
The influence of xil_printf
Source: Xilinx Advanced Embedded System Design on Zynq course materials

Class Exercise 2
Results for sampling frequency 1 MHz
Source: Xilinx Advanced Embedded System Design on Zynq course materials

Class Exercise 2
Results for sampling frequency 100 kHz
Source: Xilinx Advanced Embedded System Design on Zynq course materials
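
Class Exercise 2
Sketch: running the software filter on a host

The software path of Class Exercise 2 can be tried outside the Zynq tool flow before profiling it. The listing below is a minimal sketch under stated assumptions, not the course's own code: N, SAMPLES, the three typedefs, and the coefficient table are hypothetical stand-ins for the contents of fir.h and fir_coef.dat, and impulse_response() is a helper introduced here for illustration. The impulse is 0x4000 (0.5 in Q15) rather than the 0x8000 used in Main.c, so that every intermediate value stays positive and the right shift is well defined.

```c
#include <assert.h>
#include <stddef.h>

#define N 4        /* hypothetical filter order; the lab sets its own in fir.h */
#define SAMPLES 8  /* hypothetical number of input samples */

typedef short data_t;  /* assumption: matches the short used in main() */
typedef short coef_t;  /* assumption */
typedef int   acc_t;   /* assumption: wide enough for the sum of N+1 products */

/* Hypothetical Q15 coefficients standing in for fir_coef.dat. */
const coef_t c[N + 1] = { 1024, 2048, 4096, 2048, 1024 };

/* Direct-form FIR: shift_reg[k] holds the input from k+1 calls ago. */
void fir_software(data_t *y, data_t x)
{
    static data_t shift_reg[N];
    acc_t acc;
    int i;

    acc = (acc_t)shift_reg[N - 1] * (acc_t)c[N];
    for (i = N - 1; i > 0; i--) {          /* i > 0 keeps shift_reg[i-1] in bounds */
        acc += (acc_t)shift_reg[i - 1] * (acc_t)c[i];
        shift_reg[i] = shift_reg[i - 1];   /* age the delay line by one sample */
    }
    acc += (acc_t)x * (acc_t)c[0];
    shift_reg[0] = x;
    *y = (data_t)(acc >> 15);              /* scale the Q15 product back down */
}

/* Feed one impulse followed by zeros, as the loop in main() does. */
void impulse_response(data_t *out, int n)
{
    int i;

    assert(out != NULL);
    for (i = 0; i < n; i++) {
        data_t signal = (i == 0) ? 0x4000 : 0;
        fir_software(&out[i], signal);
    }
}
```

With the coefficients assumed here, calling impulse_response(out, SAMPLES) from a host-side main() leaves half of each coefficient in out[0] through out[N] and zeros afterwards, which is a quick sanity check of the filter before comparing profiles of the software and hardware versions.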