Building Multi-Processor FPGA Systems
Hands-on Tutorial to Using FPGAs and Linux

Chris Martin <cmartin@altera.com>
Member Technical Staff, Embedded Applications

Agenda
  • Introduction
  • Problem: How to Integrate Multi-Processor Subsystems
  • Why…
    – Why would you do this?
    – Why use FPGAs?
  • Lab 1: Getting Started - Booting Linux and Boot-strapping NIOS
  • Lab 2: Inter-Processor Communication and Shared Peripherals
  • Lab 3: Locking and Tetris
  • Building Hardware: FPGA Hardware Tools & Build Flow
  • Building/Debugging Software: Software Tools & Build Flow
  • References
  • Q&A – Throughout.

The Problem – Integrating Multi-Processor Subsystems
Given a system with multiple processor subsystems, these architecture decisions must be considered:
  • Inter-processor communication
  • Partitioning/sharing peripherals (locking required)
  • Bandwidth & latency requirements
[Figure: Processor Subsystem 1 and Processor Subsystem 2, each with Periph 1, Periph 2, and Periph 3]

Why Do We Need to Integrate Multi-Processor Subsystems?
  • May have inherited processor subsystem from another development team or 3rd party
    – Risk mitigation by reducing change
  • Fulfill latency and bandwidth requirements
    – Real-time considerations
    – If main processor is not real-time enabled, can add a real-time processor subsystem
  • Design partition / sandboxing
    – Break the system into smaller subsystems to service task
    – Smaller task can be designed easily
  • Leverage software resources
    – Sometimes problem is resolved in less time by processor/software rather than hardware design
    – Sequencers, state machines

Why do we want to integrate with FPGA? (or rather, HOW can FPGAs help?)
  • Bandwidth & latency can be tailored
    – Addresses real-time aspects of system solution
    – FPGA logic has flexible interconnect
    – Trade data width with clock frequency with latency
  • Experimentation
    – Many processor subsystems can be implemented
    – Allows you to experiment changing microprocessor subsystem hardware designs
    – Altera FPGA under-the-hood
    – However: generic Linux interfaces are used and can be applied in any Linux system.
[Figure: Simple Multiprocessor System: ARM with "A Peripheral", NIOS with "N Peripheral", a Shared Peripheral, and a Mailbox]
And, why is Altera involved with Embedded Linux…

Why is Altera Involved with Embedded Linux?
[Chart: Embedded design starts with and without an embedded processor (CPU), axis 0 to 120,000, 50% marker. Source: Gartner, September 2010]
  • More than 50% of FPGA designs include an embedded processor, and growing.
  • Many embedded designs using Linux
  • Open-source re-use.
    – Altera Linux Development Team actively contributes to Linux Kernel

SoCKit Board Architecture Overview
[Figure: SoCKit board with the lab focus marked: UART, DDR3, LEDs, Buttons]

SoC/FPGA Hardware Architecture Overview
[Figure: HPS with two Cortex-A9 cores (I$/D$), L2, EMIF to DDR, DMA, ROM, UART, RAM, SD/MMC; FPGA fabric ("Soft Logic") with SYS ID, RAM, GPIO, NIOS]
  • ARM-to-FPGA bridges, data width configurable
    – HPS2FPGA AXI bridge: 32/64/128
    – LWHPS2FPGA AXI bridge: 32
    – FPGA2HPS AXI bridge: 32/64/128
  • FPGA: 42K logic macros, using no more than 14%

Lab 1: Getting Started
Booting Linux and Boot-strapping NIOS
  • Topics Covered:
    – Configuring FPGA from SD/MMC and U-Boot
    – Booting Linux on ARM Cortex-A9
    – Configuring Device Tree
    – Resetting and Booting NIOS Processor
    – Building and compiling simple Linux Application
  • Key Example Code Provided:
    – C code for downloading NIOS code and resetting NIOS from ARM
    – Using U-boot to set ARM peripheral security bits
  • Full step-by-step instructions are included in lab manual.
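The "resetting NIOS from ARM" item above goes through the standard Linux sysfs GPIO interface. The following is a minimal sketch of that sequence with error checking added. The helper names (sysfs_write, gpio_drive) and the root parameter are hypothetical and exist only so the code can be exercised against a scratch directory; on the target the root would be /sys/class/gpio, and the GPIO number depends on the lab design.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write a short string to a sysfs-style attribute file.
 * Returns 0 on success, -1 if the file cannot be opened or fully written. */
int sysfs_write(const char *path, const char *val)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    size_t len = strlen(val);
    ssize_t n = write(fd, val, len);
    close(fd);
    return (n == (ssize_t)len) ? 0 : -1;
}

/* Export a GPIO, configure it as an output, and drive it high or low.
 * `root` is a hypothetical parameter: "/sys/class/gpio" on the target. */
int gpio_drive(const char *root, int gpio, int level)
{
    char path[256];
    char num[16];

    snprintf(num, sizeof num, "%d", gpio);
    snprintf(path, sizeof path, "%s/export", root);
    /* Best effort: the export write fails if the GPIO is already exported. */
    (void)sysfs_write(path, num);

    snprintf(path, sizeof path, "%s/gpio%d/direction", root, gpio);
    if (sysfs_write(path, "out") < 0)
        return -1;

    snprintf(path, sizeof path, "%s/gpio%d/value", root, gpio);
    return sysfs_write(path, level ? "1" : "0");
}
```

A caller on the target might use gpio_drive("/sys/class/gpio", gpio, 1) followed by gpio_drive("/sys/class/gpio", gpio, 0) to pulse a reset line; which level asserts the reset is a property of the hardware design and is not assumed here.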
Lab 1: Hardware Design Overview
  • NIOS Subsystem
    – 1 NIOS Gen 2 processor
    – 64k combined instruction/data RAM (On-Chip RAM)
    – GPIO peripheral
  • ARM Subsystem
    – 2 Cortex-A9 (only using 1)
    – DDR3 External Memory
    – SD/MMC Peripheral
    – UART Peripheral
[Figure: Subsystem 1 (Cortex-A9, SD/MMC, EMIF, UART) and Subsystem 2 (NIOS 0, RAM, GPIO); legend: Shared Peripherals, Dedicated Peripherals]

Lab 1: Programmer View - Processor Address Maps
  NIOS                              ARM Cortex-A9
  Address Base   Peripheral         Address Base   Peripheral
  0xFFC0_2000    ARM UART           0xFFC0_2000    UART
  0x0003_0000    GPIO (LEDs)        0xC003_0000    GPIO (LEDs)
  0x0002_0000    System ID          0xC002_0000    System ID
  0x0000_0000    On-chip RAM        0xC000_0000    On-chip RAM

Lab 1: Peripheral Registers
  Peripheral  Address Offset  Access  Bit Definitions
  Sys ID      0x0             RO      [31:0] – System ID. Lab default = 0x00001ab1
  GPIO        0x0             R/W     [31:0] – Drive GPIO output. Lab uses it for LED control,
                                      push button status and NIOS processor resets (from ARM).
                                      [3:0] – LED 0-3 control. '0' = LED off, '1' = LED on
                                      [4] – NIOS 0 Reset
                                      [5] – NIOS 1 Reset
                                      [1:0] – Push Button Status
  UART        0x14            RO      Line Status Register
                                      [5] – TX FIFO Empty
                                      [0] – Data Ready (RX FIFO not empty)
  UART        0x30            R/W     Shadow Receive Buffer Register
                                      [7:0] – RX character from serial input
  UART        0x34            R/W     Shadow Transmit Register
                                      [7:0] – TX character to serial output

Lab 1: Processor Resets Via Standard Linux GPIO Interface
  • NIOS resets connected to GPIO
  • GPIO driver uses /sys/class/gpio interface

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define MAX_BUF 64  /* not shown on the slide; 64 is an assumed size */

    int main(int argc, char** argv)
    {
        int fd, gpio=168;
        char buf[MAX_BUF];

        /* Export: echo ### > /sys/class/gpio/export */
        fd = open("/sys/class/gpio/export", O_WRONLY);
        sprintf(buf, "%d", gpio);
        write(fd, buf, strlen(buf));
        close(fd);

        /* Set direction to Out: */
        /* echo "out" > /sys/class/gpio/gpio###/direction */
        sprintf(buf, "/sys/class/gpio/gpio%d/direction", gpio);
        fd = open(buf, O_WRONLY);
        write(fd, "out", 3); /* write(fd, "in", 2); */
        close(fd);

        /* Set GPIO Output High or Low */
        /* echo 1 > /sys/class/gpio/gpio###/value */
        sprintf(buf, "/sys/class/gpio/gpio%d/value", gpio);
        fd = open(buf, O_WRONLY);
        write(fd, "1", 1); /* write(fd, "0", 1); */
        close(fd);

        /* Unexport: echo ### > /sys/class/gpio/unexport */
        fd = open("/sys/class/gpio/unexport", O_WRONLY);
        sprintf(buf, "%d", gpio);
        write(fd, buf, strlen(buf));
        close(fd);

        return 0;
    }

Lab 1: Loading External Processor Code Via Standard Linux Shared Memory (mmap)
  • NIOS RAM address accessed via mmap()
  • Can be shared with other processes
  • R/W during load
  • Read-only protection after load

    /* Map Physical address of NIOS RAM to virtual address segment
       with Read/Write Access */
    fd = open("/dev/mem", O_RDWR);
    load_address = mmap(NULL, 0x10000, PROT_READ|PROT_WRITE,
                        MAP_SHARED, fd, 0xc0000000);

    /* Set size of code to load (number of array elements) */
    load_size = sizeof(nios_code)/sizeof(nios_code[0]);

    /* Load NIOS Code */
    for(i=0; i < load_size; i++)
    {
        *(load_address+i) = nios_code[i];
    }

    /* Set load address segment to Read-Only */
    mprotect(load_address, 0x10000, PROT_READ);

    /* Un-map load address segment */
    munmap(load_address, 0x10000);

Lab 2: Mailboxes
NIOS/ARM Communication
  • Topics Covered:
    – Altera Mailbox Hardware IP
  • Key Example Code Provided:
    – C code for sending/receiving messages via hardware Mailbox IP (NIOS & ARM C code)
    – Simple message protocol
    – Simple command parser
  • Full step-by-step instructions are included in lab manual.
    – User to add second NIOS processor mailbox control.
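The Lab 1 loading step shown earlier (map the NIOS RAM, copy the code image, unmap) can be wrapped in two small helpers. This is a sketch under stated assumptions rather than the lab's actual code: map_region and load_words are hypothetical names, the path is a parameter so the same code can be tried on an ordinary file instead of /dev/mem, and the bounds check and read-back verification are additions not present on the slide.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map `len` bytes of `path` starting at `offset` with read/write access.
 * On the lab target, path would be "/dev/mem" and offset the physical
 * base address of the NIOS RAM. Returns NULL on failure. */
volatile uint32_t *map_region(const char *path, off_t offset, size_t len)
{
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    return (p == MAP_FAILED) ? NULL : (volatile uint32_t *)p;
}

/* Copy a code image word by word into the mapped region, then read it
 * back to verify. Returns 0 on success, -1 if the image does not fit
 * or the read-back differs. */
int load_words(volatile uint32_t *dst, size_t dst_bytes,
               const uint32_t *src, size_t nwords)
{
    if (nwords > dst_bytes / sizeof(uint32_t))
        return -1;
    for (size_t i = 0; i < nwords; i++)
        dst[i] = src[i];
    for (size_t i = 0; i < nwords; i++)
        if (dst[i] != src[i])
            return -1;
    return 0;
}
```

With the Lab 1 address map, the call would look like map_region("/dev/mem", 0xC0000000, 0x10000) followed by load_words(ram, 0x10000, nios_code, load_size). On a 32-bit build, compiling with -D_FILE_OFFSET_BITS=64 keeps an offset of 0xC0000000 representable in off_t.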
Lab 2: Hardware Design Overview
  • NIOS 0 & 1 Subsystems
    – NIOS Gen 2 processor
    – 64k combined instruction/data RAM
    – GPIO (4 out, LED)
    – GPIO (2 in, Buttons)
    – Mailbox
  • ARM Subsystem
    – 2 Cortex-A9 (only using 1)
    – DDR3 External Memory
    – SD/MMC Peripheral
    – UART Peripheral
[Figure: Subsystem 1 (Cortex-A9, SD/MMC, EMIF, UART, GPIO), Subsystem 2 (NIOS 0, RAM, GPIO, MBox), Subsystem 3 (NIOS 1, RAM, GPIO, MBox); legend: Shared Peripherals, Dedicated Peripherals]

Lab 2: Programmer View - Processor Address Maps
  NIOS 0 & 1                              ARM Cortex-A9
  Address Base   Peripheral               Address Base   Peripheral
  0xFFC0_2000    ARM UART                 0xFFC0_2000    UART
  0x0007_8000    Mailbox (from ARM)       0x0007_8000    Mailbox (to NIOS 1)
  0x0007_0000    Mailbox (to ARM)         0x0007_0000    Mailbox (from NIOS 1)
  0x0005_0000    GPIO (In Buttons)        0x0006_8000    Mailbox (to NIOS 0)
  0x0003_0000    GPIO (Out LEDs)          0x0006_0000    Mailbox (from NIOS 0)
  0x0002_0000    System ID                0xC003_0000    GPIO (LEDs)
  0x0000_0000    On-chip RAM              0xC002_0000    System ID
                                          0xC001_0000    NIOS 1 RAM
                                          0xC000_0000    NIOS 0 RAM

Lab 2: Additional Peripheral (Mailbox) Registers
  Peripheral  Address Offset  Access  Bit Definitions
  Mailbox     0x0             R/W     [31:0] – RX/TX Data
  Mailbox     0x8             R/W     [1] – RX Message Queue Has Data
                                      [0] – TX Message Queue Empty

Lab 2: Designing a Simple Message Protocol
  • Design Decisions:
    – Short length: a single 32-bit word
    – Human readable
    – Message transactions are closed-loop. Includes ACK/NACK
  • Format:
    – Message length: four bytes
    – First byte is an ASCII character denoting message type.
    – Second byte is an ASCII char from 0-9 denoting processor number.
    – Third byte is an ASCII char from 0-9 denoting message data, except for ACK/NACK.
    – Fourth byte is always the null character '\0' to terminate the string (human readable).

      Byte 0   Byte 1   Byte 2   Byte 3
      'L'      '0'      '0'      '\0'
      'A'      '0'      '0'      '\0'

  • Message Types:
    – "G00": Give Access to UART (Push)
    – "A1A": ACK
    – "N1A": NACK
  • Can be Extended:
    – "L00": LED Set/Ready
    – "B00": Button Pressed
    – "R00": Request UART Access (Pull)
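The four-byte message format and the two mailbox registers described above can be combined into a small send/receive layer. The sketch below is illustrative only: all names (msg_pack, msg_unpack, mbox_send, mbox_recv, the MBOX_* constants) are hypothetical, the polling is bounded where the lab code polls forever, and the bit 0 polarity follows the lab's polling loop (wait while bit 0 is set), which is an assumption about the hardware rather than a documented fact. Passing a plain array as the register block lets the logic be tried without hardware.

```c
#include <stdint.h>
#include <string.h>

/* Word indices into the mailbox register block (byte offsets 0x0 and 0x8). */
enum { MBOX_DATA = 0, MBOX_STATUS = 2 };

#define MBOX_TX_BUSY 0x1u  /* status bit 0: assumed set while a send must wait */
#define MBOX_RX_DATA 0x2u  /* status bit 1: RX message queue has data */

/* Pack type, processor number and data characters into one 32-bit word,
 * e.g. msg_pack('G', '0', '0') for "G00". The fourth byte is always '\0'. */
uint32_t msg_pack(char type, char cpu, char data)
{
    char bytes[4] = { type, cpu, data, '\0' };
    uint32_t word;
    memcpy(&word, bytes, sizeof word);  /* byte 0 stays at the lowest address */
    return word;
}

/* Unpack a received word into a printable, null-terminated string. */
void msg_unpack(uint32_t word, char out[4])
{
    memcpy(out, &word, sizeof word);
    out[3] = '\0';
}

/* Wait (bounded) for the transmit side to be ready, then write the message.
 * Returns 0 on success, -1 if max_polls status reads were not enough. */
int mbox_send(volatile uint32_t *mbox, uint32_t word, unsigned long max_polls)
{
    while (mbox[MBOX_STATUS] & MBOX_TX_BUSY)
        if (max_polls-- == 0)
            return -1;
    mbox[MBOX_DATA] = word;
    return 0;
}

/* Wait (bounded) for a received message, then read it.
 * Returns 0 on success, -1 on timeout. */
int mbox_recv(volatile uint32_t *mbox, uint32_t *word, unsigned long max_polls)
{
    while (!(mbox[MBOX_STATUS] & MBOX_RX_DATA))
        if (max_polls-- == 0)
            return -1;
    *word = mbox[MBOX_DATA];
    return 0;
}
```

Using memcpy instead of a pointer cast keeps the byte order identical to the string in memory and avoids alignment or aliasing problems. The bounded poll count is a design choice so that a missing reply surfaces as an error instead of a hang.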
[Figure: message exchange between Cortex-A9 and NIOS 0, showing "A0A" and "N0N"]

Lab 2: Inter-Processor Communication with Mailbox HW Via Standard Linux Shared Memory (mmap)
  • Wait for Mailbox Hardware message empty flag
  • Send message (4 bytes)
  • Disable ARM/Linux access to UART
  • Wait for RX message received flag
  • Re-enable ARM/Linux UART access

    /* Map Physical address of Mailbox to virtual address segment
       with Read/Write Access */
    fd = open("/dev/mem", O_RDWR);
    mbox0_address = mmap(NULL, 0x10000, PROT_READ|PROT_WRITE,
                         MAP_SHARED, fd, 0xff260000);

    <snip>

    /* Offsets below are in 32-bit words, assuming mbox0_address is an
       int pointer: +0x2000 is byte offset 0x8000 (Mailbox to NIOS 0),
       +2 is byte offset 0x8 (status register). */

    /* Waiting for Message Queue to empty */
    while((*(volatile int*)(mbox0_address+0x2000+2) & 1) != 0 ) {}

    /* Send Granted/Go message to NIOS */
    send_message = "G00";
    *(mbox0_address+0x2000) = *(int *)send_message;

    /* Disable ARM/Linux Access to UART (be careful here) */
    config.c_cflag &= ~CREAD;
    if(tcsetattr(fd, TCSAFLUSH, &config) < 0) { /* error handling omitted */ }

    /* Wait for Received Message */
    while((*(volatile int*)(mbox0_address+2) & 2) == 0 ) {}

    /* Re-enable UART Access */
    config.c_cflag |= CREAD;
    tcsetattr(fd, TCSAFLUSH, &config);

    /* Read Received Message */
    printf(" - Message Received. DATA = '%s'.\n", (char*)(mbox0_address));

Lab 3: Putting It All Together – Tetris!
Combining Locking and Communication
  • Topics Covered:
    – Linux Mutex
  • Key Example Code Provided:
    – C code showcasing using Mutexes for locking shared peripheral access
    – C code for multiple processor subsystem bringup and shutdown
  • Full step-by-step instructions are included in lab manual.
    – User to add code for second NIOS processor bringup, shutdown and locking/control.
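For the Lab 3 locking topic listed above, the idea of serializing access to a shared peripheral with a POSIX mutex can be sketched on the host without any hardware. Everything here is illustrative: the struct and function names are hypothetical, and led_reg is an ordinary variable standing in for the shared LED GPIO register. One worker thread per NIOS toggles its own LED bit inside the critical section and counts the access.

```c
#include <pthread.h>
#include <stdint.h>

#define NUM_NIOS 2

/* Stand-in for the shared LED peripheral plus the lock that guards it. */
struct shared_led {
    pthread_mutex_t lock;
    uint32_t led_reg;        /* models the LED GPIO register */
    unsigned long accesses;  /* number of guarded accesses performed */
};

struct worker_arg {
    struct shared_led *led;
    int nios_num;
    int iterations;
};

/* One worker per NIOS: toggle that processor's LED bit under the lock. */
void *led_worker(void *p)
{
    struct worker_arg *arg = p;
    for (int i = 0; i < arg->iterations; i++) {
        pthread_mutex_lock(&arg->led->lock);
        arg->led->led_reg ^= 1u << arg->nios_num;  /* critical section */
        arg->led->accesses++;
        pthread_mutex_unlock(&arg->led->lock);
    }
    return NULL;
}

/* Init the mutex, start both workers, join them, destroy the mutex.
 * Returns the total number of guarded accesses, or -1 on failure. */
long run_led_workers(int iterations)
{
    struct shared_led led = { .led_reg = 0, .accesses = 0 };
    struct worker_arg args[NUM_NIOS];
    pthread_t tid[NUM_NIOS];
    int started = 0;

    if (pthread_mutex_init(&led.lock, NULL) != 0)
        return -1;
    for (int i = 0; i < NUM_NIOS; i++) {
        args[i] = (struct worker_arg){ &led, i, iterations };
        if (pthread_create(&tid[i], NULL, led_worker, &args[i]) != 0)
            break;
        started++;
    }
    /* Join only the threads that were actually created. */
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    pthread_mutex_destroy(&led.lock);
    return (started == NUM_NIOS) ? (long)led.accesses : -1;
}
```

Because every read-modify-write of led_reg and accesses happens while the lock is held, the final count is exactly NUM_NIOS times the iteration count. Depending on the toolchain, building may require the -pthread option.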
Lab 3: Hardware Design Overview (Same As Lab 2)
The NIOS 0 & 1 subsystems and the ARM subsystem are unchanged from Lab 2.

Lab 3: Programmer View - Processor Address Maps
The NIOS 0 & 1 and ARM Cortex-A9 address maps are identical to those of Lab 2.

Available Linux Locking/Synchronization Mechanisms
  • Need to share peripherals
    – Choose a locking mechanism
  • Available in Linux
    – Mutex <- Chosen for this Lab
    – Completions
    – Spinlocks
    – Semaphores
    – Read-copy-update (decent for multiple readers, single writer)
    – Seqlocks (decent for multiple readers, single writer)
  • Available for Linux
    – MCAPI - openmcapi.org

Tetris Message Protocol – Extended from Lab 2
  • NIOS Control Flow:
    – Wait for button press
    – Send button press message
    – Wait for ACK (free to write to LED GPIO)
    – Write to LED GPIO
    – Send LED ready msg
    – Wait for ACK
  • ARM Control Flow:
    – Wait for button press message
    – Lock LED GPIO peripheral
    – Send ACK (free to write to LED GPIO)
    – Wait for LED ready msg
    – Send ACK
    – Read LED value
    – Release Lock/Mutex
[Figure: NIOS 0 and the Cortex-A9 exchange "B00", "A0A", "L00", "A0A"; NIOS 1 and the Cortex-A9 exchange "B10", "A1A", "L10", "A1A"]

Lab 3: Locking Hardware Peripheral Access Via Linux Mutex
  • In this example, LED GPIO is accessed by multiple processors
  • Wrap LED critical section (LED status reads) with:
    – pthread_mutex_lock()
    – pthread_mutex_unlock()
  • Also need Mutex init/destroy:
    – pthread_mutex_init()
    – pthread_mutex_destroy()

    pthread_mutex_t lock;

    <snip – Initialize/create/start>
    /* Initialize Mutex */
    err = pthread_mutex_init(&lock, NULL);

    /* Create 2 Threads (one per NIOS, so the loop must run twice) */
    i=0;
    while(i < 2)
    {
        err = pthread_create(&(tid[i]), NULL, &nios_buttons_get, &(nios_num[i]));
        i++;
    }

    <snip – Critical Section>
    pthread_mutex_lock(&lock);
    /* Critical Section */
    pthread_mutex_unlock(&lock);

    <snip Stop/Destroy>
    /* Wait for threads to complete */
    pthread_join(tid[0], NULL);
    pthread_join(tid[1], NULL);

    /* Destroy/remove lock */
    pthread_mutex_destroy(&lock);

References

Altera References
  • System Design Tutorials:
    – http://www.alterawiki.com/wiki/Designing_with_AXI_for_Altera_SoC_ARM_Devices_Workshop_Lab__Creating_Your_AXI3_Component
    – Designing_with_AXI_for_Altera_SoC_ARM_Devices_Workshop_Lab
    – Simple_HPS_to_FPGA_Comunication_for_Altera_SoC_ARM_Devices_Workshop
    – http://www.alterawiki.com/wiki/Simple_HPS_to_FPGA_Comunication_for_Altera_SoC_ARM_Devices_Workshop_-_LAB2
  • Multiprocessor NIOS-only Tutorial:
    – http://www.altera.com/literature/tt/tt_nios2_multiprocessor_tutorial.pdf
  • Quartus Handbook:
    – https://www.altera.com/en_US/pdfs/literature/hb/qts/quartusii_handbook.pdf
  • Qsys:
    – System Design with Qsys (PDF) section in the Handbook
    – Qsys Tutorial: Step-by-step procedures and design example files to create and verify a system in Qsys
    – Qsys 2-day instructor-led class: System Integration with Qsys
    – Qsys webcasts and demonstration videos
  • SoC Embedded Design Suite User Guide:
    – https://www.altera.com/en_US/pdfs/literature/ug/ug_soc_eds.pdf

Related Articles
  • Performance Analysis of Inter-Processor Communication Methods
    – http://www.design-reuse.com/articles/24254/inter-processor-communicationmulti-core-processors-reconfigurable-device.html
  • Communicating Efficiently between QorIQ Cores in Medical Applications
    – https://cache.freescale.com/files/32bit/doc/brochure/PWRARBYNDBITSCE.pdf
  • Linux Inter-Process Communication:
    – http://www.tldp.org/LDP/tlk/ipc/ipc.html
  • Linux locking mechanisms (from ARM):
    – http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0425/ch04s07s03.html
  • OpenMCAPI:
    – https://bitbucket.org/hollisb/openmcapi/wiki/Home
  • Mutex Examples:
    – http://www.thegeekstuff.com/2012/05/c-mutex-examples/

Thank You
Full Tutorial Resources Online
  • Project Wiki Page: http://rocketboards.org/foswiki/Projects/BuildingMultiProcessorSystems
  • Includes:
    – Source code
    – Hardware source
    – Hardware Quartus Projects
    – Software Eclipse Projects

BACKUP SLIDES

Post-Lab 1 Additional Topics
Hardware Design Flow and FPGA Boot with U-boot and SD/MMC

Building Hardware: Qsys (Hardware System Design Tool) User Interface
[Figure: Qsys window showing the connections between cores and the interfaces exported in/out of the system]

Hardware and Software Work Flow Overview
[Figure: Quartus & Qsys feed Preloader & U-Boot, Device Tree, and RBF; Eclipse DS-5 & Debug Tools]
  • Inputs:
    – Hardware Design (Qsys or RTL or both)
  • Outputs (to load on boot media):
    – Preloader and U-boot images
    – FPGA programming file: Raw Binary Format (RBF)
    – Device Tree Blob

SDCARD Layout
  • Partition 1: FAT
    – Uboot scripts
    – FPGA HW Designs (RBF)
    – Device Tree Blobs
    – zImage
    – Lab material
  • Partition 2: EXT3
    – Rootfs
  • Partition 3: Raw
    – Uboot/preloader
  • Partition 4: EXT3
    – Kernel src

Updating SD Cards
  File: zImage, soc_system.rbf, soc_system.dtb, u-boot.scr
  Update procedure: mount DOS SD card partition 1 and replace file with new one:
    $ sudo mkdir sdcard
    $ sudo mount /dev/sdx1 sdcard/
    $ sudo cp <file_name> sdcard/
    $ sudo umount sdcard

  File: preloader-mkpimage.bin
    $ sudo dd if=preloader-mkpimage.bin of=/dev/sdx3 bs=64k seek=0

  File: u-boot-socfpga_cyclone5.img
    $ sudo dd if=u-boot-socfpga_cyclone5.img of=/dev/sdx3 bs=64k seek=4

  File: root filesystem
    $ sudo dd if=altera-gsrd-imagesocfpga_cyclone5.ext3 of=/dev/sdx2

  • More info found on Rocketboards.org
    – http://www.rocketboards.org/foswiki/Documentation/GSRD141SdCard
  • Automated Python script to build SD cards:
    – make_sdimage.py

Post-Lab 2 Additional Topic
Using Eclipse to Debug: NIOS Software Build Tools

Altera NIOS Software Design and Debug Tools
  • Nios II SBT for Eclipse key features:
    – New project wizards and software templates
    – Compiler for C and C++ (GNU)
    – Source navigator, editor, and debugger
    – Eclipse project-based tools
    – Download code to hardware

Key Multi-Processor System Design Points
  • Startup/Shutdown
    – Processor
    – Peripheral
    – Covered in Lab 1.
  • Communication between processors
    – What is the physical link?
    – What is the protocol & messaging method?
    – Message bandwidth & latency
    – Covered in Lab 2
  • Partitioning peripherals
    – Declare dedicated peripherals: only connected/controlled by one processor
    – Declare shared peripherals: connected/controlled by multiple processors
    – Decide upon locking mechanism
    – Covered in Lab 3

Post Lab 3 Additional Topic
Altera SoC Embedded Design Suite

Altera Software Development Tools
  • Eclipse
    – For ARM Cortex-A9 (ARM Development Studio 5 – Altera Edition)
    – For NIOS
  • Pre-loader/U-Boot Generator
  • Device Tree Generator
  • Bare-metal Libraries
  • Compilers
    – GCC (for ARM and NIOS)
    – ARMCC (for ARM with license)
  • Linux Specific
    – Kernel Sources
    – Yocto & Angstrom recipes: http://rocketboards.org/foswiki/Documentation/AngstromOnSoCFPGA_1
    – Buildroot: http://rocketboards.org/foswiki/Documentation/BuildrootForSoCFPGA

System Development Flow
FPGA Design Flow (Hardware Development) alongside Software Design Flow (Software Development); both flows run through the stages Design, Simulate, Debug, Release.
Hardware development tools:
  • Quartus II design software
  • Qsys system integration tool
  • Standard RTL flow
  • Altera and partner IP
  • ModelSim, VCS, NCSim, etc.
  • AMBA-AXI and Avalon bus functional models (BFMs)
  • SignalTap™ II logic analyzer
  • System Console
  • Quartus II Programmer
  • In-system Update
Software development tools:
  • Eclipse
  • GNU toolchain
  • OS/BSP: Linux, VxWorks
  • Hardware Libraries
  • Design Examples
  • GDB, Lauterbach, Eclipse
  • Flash Programmer

Inside the Golden System Reference Design
Complete system example design with Linux software support
  • Target Boards:
    – Altera SoC Development Kits
    – Arrow SoC Development Kits
    – Macnica SoC Development Kits
  • Hardware Design:
    – Simple custom logic design in FPGA
    – All source code and Quartus II / Qsys design files for reference
  • Software Design:
    – Includes Linux Kernel and Application Source code
    – Includes all compiled binaries

Topics – Back Up
  • Introductions: Altera and SoC FPGAs
  • Development Tools
    – How to Build Hardware: FPGA Hardware Tools & Build Flow
    – How to Build Software: Software Tools & Build Flow
    – How to Debug all-of-the-above: Debug Tools
  • Key Multi-processor System Design Points
  • Hardware design
    – Shared peripherals
    – Available Hardware IP
  • Software design
    – Message Protocols
    – Linux tools/mechanism available today

Quartus – Hardware Development Tool

Quartus II User Interface
  • Quartus II main window provides a high level of visibility to each stage of the design flow
  • Project navigator provides direct visual access to most of the key project information
  • Tasks window allows you to use the tools and features of the Quartus II software and monitor their progress from a flow-based layout
  • Tool View window shows various tools and design files
  • Messages window outputs messages from each process of the run
[Figure: Quartus II main window with the Project Navigator, Tool View window, Tasks window, and Messages window labeled]

Typical Hardware Design Flow
  • Design entry/RTL coding and early pin planning
    – Behavioral or structural description of design
    – Early pin planning allows board development in parallel
  • Functional verification
    – Verify design behavior
  • Synthesis (mapping)
    – Translate design into device-specific primitives
    – Optimization to meet required area and performance constraints
  • Placement and routing (fitting)
    – Place design in specific device resources with reference to area and performance constraints
    – Connect resources with routing lines
  • Timing analysis
    – Verify performance specifications were met
    – Static timing analysis
    – Verify design will work in target technology
  • PC board simulation and test
    – Simulate board design
    – Program and test device on board
    – On-chip tools for debugging
[Figure: flow from Project definition through Project creation, Design creation, Functional verification, Design compilation (Logic, Memory, I/O), Functional verification, to In-system debug]

Quartus II Feature Overview
Fully integrated development tool
  – Multiple design entry methods
      Includes intellectual property- (IP-) based system design
  – Up-front I/O assignment and validation
      Enables printed circuit board (PCB) layout early in the design process
  – Incremental compilation
      Reduces design compilation time and improves timing closure
  – Logic synthesis
      Includes comprehensive integrated synthesis solution
      Advanced integration with third-party EDA synthesis software
  – Timing-driven placement and routing
  – Physical synthesis
      Improves performance without user intervention
  – Verification solution
      TimeQuest timing analyzer
      PowerPlay power analysis and optimization
      Functional simulation
  – On-chip debug and verification suite

Quartus II Feature Overview (1/2)
  Feature                                   Quartus II Software
  Project creation                          New project wizard
  Design entry                              HDL editor; Schematic editor; State machine editor;
                                            MegaWizard™ Plug-In Manager (customization and generation of IP);
                                            Qsys system integration tool
  Design constraint assignments             Assignment editor; Pin planner;
                                            Synopsys Design Constraint (SDC) editor
  Synthesis                                 Quartus II Integrated Synthesis (QIS);
                                            Third-party EDA synthesis; Design assistant
  Fitting and placing design into FPGA      Fitter (including physical synthesis)
  to meet user requirements
  Design analysis and debug                 Netlist viewers; Advisors
  Power analysis                            PowerPlay power analyzer

Quartus II Feature Overview (2/2)
  Feature                                   Quartus II Software
  Static timing analysis on                 TimeQuest timing analyzer
  post-fitted design
  Viewing and editing design placement      Chip Planner
  Functional verification                   ModelSim®-Altera edition; Third-party EDA simulation tools
  Generation of device programming file     Assembler
  On-chip debug and verification            SignalTap™ II (embedded logic analyzer);
                                            In-system memory content editor;
                                            Logic analyzer interface editor;
                                            In-system sources and probes editor;
                                            SignalProbe pins; Transceiver Toolkit;
                                            External memory interface toolkit
  Technique to optimize design and          Quartus II incremental compilation;
  improve productivity                      Physical synthesis optimization;
                                            Design Space Explorer (DSE)

Quartus II Subscription Edition vs. Web Edition
  Device supported
    Subscription Edition: MAX series devices: All (excluding MAX7000 / 3000); Cyclone III/IV/V FPGAs: All; Arria II/V FPGAs: All; Stratix III, IV, V FPGAs: All; Cyclone V SoCs: All
    Web Edition: MAX series devices: All (excluding MAX7000 / 3000); Cyclone V FPGAs: All (excluding 5CEA9, 5CGXC9, and 5CGTD9); Cyclone III/IV FPGAs: All; Arria II GX FPGA: EP2AGX45; Cyclone V SoCs: All
  Software features: Incremental compilation and team-based design, SSN Analyzer, Transceiver Toolkit
    Subscription Edition: Yes
    Web Edition: No
  SignalTap II, SignalProbe
    Subscription Edition: Yes
    Web Edition: If TalkBack feature is enabled
  Multi-processor support
    Subscription Edition: Yes
    Web Edition: If TalkBack feature is enabled
  IP Base Suite MegaCore® functions
    Subscription Edition: Yes
    Web Edition: No license required for OpenCore Plus hardware evaluation; license fee required for production use
  Platform support
    Subscription Edition: Windows 32/64-bit, Linux 32/64-bit
    Web Edition: Windows 32/64-bit, Linux 32/64-bit
  License and maintenance terms
    Subscription Edition: Perpetual (continues to work after expiration)
    Web Edition: No license required except for IP core
  Price
    Subscription Edition: $
    Web Edition: Free

How to Get Started Using Quartus II Software
  • Download Quartus II software today and start designing with Altera programmable logic devices
  • Quartus II Handbook - http://www.altera.com/literature/lit-qts.jsp
    – Guides you through the programmable logic design cycle from design to verification
    – Also covers third-party EDA vendor tool interfaces
  • Online demonstrations - http://www.altera.com/quartusdemos
    – Easiest way to learn about the latest Quartus II software features and design flows
  • Training classes - https://mysupport.altera.com/etraining
    – Offers online training classes and live presentation coupled with hands-on exercises to learn about Quartus II features and design flows

Qsys – System Integration Platform

Qsys System Integration Platform
[Figure: Qsys feature overview: High-Performance Interconnect based on Network-on-a-Chip (NoC) architecture; Design Reuse (Design System, Package as IP, Add to Library); Hierarchy; Automated Testbench Generation; Industry-Standard Interfaces (Avalon® Interfaces; AMBA® AXI, APB, AHB); Real-Time System Debug]
Qsys is Altera's design environment for:
  • Deployment of IP, with hierarchical support
  • Development platform for Altera custom solutions
  • Design platform for customers to quickly create system designs

Qsys User Interface
[Figure: Qsys window with callouts: Interfaces Exported for Hierarchy, Toolbar, Improved Validation Display]

Qsys Benefits
  • Raises the level of design abstraction
    – System-level design and system visualization
  • Simplifies complex hierarchical system development
    – Automated interconnect generation
  • Provides a standard platform
    – IP integration, custom IP authoring, IP verification
  • Enables design re-use
  • Reduces time to market
    – System-level design reduces development time
    – Facilitates verification
Qsys improves productivity

Network-on-Chip Architecture
  • Transaction Layer (master side): converts transactions to command packets and response packets to responses
  • Transport Layer: transfers packets to destination
  • Transaction Layer (slave side): converts command packets to transactions and responses to response packets
[Figure labels: Avalon-MM and AXI-MM at each transaction layer; Avalon-ST in the transport layer]
[Figure: command path: Master Interface, Master Network Interface, Avalon ST Network (Command), Slave Network Interface, Slave Interface; response path: the same chain through Avalon ST Network (Response)]

Benefits of Network-On-Chip Approach
See white paper: Applying the Benefits of NoC Architecture to FPGA System Design
  • Independent implementation of transaction/transport layers
    – Different transport layer network topologies can be implemented without transaction layer modification, e.g. high performance components on a wide high-frequency crossbar network
  • Supports standard interface interoperability
    – Mix and match interface types on transaction layer without transport layer modification
  • Scalability
    – Segment network into sub-networks using bridges and clock crossing logic

Industry-Standard Interfaces
  Developer / Standard Interface Protocol
  • Avalon® Interfaces
  • AMBA® AXI, AMBA APB, and AMBA AHB
Qsys supports mixing of different interfaces

Target Qsys Applications
Qsys can be used in almost every FPGA design
  • Designs fall into two categories
    – Control plane
        Memory mapped
        Reading and writing to control and status registers
    – Data plane
        Streaming
        Data switching (muxing, demuxing), aggregation, bridges

"Packets… I care about Latency!"
  • Qsys packet format is wide
    – Packet format contains a complete transaction in a single clock cycle
    – Supports:
        Writes with 0 cycles of latency
        Reads with a round-trip latency of 1 cycle
    – You can control latency via Qsys configuration
  • Separate command and response network
    – Increases concurrency
        Command traffic and response traffic don't compete for resources

Qsys: Wide Range of Compliant IP
Wide range of plug-and-play intellectual property (IP):
  – Interface protocol IP
      E.g. PCIe, Ethernet 10/100/1000 Mbps (TripleSpeed Ethernet), Interlaken, JTAG, UART, SPI
  – External memory interface IP
      E.g. DDR/DDR2/DDR3
  – Video and imaging processing (VIP) IP
      E.g. VIP Suite including scaler, switch, deinterlacer, and alpha blending mixer
  – Embedded processor IP
      E.g. Hardened ARM processor system, Nios II processor
  – Verification IP
      E.g. Avalon-MM/-ST, AXI4, APB
>100 Qsys compliant IP available

Qsys as a Platform for System Integration
  • Library of Available IP: Interface protocols, Memory, DSP, Embedded, Bridges, PLL, Custom systems
  • Connect IP and Systems
  • Accelerate Development
  • Simplify Integration
  • Automate Error-Prone Integration Tasks
[Figure: IP 1, IP 2, IP 3, Custom 1, Custom 2, and HDL blocks connected into one system]

Additional Resources
  • Watch online demos (3-5 min): www.altera.com/qsys
  • Complete the Qsys tutorial (2-3 hrs): www.altera.com/qsys
  • Watch free webcasts (10-15 mins): www.altera.com/qsys
  • Sign up for Qsys training: www.altera.com/training

In-system Verification

Debug Challenges
  • Accessing and viewing internal signals
  • Not enough pins to use as test points
  • Capabilities in creating trigger conditions that correctly capture data
  • Verification of standard or proprietary protocol interfaces
  • Overall design process bottleneck
Debug Can Be Costly

On-chip Debug
  • Access and view internal signals
  • Store captured data in FPGA embedded memory
  • Use JTAG interface as debug ports
  • Incrementally add internal signals to view
Reduce Debug Cycles by Using On-chip Debug Tools

On-chip Debug Technology
  • Debug tools communicate with the FPGA via standard JTAG interface
  • Multiple debug functions can share the JTAG interface simultaneously
    – Altera's system-level debugging (SLD) hub technology makes this possible
    – All Altera tools and some third-party tools support the SLD hub
[Figure: Download Cable connects over the JTAG interface to the FPGA's JTAG Tap Controller and SLD Hub, which serves Node 1, Node 2, … Node N-1, Node N inside the User's Design (Core Logic)]

On-chip Debug Tools in Quartus II Software
  • SignalTap II logic analyzer
    – Captures and displays hardware events, fast turnaround times
    – Incrementally creates trigger conditions and adds signals to view
    – Uses captured data stored in on-chip RAM and JTAG interface for communication
  • In-system memory content editor
    – Displays content of on-chip memory
    – Enables modification of memory content in a running system
  • External logic analyzer interface
    – Uses external logic analyzer to view internal signals
    – Dynamically switches internal signals to output
  • In-system sources and probes
    – Stimulate and monitor internal signals without using on-chip RAM
  • Exception: SignalProbe incremental routing feature does not use JTAG interface (i.e. SLD hub technology)
    – Quickly routes an internal node to a pin for observation

SignalTap II Logic Analyzer
  • Provides the most advanced triggering capabilities available in an FPGA-embedded logic analyzer
  • Proven to be invaluable in the lab
    – Captures bugs that would take weeks of simulation to uncover
  • Has broad customer adoption
  • Features and benefits
    – An embedded logic analyzer
        Uses available internal memory
    – Probes state of internal signals without using external equipment or extra I/O pins
    – Incremental compilation support
        Fast turnaround time when adding signals to view
    – Advanced triggering for capturing difficult events/transactions
    – Power-up trigger support
        Debug the initialization code
    – Megafunction support
        Optionally, instantiate in HDL

In-system Memory Content Editor
  • Enables FPGA memory content and design constants to be updated in-system, via JTAG interface, without recompiling a design or reconfiguring the rest of the FPGA
    – Fault injection into system
    – Update memory while system is running
    – Change value of coefficients in DSP applications
    – Easily perform "what if?" type experiments in-system in just seconds
  • Supports MIF and HEX formats for data interchange
  • Megafunctions supported
    – LPM_CONSTANT, LPM_ROM, LPM_RAM_DQ, ALTSYNCRAM (ROM and single-port RAM mode)
[Figure: megafunction setting "Enable memory content editor"]

In-system Memory Content Editor
[Figure: In-system Memory Content Editor, found under the Tools menu]

Altera SoC Embedded Design Suite

Included in SoC Embedded Design Suite (EDS)
  • Development Studio 5 Altera Edition
    – Awesome debugger, especially when combined with USB Blaster II
    – Altera SoC FPGA System Trace Macrocells
    – Application development environment
    – Streamline system analyzer
  • Hardware Libraries
  • GNU-based bare-metal (EABI) compiler tools
  • U-Boot
  • Root file system to jump start software development
  • Pre-built Linux kernel
    – http://www.rocketboards.org for source trees and community access

System Development Flow
FPGA Design Flow (Hardware Development) alongside Software Design Flow (Software Development); both flows run through the stages Design, Simulate, Debug, Release.
Hardware development tools:
  • Quartus II design software
  • Qsys system integration tool
  • Standard RTL flow
  • Altera and partner IP
  • ModelSim, VCS, NCSim, etc.
  • AMBA-AXI and Avalon bus functional models (BFMs)
  • SignalTap™ II logic analyzer
  • System Console
  • Quartus II Programmer
  • In-system Update
Software development tools:
  • ARM Development Studio 5
  • GNU toolchain
  • OS/BSP: Linux, VxWorks
  • Hardware Libraries
  • Design Examples
  • GNU, Lauterbach, DS5
  • Flash Programmer

Altera SoC Embedded Design Suite
FPGA Design Flow (Hardware Development) alongside Software Design Flow (Software Development)
Hardware development tools:
  • Quartus II design software
  • Qsys system integration tool
  • Standard RTL flow
  • Altera and partner IP
  • ModelSim, VCS, NCSim, etc.
• AMBA-AXI and Avalon bus functional models (BFMs) Simulate • SignalTap™ II logic analyzer • System Console • Quartus II Programmer • In-system Update 76 Software Development HW/SW Handoff Design Simulate • ARM Development Studio 5 • GNU toolchain • OS/BSP: Linux, VxWorks • Hardware Libraries • Design Examples • VirtualSoftware Target Development Debug Release FPGA-Adaptive Debugging Debug Release • GNU, Lauterbach, DS5 • Flash Programmer Altera SoC Embedded Design Suite Comprehensive Suite SW Dev Tools Hardware-toSoftware Handoff Hardware / software handoff tools Linux application development – Yocto Linux build environment – Pre-built binaries for Linux / U-Boot – Work in conjunction with the Community Portal Firmware Development Linux Application Development Bare-metal application development – SoC Hardware Libraries – Bare-metal compiler tools FPGA-adaptive debugging – ARM DS-5 Altera Edition Toolkit Design examples 77 FPGAAdaptive Debugging Free Web Edition Subscription Edition Free 30-day Eval Hardware-to-Software Handoff Hardware Qsys system info, SDRAM calibration files, ID / timestamp, HPS IOCSR data system.iswinfo Software 78 system.sopcinfo Preloader Generator Device Tree Generator .c & .h source files Linux Device Tree Hardware / Software Handoff Tools 79 Allow hardware and software teams to work independently and follow their familiar design flows Take Altera Quartus® II / Qsys output files and generate handoff files for the software design flow Device Tree standard specifies hardware connectivity so that Linux kernel can boot up correctly Linux Application Development Yocto build support for Linux – Yocto standard enables open, versatile, and cost-effective embedded software development – Allows a smooth transition to commercial Linux distributions Pre-built Linux kernel, U-Boot, and root file system to jump start software development – Link to community portal for source trees and community access 80 Bare-metal Application Development Hardware 
Libraries – Software interface to all system registers – Functions to configure some basic system operations (e.g. clock speed settings, cache settings, FPGA configuration, etc.) – Support board bring-up and diagnostics development – Can be used by bare-metal application, device drivers, or RTOS GNU-based bare-metal (EABI) compiler tools 81 Application Operating System BSP Hardware BMAL HAL PAL Libraries SoC FPGA Baremetal App Golden System Reference Design Complete system design with Linux software support – Simple custom logic design in FPGA – All source code and Quartus II / Qsys design files for reference – Include all compiled binariesexample can run on an Altera SoC Development Kit to jumpstart development 82 DS-5 Altera Edition- One Tool, Three Usages 1 • JTAG-Based Debugging 2 • Board Bring-up • OS porting, Drivers Dev, • System Integration • Kernel Debug • System Debug • Application Debugging 83 • Linux User Space Code • RTOS App Code 3 • FPGA-Adaptive Debugging One Device, Two Debugging Tools? ARM® DS-5™ Toolkit DSTREAM™ 84 Altera Quartus™ II Software JTAG Dedicated JTAG connection Visualize & control CPU subsystem JTAG Dedicated JTAG connection Visualize & control FPGA One Device, Two Debugging Tools? 
ARM® DS-5™ Toolkit DSTREAM™ 85 Altera Quartus™ II Software JTAG Dedicated JTAG connection Visualize & control CPU subsystem JTAG Dedicated JTAG connection Visualize & control FPGA Industry First: FPGA-Adaptive Debugging Altera USB-Blaster™II Connection ARM® Development Studio 5 (DS-5™) Altera® Edition Toolkit Removes debugging barrier between CPUs and FPGA Exclusive OEM agreement between Altera and ARM Result of innovation in silicon, software, and business model 86 FPGA-Adaptive Debugging Features Single USB-Blaster II cable for simultaneous SW and HW debug Automatic discovery of FPGA peripherals and creation of register views Hardware cross-triggering between the CPU and FPGA domains Correlation of CPU software instructions and FPGA hardware events Simultaneous debug and trace for Cortex-A9 cores and CoreSight™-compliant cores in FPGA Statistical analysis of software load and bus traffic spanning the CPUs and FPGA 87 DS-5 Altera Edition Productivity-Boosting Features Industry’s most advanced multicore debugger for ARM JTAG based system-level debugging, gdbserver-based application debugging in one package Yocto plugin to enable Linux based application development Integrated OS-aware analysis and debug capability 88 Visualization of SoC Peripherals Register views assist the debug of FPGA peripherals – File generated by FPGA tool flow – Automatically imported in DS-5 Debugger Debug views for debug of software drivers – Self-documenting – Grouped by peripheral, register and bit-field CMSIS Peripheral register descriptions 89 FPGA-Adaptive, Unified Debugging FPGA connected to debug and trace buses for nonintrusive capture and visualization of signal events Simultaneous debug and trace connection to CPU cores and compatible IP Correlate FPGA signal events with software events and CPU instruction trace using triggers and timestamps 90 Cross-Domain Debug 1 Trigger from software world to FPGA world SOFTWARE TRIGGER HARDWARE TRIGGER! 

Cross-Domain Debug 2
Trigger from FPGA world to software world
Diagram labels: HARDWARE TRIGGER; EXECUTION STOP OR HW TRACE TRIGGER; EXECUTION STOP OR SW TRACE TRIGGER

Correlate HW and SW Events
Debug event trigger point set from either: SignalTap™ II Logic Analyzer or DS-5 debugger
Captured trace can then be analyzed using timestamp-correlated events
Diagram labels: ARM® DS-5™ Toolkit; SignalTap II Logic Analyzer; Timestamp Correlated

System-Level Performance Analysis
Performance bottlenecks in SoCs often come from the CPU interaction with the rest of the SoC
Streamline visualizes software activity with performance counters from the SoC and FPGA to enable full system-level analysis
Streamline only requires a TCP/IP connection to the SoC
ARM® DS-5™ Streamline views:
• Linux OS Counters
• Processor Counters, Aggregated, or Per Core
• Power Consumption
• FPGA Block Counters
• Process/Thread Heat Map
• Application Events

Altera SoC EDS: Key Benefits
One-stop shop from Altera
All the tools and examples for rapid starts
Familiar tools interface, easy to use
Share tools and knowledge to increase team productivity
Best multicore debugger tools for ARM architecture
Unprecedented visibility and control across processor cores and across CPU, FPGA domains
Faster time to market, lower development costs!
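
To make the idea of "timestamp-correlated events" from the Correlate HW and SW Events slide more concrete, here is a minimal Python sketch. It is not how DS-5 or SignalTap II implement correlation. It only assumes that both domains stamp their events against a shared timebase, merges them into one timeline, and finds the software event nearest to a given hardware event. The `TraceEvent` type, the function names, and the sample events are all hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEvent:
    timestamp: int  # ticks of a timebase assumed to be shared by both domains
    domain: str     # "sw" (CPU trace) or "hw" (FPGA signal capture)
    label: str


def merge_timeline(sw_events, hw_events):
    """Return all events from both domains ordered by timestamp."""
    return sorted(list(sw_events) + list(hw_events), key=lambda e: e.timestamp)


def nearest_sw_event(hw_event, sw_events):
    """Return the software event closest in time to hw_event.

    sw_events must be a non-empty list sorted by timestamp.
    """
    times = [e.timestamp for e in sw_events]
    i = bisect_left(times, hw_event.timestamp)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = sw_events[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda e: abs(e.timestamp - hw_event.timestamp))


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sw = [
        TraceEvent(100, "sw", "write mailbox"),
        TraceEvent(250, "sw", "irq handler entry"),
        TraceEvent(400, "sw", "read status"),
    ]
    hw = [
        TraceEvent(240, "hw", "irq asserted"),
        TraceEvent(390, "hw", "bus stall"),
    ]
    for event in merge_timeline(sw, hw):
        print(event.timestamp, event.domain, event.label)
    for event in hw:
        print(event.label, "is nearest to", nearest_sw_event(event, sw).label)
```

Sorting on a common timestamp is the simplest possible form of correlation. A real tool also has to account for clock-domain differences and capture latency, which this sketch ignores.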

Target Users and Usages
Usage                      Web Edition   Subscription Edition
Board Bring-up                           Yes
Device Drivers Dev                       Yes
OS Porting                               Yes
Baremetal Programming                    Yes
RTOS Based App Dev                       Yes
Linux Based App Dev        Yes           Yes
Multicore App Debugging                  Yes
System Debugging                         Yes

SoC EDS Editions Summary
Columns: Component, Key Feature, Web Edition, Subscription Edition, 30-Day Evaluation
Hardware/Software Handoff Tools
– Preloader Image Generator: x x x
– Flash Image Creator: x x x
– Device Tree Generator (Linux): x x x
ARM DS-5 Altera Edition
– Eclipse IDE: x x x
– ARM Compiler*
– Debugging over Ethernet (Linux): x x x
– Debugging over USB-Blaster II JTAG: x x
– Automatic FPGA Register Views: x x
– Hardware Cross-triggering: x x
– CPU/FPGA Event Correlation: x x x x x
Compiler Tool Chains
– Linaro Tool Chain (Linux): x
– CodeBench Lite EABI (Bare-metal): x x x
Hardware Libraries
– Bare-metal programming Support: x x x
SoC Programming Examples
– Golden System Reference Design: x x x
*ARM Compiler is available in DS-5 Professional Edition, available directly from ARM

Coordinated Multi-Channel Delivery
Altera.com
• Quartus II
• Programmer
• SignalTap II
Altera.com, RocketBoards.org
Pre-built Binaries
• Kernel
• U-Boot
• Yocto
• Minimal RFS
• Tool chains
• Handoff tools
• HW Libraries
• Examples
• Documentation
Frequent Updates
• Kernel source
• U-Boot source
• Yocto source
• RFS source
• Toolchain source
• Public git
• Wiki
• Mailman
Partners
• BSPs
• Middleware
• 3rd Party Tools

Altera NIOS Software Design Tools
Nios II SBT for Eclipse key features:
– New project wizards and software templates
– Compiler for C and C++ (GNU)
– Source navigator, editor, and debugger
– Eclipse project-based tools