NetFPGA Spring Camp Day 1 Presented by: Andrew W. Moore with Marek Michalski, Neelakandan Manihatty-Bojan, Gianni Antichi Georgina Kalogeridou, Jong Hun Han, Noa Zilberman aided by Yury Audzevich, Dimosthenis Pediaditakis Poznan University of Technology May 20 – 24, 2013 http://NetFPGA.org Spring Camp 2013 1 Tutorial Outline • Background – Introduction – The NetFPGA Platform • The Base Reference Router – Motivation: Basic IP review – Example: Reference Router running on the NetFPGA • Infrastructure – Tree – Build System – Scripts • The Life of a Packet Through the NetFPGA – Hardware Datapath – Interface to software: Exceptions and Host I/O • Implementation – Module Template – Write Crypto NIC using a static key • Simulation and Debug – Write and Run Simulations for Crypto NIC • Concluding Remarks Spring Camp 2013 2 Section I: Motivation Spring Camp 2013 3 NetFPGA = Networked FPGA A line-rate, flexible, open networking platform for teaching and research Spring Camp 2013 4 NetFPGA consists of… Four elements: • NetFPGA board NetFPGA 1G Board • Tools + reference designs • Contributed projects • Community NetFPGA 10G Board Spring Camp 2013 5 NetFPGA-10G • A major upgrade over the 1Gb/s predecessor • State-of-the-art technology Spring Camp 2013 6 10 Gigabit Ethernet • 4 SFP+ Cages • AEL2005 PHY • 10G Support – Direct Attach Copper – 10GBASE-R Optical Fiber • 1G Support – 1000BASE-T Copper – 1000BASE-X Optical Fiber Spring Camp 2013 7 Others • QDRII-SRAM – 27MB – Storing routing tables, counters and statistics • RLDRAM-II – 288MB – Packet Buffering • PCI Express x8 – PC Interface • Expansion Slot Spring Camp 2013 8 Xilinx Virtex 5 TX240T • Optimized for ultra high-bandwidth applications • 48 GTX Transceivers • 4 hard Tri-mode Ethernet MACs • 1 hard PCI Express Endpoint Spring Camp 2013 9 NetFPGA Board Comparison NetFPGA 1G NetFPGA 10G 4 x 1Gbps Ethernet Ports 4 x 10Gbps Ethernet Ports 4.5 MB ZBT SRAM 64 MB DDR2 SDRAM 27 MB QDRII-SRAM 288 MB RLDRAM-II PCI PCI Express x8 Virtex II-Pro 50 Virtex 5 TX240T Spring Camp 2013 10 Beyond Hardware GitHub, User Community MicroBlaze SW PC SW Xilinx EDK Reference Designs Spring Camp 2013 AXI4 IPs • NetFPGA-10G Board • Xilinx EDK based IDE • Reference designs with ARM AXI4 • Software (embedded and PC) • Public Repository • Public Wiki 11 NetFPGA board Networking Software running on a standard PC CPU Memory PCIe A hardware accelerator built with a Field Programmable Gate Array driving 10 Gigabit network links Spring Camp 2013 PC with NetFPGA 10GE FPGA 10GE 10GE Memory 10GE 12 Tools + Reference Designs Tools: • Compile designs • Verify designs • Interact with hardware Reference designs: • Router (HW) • Switch (HW) • Network Interface Card (HW) • Router Kit (SW) • SCONE (SW) Spring Camp 2013 13 Contributed Projects Project OpenFlow switch Contributor Stanford University IPv4 Router RLDRAM I/F ARP Reply Pisa & Cambridge Uni. Cambridge Univerity Stanford University Encap & DCAP Learning CAM Switch Stanford University Pisa & Cambridge Uni. More projects: • https://github.com/NetFPGA/NetFPGA-public/wiki/Projects Spring Camp 2013 14 Community Wiki • Documentation – User’s Guide “so you just got your first NetFPGA” – Developer’s Guide “so you want to build a …” • Encourage users to contribute Forums • Support by users for users • Active community - 10s-100s of posts/week • Incubating the 10G mailing-list into a forum Spring Camp 2013 15 International Community Over 1,000 users, using 2,000 cards at 150 universities in 40 countries Spring Camp 2013 16 NetFPGA’s Defining Characteristics • Line-Rate – Processes back-to-back packets • Without dropping packets • At full rate of 10 Gigabit Ethernet Links – Operating on packet headers • For switching, routing, and firewall rules – And packet payloads • For content processing and intrusion prevention • Open-source Hardware – Similar to open-source software • Full source code available • BSD-Style License for 1G and LGPL 2.1 for 10G – But harder, because • Hardware modules must meeting timing • Verilog & VHDL Components have more complex interfaces • Hardware designers need high confidence in specification of modules Spring Camp 2013 17 Test-Driven Design • Regression tests – Have repeatable results – Define the supported features – Provide clear expectation on functionality • Example: Internet Router – – – – – Drops packets with bad IP checksum Performs Longest Prefix Matching on destination address Forwards IPv4 packets of length 64-1500 bytes Generates ICMP message for packets with TTL <= 1 Defines how packets with IP options or non IPv4 … and dozens more … Every feature is defined by a regression test Spring Camp 2013 18 Who, How, Why Who uses the NetFPGA? – – – Teachers Students Researchers How do they use the NetFPGA? – – To run the Router Kit To build modular reference designs • • • IPv4 router 4-port NIC Ethernet switch, … Why do they use the NetFPGA? – – To measure performance of Internet systems To prototype new networking systems Spring Camp 2013 19 Spring Camp Objectives • Overall picture of NetFPGA • How reference designs work • How you can work on a project – NetFPGA Design Flow – Directory Structure, library modules and projects – How to utilize contributed projects • Interface/Registers – How to verify a design (Simulation and Regression Tests) – Things to do when you get stuck AND… You can build your own projects! Spring Camp 2013 20 Section II: Network review Spring Camp 2013 21 Internet Protocol (IP) Data to be transmitted: IP packets: Ethernet Frames: Data IP Hdr Data IP Hdr Data Eth IP Hdr Hdr Data Eth IP Hdr Hdr Data Spring Camp 2013 22 … … IP Hdr Data Eth IP Hdr Hdr Data Internet Protocol (IP) Data … 1 4 Ver 16 HLen T.Service 20 bytes Fragment ID TTL Protocol 32 Total Packet Length Flags Fragment Offset Header Checksum Source Address Destination Address Options (if any) Spring Camp 2013 23 IP Hdr Data Basic operation of an IP router R3 A R1 R4 B C D E R2 Destination Next Hop D R3 E R3 F R5 Spring Camp 2013 24 R5 F Basic operation of an IP router R3 A R1 R4 B C D E R2 Spring Camp 2013 R5 25 F Forwarding tables 32 bits wide → ~ 4 billion unique address IP address Naïve approach: One entry per address Entry Destination Port 1 2 ⋮ 232 0.0.0.0 0.0.0.1 ⋮ 255.255.255.255 1 2 ⋮ 12 ~ 4 billion entries Improved approach: Group entries to reduce table size Entry Destination Port 1 2 ⋮ 50 0.0.0.0 – 127.255.255.255 128.0.0.1 – 128.255.255.255 ⋮ 248.0.0.0 – 255.255.255.255 1 2 ⋮ 12 Spring Camp 2013 26 IP addresses as a line Your computer My computer Stanford Asia Berkeley North America 232-1 0 All IP addresses Entry Destination Port 1 2 3 4 5 Stanford Berkeley North America Asia Everywhere (default) 1 2 3 4 5 Spring Camp 2013 27 Longest Prefix Match (LPM) Entry Destination Port 1 2 3 4 5 Stanford Berkeley North America Asia Everywhere (default) 1 2 3 4 5 Matching entries: • Stanford • North America • Everywhere To: Stanford Spring Camp 2013 Most specific Data 28 Universities Continents Planet Longest Prefix Match (LPM) Entry Destination Port 1 2 3 4 5 Stanford Berkeley North America Asia Everywhere (default) 1 2 3 4 5 Matching entries: • North America • Everywhere To: Canada Most specific Data Spring Camp 2013 29 Universities Continents Planet Implementing Longest Prefix Match Entry Destination Port 1 2 3 4 5 Stanford Berkeley North America Asia Everywhere (default) 1 2 3 4 5 Spring Camp 2013 30 Searching Most specific FOUND Least specific Basic components of an IP router Software Management & CLI Routing Protocols Routing Table Hardware Forwarding Switching Queuing Table Spring Camp 2013 31 Control Plane Data Plane per-packet processing IP router components in NetFPGA Linux SCONE Management & CLI Routing Protocols Routing Table OR Routing Protocols Routing Table Software Management & CLI Router Kit Forwarding Table Spring Camp 2013 Input Arbiter Switching Output Queues Queuing 32 Hardware Output Port Lookup Section III: Example I Spring Camp 2013 33 Operational IPv4 router Java GUI Software SCONE Management & CLI Routing Protocols Routing Table Hardware Reference router Forwarding Switching Queuing Table Spring Camp 2013 34 Control Plane Data Plane per-packet processing Streaming video Spring Camp 2013 35 Streaming video NetFPGA running reference router PC & NetFPGA (NetFPGA in PC) Spring Camp 2013 36 Streaming video Video streaming over shortest path Video client Video server Spring Camp 2013 37 Streaming video Video client Video server Spring Camp 2013 38 Observing the routing tables Columns: • Subnet address • Subnet mask • Next hop IP • Output ports Spring Camp 2013 39 Example 1 Spring Camp 2013 40 Review NetFPGA as IPv4 router: •Reference hardware + SCONE software •Routing protocol discovers topology Demo: •Ring topology •Traffic flows over shortest path •Broken link: automatically route around failure Spring Camp 2013 41 Section III: Example II Spring Camp 2013 42 Buffers in Routers • Internal Contention • Pipelining • Congestion Rx Tx Rx Tx Rx Tx Spring Camp 2013 43 Buffer Sizing Story 2T ´C Spring Camp 2013 2T ´ C n 44 O(logW ) Using NetFPGA to explore buffer size • Need to reduce buffer size and measure occupancy • Alas, not possible in commercial routers • So, we will use the NetFPGA instead Objective: – Use the NetFPGA to understand how large a buffer we need for a single TCP flow. Spring Camp 2013 45 Reference Router Pipeline • Five stages MAC RxQ – Input interfaces – Input arbitration – Routing decision and packet modification – Output queuing – Output interfaces • Packet-based module interface • Pluggable design MAC TxQ Spring Camp 2013 CPU RxQ MAC RxQ CPU RxQ MAC RxQ CPU RxQ MAC RxQ CPU RxQ Input Arbiter Output Port Lookup Output Queues CPU TxQ MAC TxQ 46 CPU TxQ MAC TxQ CPU TxQ MAC TxQ CPU TxQ Extending the Reference Pipeline MAC RxQ CPU RxQ MAC RxQ CPU RxQ MAC RxQ CPU RxQ MAC RxQ CPU RxQ CPU TxQ MAC TxQ CPU TxQ Input Arbiter Output Port Lookup Event Capture Output Queues Rate Limiter MAC TxQ CPU TxQ Spring Camp 2013 MAC TxQ CPU TxQ MAC TxQ 47 Enhanced Router Pipeline Two modules added MAC RxQ CPU RxQ MAC RxQ 1. Event Capture to capture output queue events (writes, reads, drops) CPU RxQ MAC RxQ CPU RxQ MAC RxQ CPU RxQ CPU TxQ MAC TxQ CPU TxQ Input Arbiter Output Port Lookup Event Capture Output Queues 2. Rate Limiter to create a bottleneck Rate Limiter MAC TxQ CPU TxQ CPU TxQ MAC TxQ Spring Camp 2013 48 MAC TxQ Topology for Exercise 2 Recall: NetFPGA host PC is life-support: power & control NetFPGA running extended reference rout nf2c2 So: nf2c1 eth2 eth1 PC & NetFPGA The host PC may physically route its traffic through the Iperf Server local NetFPGA Spring Camp 2013 (NetFPGA in PC) Iperf Client 49 Example 2 Spring Camp 2013 50 Review NetFPGA as flexible platform: •Reference hardware + SCONE software •new modules: event capture and rate-limiting Example 2: Client Router Server topology – Observed router with new modules – Started tcp transfer, look at queue occupancy – Observed queue change in response to TCP ARQ Spring Camp 2013 51 Section IV: Infrastructure Spring Camp 2013 52 Infrastructure • Tree structure • NetFPGA package contents – Reusable Verilog modules – Verification infrastructure – Build infrastructure – Utilities – Software libraries Spring Camp 2013 53 NetFPGA package contents • Projects: – HW: router, switch, NIC, buffer sizing router – SW: router kit, SCONE • Reusable Verilog modules • Verification infrastructure: – simulate full board with PCI-express + physical interfaces – run tests against hardware – test data generation libraries (eg. packets) • Build infrastructure • Utilities: – register I/O, packaging, … • Software libraries Spring Camp 2013 54 Tree Structure (1) NetFPGA-10G projects (including reference designs) contrib-projects (contributed user projects) lib (custom and reference IP Cores and software libraries) tools (scripts for running simulations etc.) docs (design documentations and user-guides) https://github.com/NetFPGA/NetFPGA-10G-live Spring Camp 2013 55 Tree Structure (2) lib hw (hardware logic as IP cores) std (reference cores) contrib (contributed cores) xilinx (Xilinx provided cores) sw (core specific software drivers/libraries) std (reference libraries) contrib (contributed libraries) xilinx (Xilinx provided libraries) Spring Camp 2013 56 Tree Structure (3) projects/reference_nic bitfiles (FPGA executables) hw (EDK based project) data (contains user constraint files) nf10 (contains files used to run various tools) pcores (contains local hardware peripherals) sw embedded (contains code for microblaze) host (contains code for host communication etc.) Spring Camp 2013 57 Reusable logic (IP cores) Category IP Core(s) I/O interfaces Ethernet MAC (1G/10G) PCI Express GPIO MDIO UART Output queues BRAM based Output port lookup OpenFlow Switch NIC 1G to 10G ports Learning switch (CAM based) NIC Memory interfaces SRAM Flash Miscellaneous PBS* to AXIS bridge RBS** to AXI-LITE bridge FIFOs * PBS – packet bus stream in NF1G ** RBS – register bus stream in NF1G Spring Camp 2013 58 What is a PCore? A BRIEF DETOUR Spring Camp 2013 59 What’s pcore? IP Core? P Core? Pcore? pcore? - all the same. •“Processor Core” in EDK •HDL (Verilog/VHDL) + metadata files •≈ module •Examples: – 10G MAC – SRAM Controller – NIC Output port lookup Spring Camp 2013 60 HDL (Verilog) • All NetFPGA-10G pcores are AXI-compliant • AXI = Advanced eXtensible Interface – Used in ARM-based embedded systems – Standard interface to bridge pcores together – AXI4/AXI4-Lite: Control and status interface – AXI4-Stream: Data path interface • Xilinx IPs and tool chains are migrating to AXI Spring Camp 2013 61 Pcore metadata files • Help EDK put pcores together • PAO: Peripheral Analyze Order – Files that needs synthesizing • MPD: Microprocessor Peripheral Definition – Defines interface of the pcore (e.g. # of AXI’s) • Consult Xilinx UG642 for details Spring Camp 2013 62 Verification Infrastructure (1) • Simulation and Debugging – built on industry standard Xilinx “iSim” simulator and “Scapy” – Python scripts for stimuli construction and verification Spring Camp 2013 63 Verification Infrastructure (2) • iSim – a High Level Description (HDL) simulator – performs functional and timing simulations for embedded, VHDL, Verilog and mixed designs • Scapy – a powerful interactive packet manipulation library for creating “test data” – provides primitives for many standard packet formats – allows addition of custom formats Spring Camp 2013 64 Build Infrastructure (1) • Based on the design flow of: – “Xilinx Embedded Development Kit” – “ARM’s AXI4 interface specification” Spring Camp 2013 65 Build Infrastructure (2) • Build/Synthesis – collection of shared hardware peripherals cores (pcores) stitched together with “AXI4”, “Lite” and “Stream” buses – bitfile generation and verification using Xilinx synthesis and implementation tools • XST • Translate, MAP, PAR • BitGen Spring Camp 2013 66 Build Infrastructure (3) • Register system – collates and generates addresses for all the registers and memories in a project – uses Xilinx EDK – libgen utility to generate “include” files for software applications Spring Camp 2013 67 Inter-Module Communication – Using AXI-4 Stream (Packets are moved as Stream) TDATA TUSER Module i TSTRB TLAST TVALID TREADY Spring Camp 2013 68 Module i+1 AXI4-Stream AXI4-Stream TDATA TSTRB TVALID TREADY TLAST TUSER Spring Camp 2013 Description Data Stream Strobe signal (i.e. byte enable) Valid Indication Flow control indication End of packet/burst indication Out of band metadata 69 Packet Format TLAST TUSER TSTRB TDATA 0 X 0xFF…F Eth Hdr 0 X 0xFF…F IP Hdr 0 X 0xFF…F … 1 X 0x0…1F Last word Spring Camp 2013 70 TUSER Position [15:0] [23:16] [31:24] Content length of the packet in bytes source port: one-hot encoded destination port: one-hot encoded [127:32] 6 user defined slots, 16bit each Spring Camp 2013 71 TVALID/TREADY Signal timing – No waiting! – Assert TREADY/TVALID whenever appropriate – TVALID should not depend on TREADY TVALID TREADY Spring Camp 2013 72 Byte ordering • In compliance to AXI, NF 10G has a specific byte ordering – 1st byte of the packet @ TDATA[7:0] – 2nd byte of the packet @ TDATA[15:8] Spring Camp 2013 73 73 Section V: Life of a Packet Spring Camp 2013 74 Reference Router Pipeline • Five stages MAC RxQ – Input – Input arbitration – Routing decision and packet modification – Output queuing – Output • Packet-based module interface • Pluggable design MAC RxQ Spring Camp 2013 MAC RxQ MAC RxQ MAC RxQ CPU RxQ MAC RxQ CPU RxQ Input Arbiter Output Port Lookup Output Queues MAC RxQ 75 MAC RxQ Full System Components Software nf0 nf1 nf2 nf3 ioctl PCIe Bus CPU RxQ NetFPGA AXI Lite CPU TxQ user data path MAC MAC MAC MAC MAC MAC TxQMAC RxQ MAC TxQ RxQ TxQ RxQ TxQ RxQ Ethernet Spring Camp 2013 76 Life of a Packet through the Hardware 192.168.1.x port0 Spring Camp 2013 port2 77 192.168.2.y MAC Rx Queue MAC Rx Queue Spring Camp 2013 78 Rx Queue Length, Src port, Dst port, User defined 0 0 Eth Hdr: Dst MAC = port 0, Ethertype = IP IP Hdr: IP Dst: 192.168.2.3, TTL: 64, Csum:0x3ab4 Rx Queue Payload TUSER Spring Camp 2013 TDATA 79 Input Arbiter Rx Q4 Pkt … Rx Q1 Input Arbiter Pkt Rx Q0 Pkt Spring Camp 2013 80 Output Port Lookup Output Port Lookup Spring Camp 2013 81 Output Port Lookup 5- Update output port in TUSER 1- Check input port matches Dst MAC 2- Check TTL, checksum 3- Lookup next hop IP & output port (LPM) 4- Lookup next hop MAC address (ARP) Length, Src port, Dst port, User defined 0 0 Eth Hdr: Dst MAC= nextHop , Src MAC = port 4, Ethertype = IP IP Hdr: IP Dst: 192.168.2.3, TTL: 63, Csum:0x3ac2 Output Port Lookup Payload TUSER Spring Camp 2013 6- Modify MAC Dst and Src addresses 7-Decrement TTL and update checksum TDATA 82 Output Queues OQ0 Output Queues OQ2 OQ4 Spring Camp 2013 83 MAC Tx Queue MAC Tx Queue Spring Camp 2013 84 MAC Tx Queue Length, Src port, Dst port, User defined 0 0 Spring Camp 2013 Eth Hdr: Dst MAC= nextHop , Src MAC = port 4, Ethertype = IP IP Hdr: IP Dst: 192.168.2.3, TTL: 63, Csum:0x3ac2 MAC Tx Queue Payload 85 Exception Packets • Example: TTL = 0 or TTL = 1 • Packet has to be sent to the CPU • Host generates an ICMP packet response • Difference starts at the Output Port Lookup Spring Camp 2013 86 Exception Packet Path Software nf0 nf1 nf2 nf3 ioctl PCI Bus CPU RxQ NetFPGA AXI Lite CPU TxQ user data path MAC MAC MAC MAC MAC MAC TxQMAC RxQ MAC TxQ RxQ TxQ RxQ TxQ RxQ Ethernet Spring Camp 2013 87 Output Port Lookup 1- Check input port matches Dst MAC 2- Check TTL, checksum – EXCEPTION! Length, Src port, Dst port, User defined 0 3- Add output port module Spring Camp 2013 0 Eth Hdr: Dst MAC= 0 , Src MAC = port x, Ethertype = IP IP Hdr: IP Dst: 192.168.2.3, TTL: 1, Csum:0x3ab4 Output Port Lookup Payload 88 Output Queues OQ0 OQ1 Output Queues OQ2 OQ4 Spring Camp 2013 89 CPU Tx Queue CPU Tx Queue Spring Camp 2013 90 CPU Tx Queue Length, Src port, Dst port, User defined 0 0 Spring Camp 2013 Eth Hdr: Dst MAC= 0 , Src MAC = port x, Ethertype = IP IP Hdr: IP Dst: 192.168.2.3, TTL: 1, Csum:0x3ab4 CPU Tx Queue Payload 91 ICMP Packet • Packet arrives at the CPU Rx Queue from the PCI-express Bus • Same path as a packet from the MAC until it reaches the Output Port Lookup (OPL) • The OPL module sees the packet is from the CPU Rx Queue 1 and sets the output port directly to 0 • The packet continues on the same path as the non-exception packet to the Output Queues and then MAC Tx queue 0 Spring Camp 2013 92 ICMP Packet Path Software nf0 nf1 nf2 nf3 ioctl PCI Bus CPU RxQ NetFPGA AXI Lite CPU TxQ user data path MAC MAC MAC MAC MAC MAC TxQMAC RxQ MAC TxQ RxQ TxQ RxQ TxQ RxQ Ethernet Spring Camp 2013 93 NetFPGA-Host Interaction • Linux driver interfaces with hardware – Packet interface via standard Linux network stack – Register reads/writes via ioctl system call with wrapper functions: • rdaxi(int address, unsigned *rd_data); • wraxi(int address, unsigned *wr_data); eg: rdaxi(0x7d4000000, &val); Spring Camp 2013 94 NetFPGA-Host Interaction NetFPGA to host packet transfer 1. Packet arrives – forwarding table sends to CPU queue Spring Camp 2013 PCI Bus 2. Interrupt notifies driver of packet arrival 3. Driver sets up and initiates DMA transfer 95 NetFPGA-Host Interaction NetFPGA to host packet transfer (cont.) PCI Bus 4. NetFPGA transfers packet via DMA 5. Interrupt signals completion of DMA 6. Driver passes packet to network stack Spring Camp 2013 96 NetFPGA-Host Interaction Host to NetFPGA packet transfers PCI Bus 2. Driver sets up and initiates DMA transfer 3. Interrupt signals completion of DMA 1. Software sends packet via network sockets Packet delivered to driver Spring Camp 2013 97 NetFPGA-Host Interaction Register access PCI Bus 2. Driver performs PCIe memory read/write 1. Software makes ioctl call on network socket ioctl passed to driver Spring Camp 2013 98 Section V: Contributed Projects Spring Camp 2013 99 Running the Router Kit User-space development, 4x10GE line-rate forwarding CPU OSPF BGP My Protocol user Memory kernel Routing Table PCI-Express “Mirror” Fwding Table 10GbE FPGA Spring Camp 2013 10GbE 10GbE 10GbE IPv4 Router 10GbE Memory Packet Buffer 10GbE 10GbE 10GbE 100 Enhancing Modular Reference Designs CPU Memory PCI-Express 10GbE FPGA 10GbE 10GbE Memory 10GbE Verilog, System PW-OSPF Verilog, Java GUIEDA Tools Front PanelVHDL, (Xilinx, Bluespec…. (Extensible) Mentor, etc.) NetFPGA Driver 1.Design 2.Simulate L3 L2 In3.Synthesize Q Parse ParseMgmt 4.Download My IP Out Q Block Lookup Mgmt 10GbE 10GbE 10GbE 10GbE Verilog modules interconnected by FIFO interface Spring Camp 2013 101 Creating new systems CPU Memory PCI-Express 10GbE FPGA 10GbE Verilog, System Verilog, EDA Tools VHDL, (Xilinx, 1.Design Bluespec…. Mentor, etc.) 2.Simulate NetFPGA Driver 3.Synthesize 4.Download 10GbE My Design 10GbE 10GbE Memory 10GbE Spring Camp 2013 10GbE (10GE MAC is soft/replaceable) 10GbE 102 NetThreads, NetThreads-RE, NetTM Martin Labrecque Gregory Steffan U. of Toronto Geoff Salmon Monia Ghobadi Yashar Ganjali ECE Dept. CS Dept. •Efficient multithreaded design –Parallel threads deliver performance •System Features –System is easy to program in C –Time to results is very short Spring Camp 2013 103 Soft Processors in FPGAs FPGA Ethernet MAC DDR controller Processor Soft processors: processors in the FPGA fabric User uploads program to soft processor Easier to program software than hardware in the FPGA Could be customized at the instruction level Spring Camp 2013 104 Example 1G Contributed Projects Project OpenFlow switch Contributor Stanford University Packet generator NetFlow Probe NetThreads Stanford University Brno University University of Toronto zFilter (Sp)router Traffic Monitor DFA Ericsson University of Catania UMass Lowell Spring Camp 2013 105 10G Contributions Alongside reference projects we have… Project Contributor Bluespec switch Traffic Monitor NF1G legacy on NF10G Simple/better DMA core MIT/SRI International University of Pisa Uni Pisa & Uni Cambridge Stanford RAMcloud project Your ideas …. …. You? …. …. Spring Camp 2013 106 Future for NetFPGA 10G Non-reference Projects we know about • High precision capture card – 4 10GbE ports, GPS input, 8ns resolution. • Traffic generator – 4 port 10GbE real-traffic generator • OpenFlow native port (Verilog) to OVS Spring Camp 2013 107 How might we use NetFPGA? • • • Build an accurate, fast, line-rate NetDummy/nistnet element • Hardware channel bonding reference implementation Well I’m not sure about you but here is a list I created: A flexible home-grown monitoring card • TCP sanitizer • • Prototype a full line-rate next-generation Ethernet-type • Trying any of Jon Crowcrofts’ ideas (Sourceless IP routing for example) • Demonstrate the wonders of Metarouting in a different implementation (dedicated hardware) • Provable hardware (using a C# implementation and kiwi with NetFPGA as target h/w) • Hardware supporting Virtual Routers • Check that some brave new idea actually works • e.g. Rate Control Protocol (RCP), Multipath TCP, • toolkit for hardware hashing • MOOSE implementation • IP address anonymization • SSL decoding “bump in the wire” • Xen specialist nic application classifiers, and other neat network apps….) – (and • computational co-processor • Distributed computational co-processor • IPv6 anything • IPv6 – IPv4 gateway (6in4, 4in6, 6over4, 4over6, ….) • Netflow v9 reference • PSAMP reference • IPFIX reference • Different driver/buffer interfaces (e.g. PFRING) • or “escalators” (from gridprobe) for faster network monitors • Firewall reference • GPS packet-timestamp things • High-Speed Host Bus Adapter reference implementations • – Infiniband • – iSCSI • – Myranet • – Fiber Channel • Smart Disk adapter (presuming a direct-disk interface) • Software Defined Radio (SDR) directly on the FPGA (probably UWB only) • Routing accelerator • – Hardware route-reflector Evaluate new packet classifiers – • • • • • • • • • • • • • • • • • • • • • • • • • • (and application classifiers, and other neat network apps….) Other protocol sanitizer (applications… UDP DCCP, etc.) Full and complete Crypto NIC IPSec endpoint/ VPN appliance VLAN reference implementation metarouting implementation virtual <pick-something> intelligent proxy application embargo-er Layer-4 gateway h/w gateway for VoIP/SIP/skype h/w gateway for video conference spaces security pattern/rules matching Anti-spoof traceback implementations (e.g. BBN stuff) IPtv multicast controller Intelligent IP-enabled device controller (e.g. IP cameras or IP powerm DES breaker platform for flexible NIC API evaluations snmp statistics reference implementation sflow (hp) reference implementation trajectory sampling (reference implementation) implementation of zeroconf/netconf configuration language for rout h/w openflow and (simple) NOX controller in one… Network RAID (multicast TCP with redundancy) inline compression hardware accelorator for TOR load-balancer openflow with (netflow, ACL, ….) reference NAT device active measurement kit network discovery tool passive performance measurement active sender control (e.g. performance feedback fed to endpoints fo Prototype platform for NON-Ethernet or near-Ethernet MACs • Build an accurate, fast, line-rate NetDummy/nistnet element • A flexible home-grown monitoring card • Evaluate new packet classifiers • Prototype a full line-rate next-generation Ethernet-type • Trying any of Jon Crowcrofts’ ideas (Sourceless IP routing for example) • Demonstrate the wonders of Metarouting in a different implementation (dedicated hardware) • Provable hardware (using a C# implementation and kiwi with NetFPGA as target h/w) • Hardware supporting Virtual Routers – • Internet exchange route accelerator Check that some brave new idea actually works 108 Spring Camp 2013 – Optical LAN (no buffers) How might YOU use NetFPGA? • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • – (and application classifiers, and other neat network apps….) • Prototype a full line-rate next-generation Ethernet-type • Trying any of Jon Crowcrofts’ ideas (Sourceless IP routing for example) • Demonstrate the wonders of Metarouting in a different implementation (dedicated hardware) • Provable hardware (using a C# implementation and kiwi with NetFPGA as target h/w) • Hardware supporting Virtual Routers • Check that some brave new idea actually works • e.g. Rate Control Protocol (RCP), Multipath TCP, • toolkit for hardware hashing • MOOSE implementation • IP address anonymization • SSL decoding “bump in the wire” • Xen specialist nic • computational co-processor • Distributed computational co-processor • IPv6 anything • IPv6 – IPv4 gateway (6in4, 4in6, 6over4, 4over6, ….) • Netflow v9 reference • PSAMP reference • IPFIX reference • Different driver/buffer interfaces (e.g. PFRING) • or “escalators” (from gridprobe) for faster network monitors • Firewall reference • GPS packet-timestamp things • High-Speed Host Bus Adapter reference implementations • – Infiniband • – iSCSI • – Myranet • – Fiber Channel • Smart Disk adapter (presuming a direct-disk interface) • Software Defined Radio (SDR) directly on the FPGA (probably UWB only) • Routing accelerator • – Hardware route-reflector Build an accurate, fast, line-rate NetDummy/nistnet element A flexible home-grown monitoring card Evaluate new packet classifiers – Internet exchange route accelerator Spring Camp 2013 109 Hardware channel bonding reference implementation TCP sanitizer Other protocol sanitizer (applications… UDP DCCP, etc.) Full and complete Crypto NIC IPSec endpoint/ VPN appliance VLAN reference implementation metarouting implementation virtual <pick-something> intelligent proxy application embargo-er Layer-4 gateway h/w gateway for VoIP/SIP/skype h/w gateway for video conference spaces security pattern/rules matching Anti-spoof traceback implementations (e.g. BBN stuff) IPtv multicast controller Intelligent IP-enabled device controller (e.g. IP cameras or IP powerm DES breaker platform for flexible NIC API evaluations snmp statistics reference implementation sflow (hp) reference implementation trajectory sampling (reference implementation) implementation of zeroconf/netconf configuration language for rout h/w openflow and (simple) NOX controller in one… Network RAID (multicast TCP with redundancy) inline compression hardware accelorator for TOR load-balancer openflow with (netflow, ACL, ….) reference NAT device active measurement kit network discovery tool passive performance measurement active sender control (e.g. performance feedback fed to endpoints fo Prototype platform for NON-Ethernet or near-Ethernet MACs – Optical LAN (no buffers) Section VI: Example Project Spring Camp 2013 110 Project: Cryptographic NIC Implement a network interface card (NIC) that encrypts upon transmission and decrypts upon reception Spring Camp 2013 111 Cryptography XOR function A B A^B 0 0 0 0 1 1 1 0 1 1 1 0 XORing a value with itself always yields 0 XOR written as: ^ ⊻ ⨁ XOR is commutative: (A ^ B) ^ C = A ^ (B ^ C) Spring Camp 2013 112 Cryptography (cont.) Simple cryptography: – Generate a secret key – Encrypt the message by XORing the message and key – Decrypt the ciphertext by XORing with the key Explanation: (M ^ K) ^ K = M ^ (K ^ K) = M^0 = M Spring Camp 2013 113 Commutativity A^A=0 Cryptography (cont.) Example: Message: 00111011 Key: 10110001 Message ^ Key: 10001010 Key: 10110001 Message ^ Key ^ Key: 00111011 Spring Camp 2013 114 Cryptography (cont.) Idea: Implement simple cryptography using XOR – 32-bit key – Encrypt every word in payload with key Header Payload ⨁ Key Key Key Key Key Note: XORing with a one-time pad of the same length of the message is secure/uncrackable. See: http://en.wikipedia.org/wiki/One-time_pad Spring Camp 2013 115 Section VII: Implementation Spring Camp 2013 116 Embedded Development Kit (1) • A development environment for building “embedded systems” • NetFPGA-10G is largely built upon the “EDK” Spring Camp 2013 117 Embedded Development Kit (2) • Xilinx EDK suite is composed of: – Platform Studio (XPS), a top level integrated design tool for “hardware” synthesis , implementation and bitstream generation – Software Development Kit (SDK), a development environment for “software application” running on embedded processors like Microblaze Spring Camp 2013 118 Xilinx Platform Studio (1) • An XPS project consists of following: – system.xmp • top level XPS project file – system.mhs • microprocessor hardware specification file • defines the embedded processor system, and includes bus architecture, peripherals, processor, and connectivity etc. – system.ucf • user constraint file • defines constraints such as timing, area, IO placement etc. Spring Camp 2013 119 Xilinx Platform Studio (2) • To invoke XPS design tool, run: # xps <project_root>/hw/system.xmp • This will open the project in the XPS graphical user interface • • • • open a new terminal cd ~/NetFPGA-10G-live/projects/crypto_nic source /opt/Xilinx/13.4/ISE_DS/settings64.sh xps hw/system.xmp Spring Camp 2013 120 XPS Design Tool (1) Bus Assembly System Assembly IP Catalog Spring Camp 2013 121 XPS Design Tool (2) • IP Catalog: contains categorized list of all available peripheral cores • Bus Assembly: shows connectivity of various modules over AXI bus • System Assembly: provides a complete view of instantiated cores – Cores can be added and deleted, in this view Spring Camp 2013 122 XPS Design Tool (2) Port view • Port view: provides listing of internal and external ports for each peripheral core Spring Camp 2013 123 XPS Design Tool (3) Address view • Address view: defines base and high address value for peripheral cores connected to AXI4 or AXI-LITE bus • These values can be changed manually, here Spring Camp 2013 124 Getting started with a new project (1) • Projects: – Each design represented by a project – Location: NetFPGA-10G-live/projects/<proj_name> • ~/NetFPGA-10G-live/projects/crypto_nic – Consists of: • • • • • Verilog source Simulation tests Hardware tests Libraries Optional software Spring Camp 2013 125 Getting started with a new project (2) – Normally: • copy an existing project as the starting point – Today: • pre-created project – Missing from pre-created project: • • • • Verilog files (with crypto implementation) Simulation tests Hardware tests Custom software Spring Camp 2013 126 Getting started with a new project (3) Typically implement functionality in one or more modules under the top wrapper MAC RxQ MAC RxQ MAC RxQ MAC RxQ CPU RxQ MAC RxQ CPU RxQ Input Arbiter Output Port Lookup Crypto Crypto module to encrypt and decrypt packets Output Queues MAC RxQ Spring Camp 2013 MAC RxQ 127 MAC RxQ Getting started with a new project (4) – Shared modules included from netfpga/lib/verilog • Generic modules that are re-used in multiple projects • Specify shared modules in project’s mhs file – Local src modules override shared modules – crypto_nic: Local crypto.v Spring Camp 2013 Shared Everything else 128 Getting started with a new project (5) Create crypto.v from module_template.v: 1. 2. 3. Rename project project_template to crypto Rename the local module_template.v to crypto.v Change the module name inside crypto.v (first noncomment line of the file) 4. 5. 6. Add the crypto module to the mhs file Update mpd file Update pao file Spring Camp 2013 129 Running XPS cd ~/NetFPGA-10G-live/projects/crypto_nic xps hw/system.xmp Spring Camp 2013 130 Adding The Crypto Module • Double click the Crypto module in IP catalog – It’s under “Project Local Pcores” • Select 256 bit data width and click OK Spring Camp 2013 131 Connect Crytpo module • In “Bus Interfaces”, click Crypto’s S_AXIS and select Input Output_lookup’s M_AXIS • Click Output_queue’s S_AXIS and select Crypto’s M_AXIS Spring Camp 2013 132 Connect Clock and Reset • Click “Ports” Tab • Clock -> clock_generator_0::CLKOUT1 • Reset -> reset_0::Peripheral_aresetn Spring Camp 2013 133 Implementing the Crypto Module (1) • What do we want to encrypt? – IP payload only • Plaintext IP header allows routing • Content is hidden – Encrypt bytes 35 onward • Bytes 1-14 – Ethernet header • Bytes 15-34 – IPv4 header (assume no options) • Remember AXI byte ordering – For simplicity, assume all packets are IPv4 without options Spring Camp 2013 134 Implementing the Crypto Module (2) • State machine (draw on next page): – Module headers on each packet – Datapath 256-bits wide • 34 / 32 is not an integer! • Inside the crypto module Registers S_AXIS Spring Camp 2013 in_fifo_empty valid in_fifo_rd_en ready data/ctrl data/ctrl 135 M_AXIS Crypto Module State Diagram Hint: We suggest 3 states Detect Packet’s Header Spring Camp 2013 137 Implementing the Crypto Module (3) Implement your state machine inside crypto.v – Use a static key initially Suggested sequence of steps: 1. Create a static key value • 2. Constants can be declared in the module with localparam: localparam MY_EXAMPLE = 32’h01234567; Implement your state machine without modifying the packet Update your state machine to modify the packet by XORing the key and the payload 3. • Use eight copies of the key to create a 256-bit value to XOR with data words Spring Camp 2013 138 module_template.v (1) module module_template #( parameter C_FAMILY = "virtex5", parameter C_M_AXIS_DATA_WIDTH=256, parameter C_S_AXIS_DATA_WIDTH=256, parameter C_M_AXIS_TUSER_WIDTH=128, parameter C_S_AXIS_TUSER_WIDTH=128, ... ) ( Module ... ) port declaration //----------------------- Signals---------------------------... //------------------ Local assignments ----------------------... //------------------------Registers ... Spring Camp 2013 ---------------------- 139 module_template.v (2) //------------------------- Modules------------------------------fallthrough_small_fifo #( .WIDTH(C_S_AXIS_DATA_WIDTH+C_S_AXIS_TUSER_WIDTH +(C_S_AXIS_DATA_WIDTH / 8) +1), Packet data .MAX_DEPTH_BITS(2) ) input_fifo ( .din (), .wr_en (), .rd_en (), .dout (), .full (), .nearly_full .empty (), .reset (), .clk () ); dumped in a FIFO. Allows some “decoupling” between input and output. // Data in // Write enable // Read the next word // Data out //FIFO full indication (), //Almost full indication //FIFO empty indication //Rest //Clock Spring Camp 2013 140 module_template.v (3) // -- AXILITE IPIF axi_lite_ipif_1bar axi_lite_ipif_inst ( . . . ); Registers section. #(. . .) Ignore for now – we’ll explore this later // -- IPIF REGS ipif_regs #(. . .) ipif_regs_inst ( . . . ); //------------------------Registers logic ---------------------- Spring Camp 2013 141 module_template.v (4) //------------------------- Logic------------------------------- Combinational logic to read data from the FIFO. (Data is output to output ports.) always @(*) begin // Default values out_wr_int = 0; in_fifo_rd_en = 0; if (!in_fifo_empty && out_rdy) begin out_wr_int = 1; in_fifo_rd_en = 1; end end Spring Camp 2013 142 You’ll want to add your state in this section. More Verilog: Assignments 1 • Continuous assignments – appear outside processes (always @ blocks): assign foo = baz & bar; – targets must be declared as wires – always “happening” (ie, are concurrent) Spring Camp 2013 143 More Verilog: Assignments 2 • Non-blocking assignments – appear inside processes (always @ blocks) – use only in sequential (clocked) processes: always @(posedge clk) begin a <= b; b <= a; end – occur in next delta (‘moment’ in simulation time) – targets must be declared as regs – never clock any process other than with a clock! Spring Camp 2013 144 More Verilog: Assignments 3 • Blocking assignments – appear inside processes (always @ blocks) – use only in combinatorial processes: • (combinatorial processes are much like continuous assignments) always @(*) begin a = b; b = a; end – occur one after the other (as in sequential langs like C) – targets must be declared as regs – even though not a register – never use in sequential (clocked) processes! Spring Camp 2013 145 More Verilog: Assignments 3 • Blocking assignments – appear inside processes (always @ blocks) – use only in combinatorial processes: • (combinatorial processes are much like continuous assignments) always @(*) begin tmp = a; unlike non-blocking, a = b; have to use a temporary signal b = tmp; end – occur one after the other (as in sequential langs like C) – targets must be declared as regs – even though not a register – never use in sequential (clocked) processes! Spring Camp 2013 146 (hints) • Never assign one signal from two processes: always @(posedge clk) begin foo <= bar; end always @(posedge clk) begin foo <= quux; end Spring Camp 2013 147 (hints) • In combinatorial processes: – take great care to assign in all possible cases always @(*) begin if (cond) begin foo = bar; end end – (latches ‹as opposed to flip-flops› are bad for timing closure) Spring Camp 2013 148 (hints) • In combinatorial processes: – take great care to assign in all possible cases always @(*) begin if (cond) begin foo = bar; else foo = quux; end end Spring Camp 2013 149 (hints) • In combinatorial processes: – (or assign a default) always @(*) begin foo = quux; if (cond) begin foo = bar; end end Spring Camp 2013 150 Section VIII: Simulation and Debug Spring Camp 2013 151 Testing: Simulation • Simulation allows testing without requiring lengthy synthesis process • NetFPGA simulation environment allows: – Send/receive packets • Physical ports and CPU – Read/write registers – Verify results • Simulations run in iSim Spring Camp 2013 152 Testing: Simulation • We will simulate the “crypto_nic” design under the “10G simulation framework” • We will show you how to – create simple packets using scapy – transmit and reconcile packets sent over 10G Ethernet and PCIe interfaces Spring Camp 2013 153 Testing: Simulation(2) Run a simulation to verify changes: 1. 2. cd ~/NetFPGA-10G-live/projects/crypto_nic/hw make sim or 1. 2. cd ~/NetFPGA-10G-live/projects/crypto_nic/hw make simgui make simgui opens isim *more on this later* Now we can implement the crypto functionality Spring Camp 2013 154 1 cd ~/NetFPGA-10G-live/projects/crypto_nic/hw vim make_pkts.py • This will create eight simple IP packets for injecting into the system from each port including CPU Spring Camp 2013 155 2 The destination in this code snippet is the DMA • Using the created packets we create the stimulus (input to the simulator) and the expected output from the simulator • NB. line 97 deciphers the packets from the device • To test without a key – edit ~/NetFPGA-10Glive/projects/crypto_nic/hw/Makefile and set the argument of –k to be 0x0 Spring Camp 2013 156 3 The destinations in this code snippet are the ports • Using the created packets we create the stimulus (input to the simulator) and the expected output from the simulator • NB. line 97 deciphers the packets from the device • To test without a key – edit ~/NetFPGA-10Glive/projects/crypto_nic/hw/Makefile and set the argument of –k to be 0x0 Spring Camp 2013 157 4 The results are in… • As expected, two packets are received from PCIe on each 10G interface • Total of 32 packets, eight from each 10G interface, are received on the PCIe Spring Camp 2013 158 Running simulation in iSim (1) • To invoke iSim for design simulation, run: # cd <project_root>/hw/ # make simgui • This will open simulation in the iSim graphical user interface Spring Camp 2013 159 Running simulation in iSim (2) Objects panel Waveform window Instance panel Tcl console Spring Camp 2013 160 Running simulation in iSim (3) • Instance panel: displays process and instance hierarchy • Objects panel: displays simulation objects associated with the instance selected in the instance panel • Waveform window: displays wave configuration consisting of signals and busses • Tcl console: displays simulator generated messages and can executes Tcl commands Spring Camp 2013 161 Simulation gone wild When make sim 1 source /opt/Xilinx/13.4/ISE_DS/settings64.sh 2 … XPS% make: *** No rule to make target `___xps/simgen.opt’, needed ….. Go fix the address in ….hw/system.mhs 3 ERROR:EDK – for point-2-point interface connector ‘nf10_crypto_0_m_axis’….. 3a Add the crypto module to the MHS file (edit system.mhs OR use the GUI) 3b Check the connectivity of the module (To each of the OPL and Output Queue modules) 4 ….Cannot find MPD for weird core name …. 5 ERROR: Simluator:778 – Static elaboration of top level Verilog design unit(s) in library work failed edit pao file (~/NetFPGA-10G-live/projects/crypto_nic/hw/pcores/nf10_crypto_v1_00_a/data/nf10_crypto_v2_1_0.pao) For simulation uncomment the first four lines (39-42) comment out the last line here (43) For synthesis uncomment lines 39,40,43 comment out lines 41 and 42 Spring Camp 2013 162 Simple Verilog error Lots of errors When the file is empty… Spring Camp 2013 163 Section IX: Conclusion Spring Camp 2013 164 Acknowledgments NetFPGA Team at Stanford University (Past and Present): Nick McKeown, Glen Gibb, Jad Naous, David Erickson, G. Adam Covington, John W. Lockwood, Jianying Luo, Brandon Heller, Paul Hartke, Neda Beheshti, Sara Bolouki, James Zeng, Jonathan Ellithorpe, Sachidanandan Sambandan, Eric Lo NetFPGA Team at University of Cambridge (Past and Present): Andrew Moore, David Miller, Muhammad Shahbaz, Martin Zadnik Matthew Grosvenor, Gianni Antichi, Neelakandan Manihatty-Bojan, Georgina Kalogeridou, Jong Hun Han, Noa Zilberman All Community members (including but not limited to): Paul Rodman, Kumar Sanghvi, Wojciech A. Koszek, Yahsar Ganjali, Martin Labrecque, Jeff Shafer, Eric Keller , Tatsuya Yabe, Bilal Anwer, Yashar Ganjali, Martin Labrecque Kees Vissers, Michaela Blott, Shep Siegel Spring Camp 2013 165 Thanks to our Sponsors: • Support for the NetFPGA project has been provided by the following companies and institutions Disclaimer: Any opinions, findings, conclusions, or recommendations expressed in these materials do not necessarily reflect the views of the National Science Foundation or of any other sponsors supporting this project. This effort is also sponsored by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL), under contract FA8750-11-C-0249. This material is approved for public release, distribution unlimited. The views expressed are those of the authors and do not reflect the official policy or position of the Department of Defense or the U.S. Government. Spring Camp 2013 167