RISC-V “Rocket Chip” SoC Generator in Chisel Yunsup Lee UC Berkeley yunsup@eecs.berkeley.edu What is the Rocket Chip SoC Generator? ! Parameterized SoC generator written in Chisel ! Generates Tiles - (Rocket) Core + Private Caches ! Generates Uncore (Outer Memory System) - Coherence Agent - Shared Caches - DMA Engines - Memory Controllers ! Glues all the pieces together 2 “Rocket Chip” SoC Generator Tile Rocket Core ROCC Accel. FPU Tile Rocket Core ROCC Accel. FPU L1 Inst L1 Data L1 Inst L1 Data sets, ways sets, ways sets, ways sets, ways L1 Network Coherence Manager HTIF ! Generates n Tiles - (Rocket) Core - RoCC Accelerator - L1 I$ - L1 D$ ! Generates HTIF - Host DMA Engine ! Generates Uncore - L1 Crossbar - Coherence Manager - Exports MemIO Interface TileLink / MemIO Converter 3 Why SoC Generators? ! Helps tune the design under different performance, power, area constraints, and diverse technology nodes ! Parameters include: - number of cores - instantiation of floating-point units, vector units - cache sizes, associativity, number of TLB entries, cache-coherence protocol - number of floating-point pipeline stages - width of off-chip I/O, and more 4 Why Chisel? ! RTL generator written in Chisel - HDL embedded in Scala ! Full power of Scala for writing generators - object-oriented programming - functional programming Chisel Program Scala/JVM C++ code FPGA Verilog ASIC Verilog C++ Compiler Software Simulator FPGA Tools ASIC Tools FPGA Emulation GDS Layout 5 Rocket Scalar Core PC PC Gen. IF ITLB I$ Access bypass paths omitted for simplicity ID Int.RF Inst. Decode EX Int.EX FP.RF MEM DTLB D$ Access FP.EX1 WB Commit To RoCC to Hwacha Accelerator FP.EX2 FP.EX3 Rocket Pipeline VITLB -PC64-bit 5-stage single-issue in-order pipeline VInst. SeqBank1 ... Bank8 Expand VI$ Design minimizes impact of long clock-to-output delays Gen. Decode uencer R/W R/W Access of compiler-generated RAMs from Rocket - 64-entry BTB, 256-entry BHT, 2-entry RAS Hwacha Pipeline - MMU supports page-based virtual memory - IEEE 754-2008-compliant FPU - Supports SP, DP FMA with hw support for subnormals 6 ARM Cortex-A5 vs. RISC-V Rocket Category ARM Cortex-A5 RISC-V Rocket ISA 32-bit ARM v7 64-bit RISC-V v2 Architecture Single-Issue In-Order Single-Issue In-Order 5-stage Performance 1.57 DMIPS/MHz 1.72 DMIPS/MHz Process TSMC 40GPLUS TSMC 40GPLUS Area w/o Caches 0.27 mm2 0.14 mm2 Area with 16K Caches 0.53 mm2 0.39 mm2 Area Efficiency 2.96 DMIPS/MHz/mm2 4.41 DMIPS/MHz/mm2 Frequency >1GHz >1GHz Dynamic Power <0.08 mW/MHz 0.034 mW/MHz - PPA reporting conditions - 85% utilization, use Dhrystone for benchmark, frequency/power at TT 0.9V 25C, all regular VT transistors - 10% higher in DMIPS/MHz, 49% more area-efficient 7 HTIF: Host-Target Interface ! UC Berkeley specific block mainly used to emulate devices for simple test chips - Emulates system calls, console, block devices, frame buffer, network devices - No need for this block once the SoC has actual devices on the target machine ! Consider it as a “host DMA engine” ! A port for for host system to read/write - Core CSRs (control and status registers) - Target Memory 8 Important Interfaces in the Rocket Chip Tile Rocket Core Tile Rocket Core ROCC Accel. ROCCIO FPU HTIF HTIFIO ROCC Accel. FPU HostIO L1 Inst L1 Data sets, ways sets, ways sets, ways sets, ways client client client client Tile L L1 NetworkO Lin Tile kIO Li Tile nkI client arb TileLink L1 Data inkI O L1 Inst Coherence Manager mngr mngr TileLink / MemIO Converter TileLink arb TileLinkIO client MemIO ! ROCCIO - Interface between Rocket/Accelerator ! HTIFIO - Read/Write CSRs ! TileLinkIO - Coherence Fabric ! MemIO - Simple AXI-like memory interface ! HostIO - Host Interface to HTIF 9 TileLinkIO Client Client Finish Grant Release Probe Acquire Cache Finish Grant Release Probe Acquire Cache Manager - TileLinkIO consists of Acquire, Probe, Release, Grant, Finish 10 UncachedTileLinkIO Client Client Finish Grant Acquire Finish Grant Release Probe Acquire Cache Manager - UncachedTileLinkIO consists of Acquire, Grant, Finish - Convertors for TileLinkIO/UncachedTileLinkIO in uncore library 11 MemIO Master MemReqCmd Slave MemReqCmd.valid MemReqCmd.ready Decoupled(MemData) Decoupled(MemResp) - MemReqCmd consists of addr, rw (write=true), tag - MemData consists of 128 bit data payload - MemResp consists of 128 bit data payload, tag - Decoupled(interface) means an interface with ready/ valid signals 12 ROCCIO ! Rocket sends Rocket Decoupled(Cmd) Decoupled(Resp) ROCC Accel. ! CacheIO busy IRQ ! supervisor bit ! ! UncachedTileLinkIO ! PTWIO exception ! coprocessor instruction via the Cmd interface Accelerator responds through Resp interface Accelerator sends memory requests to L1D$ via CacheIO busy bit for fences IRQ, S, exception bit used for virtualization UncachedTileLinkIO for instruction cache on accelerator PTWIO for page-table walker ports on accelerator 13 HTIFIO HTIF reset Tile core_id Decoupled(CSRReq) Decoupled(CSRResp) Decoupled(IPIReq) Decoupled(IPIResp) - reset signal and core_id routed from HTIF (historical reasons nothing technical) - CSR Read/Write requests go through CSRReq/CSRResp - IPI Requests go through IPIReq/IPIResp - HTIFIO likely to be modified in the near future 14 Rocket Chip C++ Emulator Setup Verilog Simulator pthread Rocket Chip MemIO pthread HostIO RISC-V Frontend Server DRAMSim2 15 Rocket Chip FPGA Setup ZYNQ FPGA Rocket Chip HostIO MemIO HostIO/AXI Convertor MemIO/AXIHP Convertor AXI HP AXI AXI Master AXI HP Slave ARM Processing System RISC-V Frontend Server DDR3 DRAM 16 Rocket Chip Berkeley Test Chip Setup Test Chip MemIO Rocket Chip MemIO Serializer M HostIO em Se ri a lIO ZYNQ FPGA HostIO/AXI Convertor MemIO MemIO Deserializer AXI HP AXI AXI Master MemIO/AXIHP Convertor AXI HP Slave ARM Processing System DDR3 DRAM RISC-V Frontend Server 17 Rocket Chip “SoC” Setup Interrupts Rocket Chip TiileLinkIO Devices Uncached TiileLinkIO MemIO LPDDR3 Memory Controller Me mIO LPDDR3 DRAM 18 Who should use the Rocket Chip Generator? People who would like to develop … Tile Rocket Core Tile Rocket Core ROCC Accel. ROCCIO FPU HTIF HTIFIO ROCC Accel. FPU HostIO L1 Inst L1 Data sets, ways sets, ways sets, ways sets, ways client client client client Tile L L1 NetworkO Lin Tile kIO Li Tile nkI client arb TileLink L1 Data inkI O L1 Inst Coherence Manager mngr mngr TileLink / MemIO Converter TileLink arb TileLinkIO client ! A RISC-V SoC - Look into Chisel parameters ! New Accelerators - Drop in at ROCCIO level ! Own RISC-V Core - Drop in at TileLinkIO level or MemIO level ! Own Device - Drop in at TileLinkIO or UncachedTileLinkIO MemIO 19 New Features: L2$ with Directory Bits Tile ROCC Accel. ROCC Accel. FPU L1 Inst L1 Data L1 Inst L1 Data sets, ways sets, ways sets, ways sets, ways client client client client L1 Network L2Cache client arb ManagerL2Cache L2CacheCoherence L2Cache L2Cache mngr mngr mngr mngr mngr sets, ways sets, ways sets, ways sets, ways sets, ways client client client client client mngr ! Shared L2$ with TileLink / MemIO Converter TileLink FPU Rocket Core HTIF multiple banks ! Each L2$ will act as a coherence manager with directory bits (snoop filter) ! These caches can be composed to build outer-level caches such as an L3$ TileLink Rocket Core Tile 20 New Features: ROCC interfaces with L2$ Tile ROCC Accel. FPU L1 Inst L1 Data L1 Inst L1 Data sets, ways sets, ways sets, ways client client client ROCC Accel. client L1 Network L2Cache to the L2$ to address more data Rocket Core client arb ManagerL2Cache L2CacheCoherence L2Cache L2Cache mngr mngr mngr mngr mngr sets, ways sets, ways sets, ways sets, ways sets, ways client client client client client mngr TileLink / MemIO Converter TileLink FPU ! ROCC talks directly HTIF TileLink Rocket Core Tile 21 New Features on the Deck ! Dual-issue Rocket Core ! Hwacha Vector Unit (checkout hwacha.org) ! Dump MemIO and use AXI 22