Fast SoC Architecture Exploration Using Traffic Simulation Techniques
Nadjib Mammeri, ARM
CONFIDENTIAL

Problems we are trying to solve
  What interconnect topology should I use?
  What arbitration and QoS schemes?
  How should I configure my memory controller? DMC queue length? Memory width?
  How to optimally size my interconnect/memory system and still meet my performance requirements?

SoC Architecture Exploration
Current Techniques
  Spreadsheet: Not accurate, Fast, Cheap
  RTL simulation: 100% Accurate, Slow, Expensive
  RTL emulation: Accurate, Fast, Expensive
  Behavioural SystemC models: Accurate, Fast, Expensive
  Traffic Profiling: ~Accurate, Fast, Cheap
    Abstracting away some components or parts of the system and replacing them with bus transactors that can:
      Generate realistic traffic which is statistically equivalent to SoC data flows
      Re-use existing data flows to explore new architectures
    Uses constrained random techniques

Our proposed approach
Iteration time of a spreadsheet with the accuracy approaching RTL simulation
[Figure: exploration techniques placed on two axes, Realistic behaviour (LOW to HIGH) and Cycle time (LOW to HIGH).]
  minutes/hours: Spreadsheet Analysis. Mathematical formula, not dynamic.
  minutes/hours: RTL simulation, VPE, User VIP. Industry standards VIP. Statistical or recorded traffic profiles.
  days/weeks: Acceleration/Emulation. VIP, Logic Tiles, SW.
  months/years: Silicon/Applications.
  Annotations: Adding S/W, external I/F with realistic scenarios. Observe actual behaviour.

How is it done?
  When analysing performance, content or functional intent of the data is not important but the nature and flow of traffic is.
  Reduction in simulation time can be achieved by trading off functional accuracy of end points.
  Accuracy should be preserved in the DUT and in the interconnect because it is the performance bottleneck.
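The trade-off above, giving up end-point functionality while keeping the shape of the traffic, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Transaction` record, the `generate_traffic` helper and the two example histograms are hypothetical and are not part of VPE or any ARM tool. The sketch shows a stand-in master that draws burst lengths and inter-transaction gaps from weighted histograms and fills the payload with random data, since the data content does not matter for performance analysis.

```python
import random
from dataclasses import dataclass


@dataclass
class Transaction:
    start_cycle: int  # cycle at which the transaction is issued
    burst_len: int    # number of data beats in the burst
    data: list        # constrained random payload; its content is irrelevant


def sample(hist, rng):
    """Draw one value from a histogram given as {value: count}."""
    values = list(hist)
    weights = [hist[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


def generate_traffic(len_hist, gap_hist, n, seed=0):
    """Generate n transactions whose burst lengths and gaps follow the histograms."""
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    cycle = 0
    out = []
    for _ in range(n):
        cycle += sample(gap_hist, rng)   # gap since the previous transaction
        blen = sample(len_hist, rng)     # burst length for this transaction
        data = [rng.getrandbits(32) for _ in range(blen)]
        out.append(Transaction(cycle, blen, data))
    return out


# Hypothetical histograms, invented for illustration only.
LEN_HIST = {1: 10, 4: 70, 8: 20}    # burst length -> observed count
GAP_HIST = {2: 50, 10: 30, 40: 20}  # gap in cycles -> observed count

traffic = generate_traffic(LEN_HIST, GAP_HIST, 1000)
```

A real transactor would also have to honour the bus protocol; the sketch only covers the statistical side of the generation.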
How simulation speed-up is achieved
  By 'giving up' execution of functions within the emulated device in favour of emulating its traffic
    No need to model their cycle-accurate behaviour
  By replacing real data with constrained random data

What is VPE (formerly AVIP)?
Functional Verification
  Complete AXI functional Verification solution
  System Verilog Master, Slave, Monitor
  RTL Protocol assertions
  RTL Coverage Points
Performance Exploration
  Profile editor toolkit GUI
  RTL Profile extraction
  RTL Profile generation
  AXI Traffic Characterization and Analysis
  AXI Traffic Replay and Adaptation
[Figure: IEEE 1800 SystemVerilog Testbench. Labels: Profile Data, AXI Master, AXI Slave, AXI Monitor, Customer VIP, Customer IP, User DUT (Block or Sub-system) with AXI Slave Interface and AXI Master Interface.]

Abstraction example 1
If I would like to investigate my interconnect topology, I would keep the RTL for my interconnect and abstract away all end points (masters and slaves).
Replace them with VPE masters and slaves
[Figure: AXI Interconnect with its end points. Labels: Master 1 to Master 4, Slave 1, Slave 2, AXI Interconnect, Master, Slave, Monitor.]

Abstraction example 2
If I would like to investigate my memory controller configurability, I would use the RTL for my interconnect and DMC and abstract away other end points.
Replace them with VPE masters and slaves
[Figure: AXI Interconnect and DMC with their end points. Labels: Master 1 to Master 4, Slave 1, AXI Interconnect, DMC, Master, Slave, Monitor.]

Traffic Profiling (1)
  Traffic profiles statistically characterise the traffic (transactions) on an AXI connection
  Traffic flow is an identifiable stream of traffic (AXI transactions) between two points in a system
  Examples:
    When profiling at slave 1, traffic coming from Master 2 can be identified using AxID
    If we know Master 1 always does 4-beat bursts we can identify its traffic flow based on AxLEN

Traffic Profiling (2)
  A profile is associated with a connection and can have multiple flows
  Flows contain histograms that store statistical data of both payload and timings information.
  Payload histograms
    Histograms describing traffic payload information (control of a transaction, response of a transaction but no data content)
    ADDRESS, ID, BURST, SIZE, LEN, RESP etc…
  Timing histograms
    Histograms describing traffic timings information
    ITT, AWW, WW, WIL, WBL, ARW, RW, RBL etc…

AXI Timing Histograms
Inter transaction timings
  ITT: Histogram parameter defining the inter-transaction timings in a flow (time between successive transactions).
  [Figure: two histograms of Frequency against itt, one for Continuous traffic and one for Bursty traffic.]
Intra transaction timings
  Flow timings: timings that describe the flow of traffic.
  Connection timings: timings that are considered as properties of the connection

AXI Intra-Transaction Timings
  RIL: Time between handshake on the AR channel and the first read transfer on the R channel
  RW: Time between RVALID and RREADY
  WIL: Time between handshake on the AW channel and the first write transfer on the W channel
  WW: Time between WVALID and WREADY

How accurate is it?
  4 hours to 4 minutes: VPE Master executing 2M cycles of traffic profile in place of real Mali200 RTL running Proxycon/Samurai content
  The VPE profile executes much faster than real RTL, yet it generates representative and controllable traffic
  [Figure: traffic from the real RTL alongside traffic from the VPE profile. Labels: Real RTL; Original captured traffic profile now used to drive VPE Master.]

More VPE Features
  Master
  Slave
  Monitor
  AXI Protocol checker
  AXI Protocol coverage
  Transaction recording/visualisation
  Traffic profile extraction

Conclusion
  System architects require novel techniques with short iteration times to analyze performance and fine-tune their SoCs.
  VPE introduces a new approach that combines high level modeling and statistical low level random generation techniques to explore and verify IP performance.
  Traffic profiling can be used by VPE masters and slaves to generate statistically equivalent traffic and by VPE monitors when monitoring performance.

Questions
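As a supplement to the Traffic Profiling slides, the extraction side can be sketched as follows. This is a minimal Python illustration, not the VPE implementation: the record format `(cycle, axid)`, the function name `extract_itt_profile` and the bin width are all assumptions. It splits recorded transactions into flows by AxID and builds one ITT histogram per flow, taking ITT to be the number of cycles between successive transactions of the same flow.

```python
from collections import Counter, defaultdict


def extract_itt_profile(records, bin_width=10):
    """Build one ITT histogram per flow.

    records: iterable of (cycle, axid) tuples sorted by cycle (assumed format).
    Returns {axid: {bin_start: count}}, where bin_start is the lower edge of
    the ITT bin and bins are bin_width cycles wide.
    """
    last_seen = {}                  # axid -> cycle of its previous transaction
    profile = defaultdict(Counter)  # axid -> histogram of binned ITT values
    for cycle, axid in records:
        if axid in last_seen:
            itt = cycle - last_seen[axid]
            profile[axid][(itt // bin_width) * bin_width] += 1
        last_seen[axid] = cycle
    return {axid: dict(hist) for axid, hist in profile.items()}


# Hypothetical recording: AxID 2 issues in short runs, AxID 1 only rarely.
RECORDS = [(0, 2), (3, 1), (5, 2), (12, 2), (40, 1), (45, 2), (52, 2)]

profile = extract_itt_profile(RECORDS)
print(profile)
```

For the sample recording, the flow with AxID 2 has three ITT values in the 0 to 9 bin and one in the 30 to 39 bin, which is the kind of shape the bursty traffic histogram describes.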