Fast SoC Architecture Exploration Using Traffic Simulation Techniques

Nadjib Mammeri, ARM
CONFIDENTIAL
Problems we are trying to solve
 What interconnect topology should I
use? What arbitration and QoS
schemes?
 How should I configure my memory
controller? DMC queue length?
Memory width?
 How to optimally size my
interconnect/memory system and still
meet my performance requirements?
SoC Architecture Exploration
 Current Techniques
   Spreadsheet: Not accurate, Fast, Cheap
   RTL simulation: 100% Accurate, Slow, Expensive
   RTL emulation: Accurate, Fast, Expensive
   Behavioural SystemC models: Accurate, Fast, Expensive
 Traffic Profiling: ~Accurate, Fast, Cheap
   Abstracts away some components or parts of the system and replaces them with bus transactors that can:
     Generate realistic traffic that is statistically equivalent to SoC data flows
     Re-use existing data flows to explore new architectures
     Use constrained random techniques
Our proposed approach
 Iteration time of a spreadsheet with accuracy approaching that of RTL simulation
[Figure: analysis techniques placed on two axes, realistic behaviour (LOW to HIGH) and cycle time (LOW to HIGH)]
 Spreadsheet analysis (minutes/hours): mathematical formula, not dynamic
 RTL simulation, VPE, user VIP (minutes/hours): industry standards VIP; statistical or recorded traffic profiles
 Acceleration/emulation (days/weeks): VIP, Logic Tiles, SW; adding S/W and external I/F with realistic scenarios
 Silicon/applications (months/years): observe actual behaviour
How is it done?
 When analysing performance, the content or functional intent of the data is not important, but the nature and flow of the traffic is.
 Simulation time can be reduced by trading off the functional accuracy of end points.
 Accuracy should be preserved in the DUT and in the interconnect because they are the performance bottleneck.
 How simulation speed-up is achieved
   By giving up execution of functions within the emulated device in favour of emulating its traffic
   No need to model the cycle-accurate behaviour of the emulated end points
   By replacing real data with constrained random data
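As an illustration of replacing real data with constrained random data, the following minimal Python sketch emits write transactions whose control fields obey a few constraints while the data beats are plain random bits. The field names, the address window and the allowed burst lengths are assumptions made up for this example; they are not taken from VPE.

```python
import random

def make_transaction(rng, base=0x8000_0000, span=0x1000,
                     lens=(1, 4, 8), size_bytes=8):
    """Return one constrained-random write transaction (hypothetical fields)."""
    # Constraint 1: burst length comes from an allowed set.
    beats = rng.choice(lens)
    # Constraint 2: address lies inside the window and is aligned to the transfer size.
    addr = base + rng.randrange(0, span // size_bytes) * size_bytes
    # The data content carries no functional meaning, so random bits are enough.
    data = [rng.getrandbits(size_bytes * 8) for _ in range(beats)]
    return {"addr": addr, "beats": beats, "data": data}

rng = random.Random(1)  # fixed seed so a run can be reproduced
txns = [make_transaction(rng) for _ in range(100)]
```

The end point's function is never executed here; only the shape of its traffic is produced, which is the trade-off the slide describes.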
What is VPE (formerly AVIP)?
 Functional Verification
   Complete AXI functional verification solution
   SystemVerilog Master, Slave, Monitor
   RTL Protocol assertions
   RTL Coverage Points
 Performance Exploration
   Profile editor toolkit GUI
   RTL Profile extraction
   RTL Profile generation
   AXI Traffic Characterization and Analysis
   AXI Traffic Replay and Adaptation
[Figure: IEEE 1800 SystemVerilog testbench around a DUT (block or sub-system). Labelled elements: AXI masters and AXI slaves with profile data, an AXI monitor, customer VIP, customer IP, and the user AXI master and slave interfaces.]
Abstraction example 1
 If I would like to investigate my interconnect topology, I would keep the RTL for my interconnect and abstract away all end points (masters and slaves).
 Replace them with VPE masters and slaves
[Figure: AXI Interconnect with Master 1 to Master 4 and Slave 1 and Slave 2 at its ports, and monitors on the connections]
Abstraction example 2
 If I would like to investigate my memory controller configurability, I would use the RTL for my interconnect and DMC and abstract away other end points.
 Replace them with VPE masters and slaves
[Figure: AXI Interconnect and DMC with Master 1 to Master 4, a slave, and monitors on the connections]
Traffic Profiling (1)
 Traffic profiles statistically characterise the traffic (transactions) on an AXI connection
 A traffic flow is an identifiable stream of traffic (AXI transactions) between two points in a system
 Examples:
   When profiling at Slave 1, traffic coming from Master 2 can be identified using AxID
   If we know Master 1 always does 4-beat bursts, we can identify its traffic flow based on AxLEN
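The two examples above can be sketched as a small classifier that splits observed transactions into flows, first by AxID and then by AxLEN. This is only an illustration: the dictionary layout, the flow names and the ID values are hypothetical. The one protocol fact relied on is that AXI encodes burst length as AxLEN + 1, so a 4-beat burst has AxLEN = 3.

```python
from collections import defaultdict

def split_flows(transactions, id_to_flow=None, len_to_flow=None):
    """Group transactions into named flows using AxID first, then AxLEN."""
    id_to_flow = id_to_flow or {}
    len_to_flow = len_to_flow or {}
    flows = defaultdict(list)
    for t in transactions:
        if t["axid"] in id_to_flow:
            name = id_to_flow[t["axid"]]
        elif t["axlen"] in len_to_flow:
            name = len_to_flow[t["axlen"]]
        else:
            name = "unclassified"
        flows[name].append(t)
    return dict(flows)

# Hypothetical transactions observed at Slave 1 (the ID values are made up).
observed = [
    {"axid": 2, "axlen": 7},
    {"axid": 5, "axlen": 3},   # 4-beat burst: AxLEN = beats - 1
    {"axid": 2, "axlen": 0},
    {"axid": 9, "axlen": 15},
]
flows = split_flows(observed,
                    id_to_flow={2: "master2"},
                    len_to_flow={3: "master1"})
```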
Traffic Profiling (2)
 A profile is associated with a connection and can have multiple flows
 Flows contain histograms that store statistical data for both payload and timing information
 Payload histograms
   Histograms describing traffic payload information (control of a transaction, response of a transaction, but no data content)
   ADDRESS, ID, BURST, SIZE, LEN, RESP etc.
 Timing histograms
   Histograms describing traffic timing information
   ITT, AWW, WW, WIL, WBL, ARW, RW, RBL etc.
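One way to picture the profile, flow and histogram hierarchy described above is the Python sketch below. The class names, the `record` method and the connection name are assumptions for illustration only; the actual VPE profile format is not shown in these slides.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Flow:
    name: str
    payload: dict = field(default_factory=dict)  # parameter name -> Counter, e.g. "LEN"
    timing: dict = field(default_factory=dict)   # parameter name -> Counter, e.g. "ITT"

    def record(self, kind, parameter, value):
        """Add one observation to a payload or timing histogram."""
        table = self.payload if kind == "payload" else self.timing
        table.setdefault(parameter, Counter())[value] += 1

@dataclass
class Profile:
    connection: str                              # the connection this profile describes
    flows: dict = field(default_factory=dict)    # flow name -> Flow

    def flow(self, name):
        """Return the named flow, creating it on first use."""
        if name not in self.flows:
            self.flows[name] = Flow(name)
        return self.flows[name]

profile = Profile("slave1_port")                 # hypothetical connection name
f = profile.flow("master2")
f.record("payload", "LEN", 3)
f.record("payload", "LEN", 3)
f.record("payload", "BURST", "INCR")
f.record("timing", "ITT", 12)
```

Note that no data content is stored anywhere in the structure, only control, response and timing statistics.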
AXI Timing Histograms
 Inter-transaction timings
   ITT: Histogram parameter defining the inter-transaction timings in a flow (time between successive transactions).
[Figure: two example ITT histograms, "Continuous traffic" and "Bursty traffic"; x-axis itt (10 to 50), y-axis Frequency (0 to 20)]
 Intra-transaction timings
   Flow timings: timings that describe the flow of traffic.
   Connection timings: timings that are considered properties of the connection.
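To make the ITT histogram concrete, the sketch below builds one from a list of transaction start cycles and then draws from it to generate new start cycles. The function names and the recorded cycle numbers are hypothetical, and independent sampling is an assumption of this example rather than a description of how VPE replays a profile.

```python
import random
from collections import Counter

def itt_histogram(start_cycles):
    """Histogram of gaps between successive transaction start cycles."""
    gaps = [b - a for a, b in zip(start_cycles, start_cycles[1:])]
    return Counter(gaps)

def replay_starts(hist, count, rng):
    """Draw ITT values from the histogram and accumulate them into start cycles."""
    values = list(hist.keys())
    weights = list(hist.values())
    cycle, starts = 0, []
    for gap in rng.choices(values, weights=weights, k=count):
        cycle += gap
        starts.append(cycle)
    return starts

# Bursty example: groups of closely spaced transactions separated by long idle gaps.
recorded = [0, 2, 4, 6, 50, 52, 54, 56, 100]
hist = itt_histogram(recorded)          # Counter({2: 6, 44: 2})
generated = replay_starts(hist, 1000, random.Random(0))
```

Drawing each gap independently preserves the ITT distribution but not the order in which short and long gaps occurred, so this simple version is statistically equivalent only in that limited sense.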
AXI Intra-Transaction Timings
 RIL: Time between the handshake on the AR channel and the first read transfer on the R channel
 RW: Time between RVALID and RREADY
 WIL: Time between the handshake on the AW channel and the first write transfer on the W channel
 WW: Time between WVALID and WREADY
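The four timings can be computed from recorded cycle numbers, as in the sketch below. It assumes each data beat is logged as a pair (cycle VALID was first asserted, cycle the beat was accepted with VALID and READY both high), and that the initial latency is measured to the first accepted beat. Both choices are assumptions of this example, as are the function name and the sample numbers.

```python
def channel_timings(addr_handshake, beats):
    """Return (initial latency, per-beat wait list) for one transaction.

    addr_handshake: cycle of the AR or AW handshake.
    beats: list of (valid_cycle, accept_cycle) pairs for the R or W channel.
    """
    first_accept = beats[0][1]
    initial_latency = first_accept - addr_handshake   # RIL or WIL
    waits = [accept - valid for valid, accept in beats]  # RW or WW per beat
    return initial_latency, waits

# Read: AR handshake at cycle 10, four beats, the second one stalled for 2 cycles.
ril, rw = channel_timings(10, [(14, 14), (15, 17), (18, 18), (19, 19)])

# Write: AW handshake at cycle 20, two beats, the first one stalled for 2 cycles.
wil, ww = channel_timings(20, [(21, 23), (24, 24)])
```

Feeding each returned value into the matching timing histogram of a flow would give the RIL, RW, WIL and WW distributions for that flow.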
How accurate is it?

 4 hours to 4 minutes (a 60x speed-up): VPE Master executing 2M cycles of traffic profile in place of real Mali200 RTL running Proxycon/Samurai content
 The VPE profile executes much faster than the real RTL but generates representative and controllable traffic
 The original captured traffic profile is now used to drive the VPE Master
[Figure: panels labelled "Real RTL" and "VPE Profile"]
More VPE Features
 Master
 Slave
 Monitor
 AXI protocol checker
 AXI protocol coverage
 Transaction recording/visualisation
 Traffic profile extraction
Conclusion
 System architects require novel techniques with short iteration times to analyze performance and fine-tune their SoCs.
 VPE introduces a new approach that combines high-level modeling and statistical low-level random generation techniques to explore and verify IP performance.
 Traffic profiles can be used by VPE masters and slaves to generate statistically equivalent traffic, and by VPE monitors when monitoring performance.
Questions