Use of SystemC/TLM virtual platforms for
the exploration, the specification and the
validation of critical embedded SoC
A. BERJAOUI (AKKA IS for Astrium)
A. LEFEVRE & C. LE LANN (Astrium)
Workshop - November 2011 - Toulouse
Overview
 Context
 Separation of time & functionality presentation
 Timed TLM models vs. CABA models
 Design Space Exploration with SystemC/TLM 2.0
 HW in the loop – Use of CHIPit®
 Future prospects
 Open questions
Context
 Define a proper method to use SystemC/TLM for
SoC modelling
 Use SystemC/TLM for DSE (performance estimation,
bottleneck identification…)
 Use SystemC/TLM models for HW specification
 Evaluate the selected methodology
Programmer’s View (PV) or
functional simulation
 Time is not represented; only functionality is modelled.
 Functional synchronization is still necessary. It is done at
System Synchronization Points (SSP): configuration register
accesses, interrupts and all state-altering accesses.
The need for time
 Performance measurements
 Design Space Exploration
…how ???
 Precision?
 Modelling granularity?
 Simulation performance?
The obvious solution: mixing
time and functionality
 It works !!!
…but…
 Functional modifications cannot be verified without
having to verify all timed aspects as well
 Modelling granularity is hard to modify once it has
been set
 Modules cannot be easily reused for other platforms
Separation of time & functionality
[Figure: separation of time and functionality. Labels: ISS PVT (ISS PV, ISS T), Memory PVT (Memory PV, Memory T), PV router, Detailed bus model, Initiator port, Target port]
[Figure: timed simulation phase and functional simulation phase on the same platform. Labels: ISS PVT (ISS PV, ISS T), Memory PVT (Memory PV, Memory T), PV router, Detailed bus model, Initiator port, Target port; time axis from 0 to 11 ns, annotated T = 12]
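The PV/T split shown above can be sketched in a few lines. This is a plain Python illustration, not SystemC and not the authors' code: the class names `MemoryPV`, `MemoryT` and `MemoryPVT` and all latency figures are assumptions made for the example. The point it tries to show is that the timing model can be swapped without touching the functional model.

```python
class MemoryPV:
    """Functional (PV) memory model: stores data, knows nothing about time."""

    def __init__(self, size):
        self.data = bytearray(size)

    def write(self, addr, payload):
        self.data[addr:addr + len(payload)] = payload

    def read(self, addr, length):
        return bytes(self.data[addr:addr + length])


class MemoryT:
    """Timing (T) memory model: returns a latency, holds no data.

    The latency figures are hypothetical placeholders.
    """

    def __init__(self, setup_ns, per_byte_ns):
        self.setup_ns = setup_ns
        self.per_byte_ns = per_byte_ns

    def latency_ns(self, length):
        return self.setup_ns + self.per_byte_ns * length


class MemoryPVT:
    """Wrapper pairing one PV model with one interchangeable T model."""

    def __init__(self, pv, t):
        self.pv = pv
        self.t = t
        self.elapsed_ns = 0

    def write(self, addr, payload):
        self.pv.write(addr, payload)
        self.elapsed_ns += self.t.latency_ns(len(payload))

    def read(self, addr, length):
        self.elapsed_ns += self.t.latency_ns(length)
        return self.pv.read(addr, length)


# Same functional model, two timing granularities.
pv = MemoryPV(64)
coarse = MemoryPVT(pv, MemoryT(setup_ns=4, per_byte_ns=1))
coarse.write(0, b"\x01\x02\x03\x04")
data = coarse.read(0, 4)

fine = MemoryPVT(pv, MemoryT(setup_ns=10, per_byte_ns=2))
same_data = fine.read(0, 4)
```

Changing the granularity only means passing a different `MemoryT`; the data returned by the PV side is unaffected.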
Advantages and limitations
PV & T mixed, advantages:
 Modelling is “natural”. Platforms are simple.
 Interrupts can be modelled easily
PV & T mixed, limitations:
 Granularity is fixed
 Mixed debugging and no control over simulation performance
 Reuse problem
PV & T separated, advantages:
 Parallel development and debug of reusable PV and T models
 Granularity can be controlled easily (by changing T model)
PV & T separated, limitations:
 Modelling is more abstract. Platforms are complex
 Interrupts are harder to model
TTP in the industry
In its current form, TTP cannot be used on an
industrial scale:
 Modelling is too complex to be used by architects
 Modules are not re-used enough to justify such a
modelling effort
 Traffic generators are enough for DSE. Detailed
functionality does not need to be specified for
performance estimation.
 HW specification is easier using cycle
“approximate”/bit accurate models
Timed TLM vs CABA models
 Different time modelling granularities:
 CABA (cycle-accurate, bit-accurate) in HDL => available, but slow simulations
 CABA in SystemC => not worthwhile (not available, and
slow simulations)
 Timed TLM (SystemC AT) => preferred
 A timed TLM model of an existing RTL IP has been built
to evaluate the methodology and assess the necessary
effort
 RTL IP chosen = SDRAM memory controller, because:
 this is a central module in SoC architecture explorations
 its timing behaviour is harder to determine than other
modules’ (AHB buses for example)
SDRAM Memory Controller
 The Memory Controller is the interface between the SoC bus and the external (on-board) memories
 The latency of an access depends on:
   the access parameters
   the controller internal state
 Objective for the timed model:
   the model should be pessimistic (latencies longer than in the RTL)
   timing error between +0 % and +20 %
[Figure: MCTL between the AHB bus and the SDRAM, SRAM and EEPROM memories]
Time analysis methodology
 RTL analysis
   RTL is composed of intricate cycle-based state machines
   Requires manual extraction of timing rules
   May need to duplicate the RTL FSM in the TLM model
   Not worthwhile
 Macroscopic analysis
   Uses RTL simulations to produce timing information
   Either guided: choice of statistics
   Or semi-automated: using scripts
   Selected method
[Figure: RTL state machine diagram. States: PWDOWN, IDLE, ACTV, RMW_RSENCODE, READ_RMW, READ, READ_SCRUB, WRITE, WRITE_SCRUB, ALL_PRE(latepre), ALL_PRE, EARLYPRE, SEARLYPRE]
Macroscopic time analysis
 Guided time analysis
 Timing data is extracted from RTL simulations
(traces of all the timings + relevant parameters)
 Rules are guessed by manually analyzing the traces…
 …and then automatically tested against a calibration test set
 This process iterates until the timing accuracy is satisfactory
 Results of the time analysis iterations
 The parameters of the previous access also have a major
impact (in addition to the parameters of the current access)
 Some features interfere (refresh and automatic scrubbing)
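One iteration of the guided analysis can be pictured as follows. This is only a sketch: the traces, the two candidate rules and the turnaround penalty are invented for illustration and are not the rules extracted for the real controller. It checks each candidate against the stated objective (pessimistic, at most +20 %) and shows how taking the previous access into account can improve the score.

```python
# Hypothetical calibration traces:
# (current access, previous access, RTL latency in cycles).
TRACES = [
    ("read", "read", 10),
    ("read", "write", 14),
    ("write", "write", 8),
    ("write", "read", 12),
    ("read", "read", 10),
]


def rule_v1(current, previous):
    """First guess: latency depends on the current access only."""
    return {"read": 11, "write": 9}[current]


def rule_v2(current, previous):
    """Refined guess: add a penalty when the access type changes."""
    base = {"read": 11, "write": 9}[current]
    return base + (4 if current != previous else 0)


def within_objective(model, rtl, max_excess_pct=20):
    """True when the model is pessimistic by 0 % to max_excess_pct %.

    Integer arithmetic avoids rounding surprises at the bounds.
    """
    return 100 * rtl <= 100 * model <= (100 + max_excess_pct) * rtl


def accuracy(rule, traces):
    """Fraction of traces whose modelled latency meets the objective."""
    hits = sum(within_objective(rule(cur, prev), rtl)
               for cur, prev, rtl in traces)
    return hits / len(traces)


score_v1 = accuracy(rule_v1, TRACES)
score_v2 = accuracy(rule_v2, TRACES)
```

With this made-up data the first rule is optimistic whenever the access type changes, so it fails on two traces out of five; the refined rule meets the objective on all of them.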
Timed Model Validation
 This timing model has been
checked against RTL on an
extensive test set
 more than 86000 transactions
 taken from the RTL validation
test suite
Frequency   Mistimed transactions   Latency error
32 MHz      18%                     12%
48 MHz      14%                     17%
64 MHz      14%                     18%
96 MHz      17%                     17%
 Validation results
 The model is pessimistic (longer than the RTL)
 Latency error between 12% and 18%
 The model is too simple to be 100% exact
 But the goal is to keep a high level of abstraction
 Possibility to increase the accuracy if necessary
Design Space Exploration with
SystemC/TLM 2.0
 A simple image processing platform has been
designed to assess the use of SystemC/TLM for
design space exploration
Algorithm
 Image spectral-compression platform
 Performs “subsampling” on incoming data packets
 Subsampled packets are then transferred to an auxiliary processing unit which performs a 2D-FFT (using a co-processor) and data encoding
[Figure: processing chain with data sizes on the arrows: Input -> 10N -> Subsampling -> 5N -> 2D-FFT -> 5N -> Encoding -> N -> Output]
Processing platform
[Figure: processing platform. Labels: IO, Leon_a, DMA_a, Mem_a, Leon_b, DMA_b, Mem_b, FFT]
Processing platform (cont’d)
 IO module generates an interrupt causing DMA_a to
transfer the input packet of size 10N to Mem_a
 At the end of the transfer, Leon_a subsamples the
data and writes the result to Mem_a
 Leon_a configures DMA_b to transfer the result to
Mem_b
 At the end of the transfer, Leon_b configures the
FFT module to perform a 2D-FFT
 Leon_b encodes the result and programs DMA_b to
send the result to the IO module
SystemC implementation
 TLM-2 compliant (time & functionality are mixed)
 Data exchange is AMBA bus-accurate
(single/burst transactions, split)
 Data sizes are respected and packets are
identified by a packet ID.
 The Leon processor modules act as “smart” traffic
generators: they generate transactions in the
correct order towards the appropriate targets.
 OS tasks are simulated using SC_THREADs
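A "smart" traffic generator of this kind can be sketched as a generator that emits the transfers for one packet in order, with the right sizes and no actual processing. This is plain Python rather than an SC_THREAD. The `Transaction` tuple and both helper functions are hypothetical, and the choice of Mem_b as the place where the FFT and encoding results are written is an assumption: the slides do not say where those results are stored.

```python
from collections import namedtuple

# Hypothetical transaction record; sizes are in units of N.
Transaction = namedtuple("Transaction",
                         "packet_id initiator source target size")


def packet_flow(packet_id, n):
    """Yield, in order, the transfers triggered by one input packet.

    Sizes follow the algorithm slide (10N -> 5N -> 5N -> N).
    """
    yield Transaction(packet_id, "DMA_a", "IO", "Mem_a", 10 * n)    # input packet
    yield Transaction(packet_id, "Leon_a", "Mem_a", "Mem_a", 5 * n)  # subsampled data
    yield Transaction(packet_id, "DMA_b", "Mem_a", "Mem_b", 5 * n)   # hand-over
    yield Transaction(packet_id, "FFT", "Mem_b", "Mem_b", 5 * n)     # 2D-FFT result (assumed target)
    yield Transaction(packet_id, "Leon_b", "Mem_b", "Mem_b", n)      # encoded result (assumed target)
    yield Transaction(packet_id, "DMA_b", "Mem_b", "IO", n)          # output packet


def bytes_per_initiator(transactions):
    """Sum the transferred sizes per initiator, a crude bus-load indicator."""
    totals = {}
    for t in transactions:
        totals[t.initiator] = totals.get(t.initiator, 0) + t.size
    return totals


flow = list(packet_flow(packet_id=0, n=100))
totals = bytes_per_initiator(flow)
```

Counting sizes per initiator is only a first indicator; identifying a bottleneck also needs the timing side (transfer durations and input data rate).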
SystemC implementation (cont’d)
 No actual processing is performed. Processing time
is simulated
 Bus occupation, processing loads for all processing
units were measured accurately
 A system synchronization bug was identified => a
“lock” register has been added to lock DMA_b during
its configuration
 It was possible to observe the impact of the
modification of HW parameters and the input data
rate. DMA_a was identified as a bottleneck.
 ABV could also be implemented using ISIS
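The synchronization bug and its fix can be illustrated with a toy model. DMA_b is configured by both Leon_a and Leon_b, so without protection one master can overwrite the other's settings before the transfer starts. The class below is a sketch under assumptions: the register semantics (lock taken on request, released when the transfer is started) and all method names are hypothetical, since the slides only state that a "lock" register was added.

```python
class DmaB:
    """Toy DMA with a lock register guarding its configuration registers."""

    def __init__(self):
        self.lock_owner = None  # the "lock" register: None means free
        self.src = self.dst = self.length = None

    def try_lock(self, master):
        """Take the lock if it is free; return True when `master` holds it."""
        if self.lock_owner is None:
            self.lock_owner = master
        return self.lock_owner == master

    def write_config(self, master, src, dst, length):
        """Accept configuration writes only from the lock holder."""
        if self.lock_owner != master:
            return False
        self.src, self.dst, self.length = src, dst, length
        return True

    def start(self, master):
        """Launch the transfer and release the lock (assumed behaviour)."""
        if self.lock_owner != master:
            return None
        self.lock_owner = None
        return (self.src, self.dst, self.length)


dma_b = DmaB()
a_locked = dma_b.try_lock("Leon_a")          # Leon_a starts configuring
b_locked = dma_b.try_lock("Leon_b")          # Leon_b must wait
b_write = dma_b.write_config("Leon_b", "Mem_b", "IO", 100)
a_write = dma_b.write_config("Leon_a", "Mem_a", "Mem_b", 500)
transfer = dma_b.start("Leon_a")             # releases the lock
b_retry = dma_b.try_lock("Leon_b")           # now succeeds
```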
Example
HW in the loop – use of CHIPit
 CHIPit
 Virtex-based development platform
 Custom extension boards (SDRAM, Flash, IO, …)
 UMRBus = practical & fast PC-CHIPit ready-made interface
HW in the loop – use of CHIPit
 CHIPit can be used for :
 Incremental validation flow
   SC/TLM testbench composed of multiple sub-blocks
   Some sub-blocks may run on hardware (FPGA)
   The others still run as software SC functional models
   Soft-hard inter-block transactions via UMRBus + extra SystemC/VHDL
[Figure: testbench sub-blocks, some marked "hard" (on CHIPit), the others marked "soft"]
HW in the loop – use of CHIPit
 What happens on a transaction ?
 Uncontrolled clock mode
   HW clock keeps running during a transaction
   SW clock and HW clock are not synchronised
   Easy to implement
 Controlled clock mode
   HW clock is stopped upon each transaction, waiting for the software side
   SW clock and HW clock are synchronised on transaction boundaries
   Needed if inputs/outputs must observe precise relative timings
   Harder to implement, more timing issues
   Not possible for all designs: complex designs require extra care
     SDRAM controller needs constant auto-refresh
     Inputs from extension boards may need immediate treatment
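The difference between the two modes can be reduced to a cycle counter. The sketch below is a simplification and does not use the UMRBus or CHIPit APIs: `HardwareSide` and the cycle figures are hypothetical. It only shows that in uncontrolled mode the hardware time seen after a transaction depends on how long the software side took, whereas in controlled mode it does not.

```python
class HardwareSide:
    """Counts hardware clock cycles; stands in for the design on the FPGA."""

    def __init__(self, controlled):
        self.controlled = controlled
        self.cycles = 0

    def run(self, cycles):
        """Free-running operation between transactions."""
        self.cycles += cycles

    def transaction(self, sw_latency_cycles):
        """Exchange one transaction with the software side.

        sw_latency_cycles: hardware cycles that would elapse while the
        software side handles the transaction (hypothetical figure).
        """
        if not self.controlled:
            # Uncontrolled: the HW clock keeps running during the exchange.
            self.cycles += sw_latency_cycles
        # Controlled: the HW clock is stopped, so no cycle is counted.
        return self.cycles


def scenario(controlled, sw_latency_cycles):
    """Run 100 cycles, perform one transaction, run 100 more cycles."""
    hw = HardwareSide(controlled)
    hw.run(100)
    hw.transaction(sw_latency_cycles)
    hw.run(100)
    return hw.cycles


controlled_fast = scenario(controlled=True, sw_latency_cycles=37)
controlled_slow = scenario(controlled=True, sw_latency_cycles=5000)
uncontrolled_fast = scenario(controlled=False, sw_latency_cycles=37)
uncontrolled_slow = scenario(controlled=False, sw_latency_cycles=5000)
```

A block that needs continuous clocking, such as the SDRAM controller with its auto-refresh, is exactly the case this simple stop-the-clock picture cannot cover.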
HW in the loop – use of CHIPit
 Uncontrolled clock example : whole system overview
 Electronic board with inputs/outputs to other electronic systems
 SDRAM for internal data storage
 ASIC/FPGA for data processing
[Figure: whole-system block diagram. Labels: OBC, Electronic board, Periph 1, Periph 2, Instrument 1, Instrument 2, Storage, RF comm, Input 1, Input 2, Output 1, Output 2, ASIC, SDRAM memory]
HW in the loop – use of CHIPit
 Uncontrolled clock example : ASIC internal view
 Data processing composed of several sub-blocks
 Sub-blocks perform independent tasks
 Sequenced together using very few signals (e.g. req/ack)
[Figure: ASIC internal view. Labels: OBC, Sequencer (req/ack), Processing 1, Processing 2, Processing 4, Core, FIFOs, RX, TX, Input 1, Input 2, Output 1, Output 2, Memory controller, SDRAM memory]
HW in the loop – use of CHIPit
 Uncontrolled clock example : ASIC re-modelling for HW
 Sequencer control signals re-modelled as APB transactions
 Inter-block FIFOs split (FIFO->SDRAM and SDRAM->FIFO)
 FIFOs mapped on AHB buses at fixed addresses
 Added DMAs to handle pipeline inputs and outputs from/to memory
 DMA channels can perform any AHB transfer (e.g. SDRAM<->FIFO)
[Figure: ASIC re-modelled for HW. Labels: Sequencer (APB), Core, Processing 1, Processing 2, Processing 4, FIFOs, DMAs, AHBs, RX, TX, Input 1, Input 2, Output 1, Output 2, Memory controller, SDRAM memory]
HW in the loop – use of CHIPit
 Uncontrolled clock example : ASIC re-modelling for SC
 Use of TLM2 transactions between blocks
 SDRAM+controller merged into a memory abstraction model
 SDRAM access ports re-modelled as AHB buses
[Figure: ASIC SystemC model. Labels: Sequencer, AHB bus(es) model, Memory model, Core, Processing 1, Processing 2, Processing 4, RX, TX, DMAs, FIFOs]
HW in the loop – use of CHIPit
 Benefits
 Same C file used for both Gaut VHDL generation and SystemC full-soft
emulation
► intrinsic algorithm consistency between model and hardware
 Few steps necessary from Gaut regeneration to FPGA synthesis and
SC model compilation, scriptable for process automation
► handy for fast algorithm exploration
 Outcome: SystemC model executable, allowing choice at runtime
between full-soft functional model and soft+hard co-simulation
$> scmodel SIMU input.bin output_simu.bin > log_simu.txt
$> scmodel CHIPit input.bin output_hard.bin > log_hard.txt
$> diff output_simu.bin output_hard.bin
$>
HW in the loop – use of CHIPit
 Limitations
 Still have to develop SystemC+VHDL for each new transactor
   Limits whole process automation
   Encourages the use of common transactor types (AMBA, etc.)
 Controlled clock mode much more complex to implement
   Encourages the design of independent blocks, inter-connected via a few FIFOs or via a common memory
   Blocks with strong timing requirements on IO are hardly compatible with uncontrolled clock mode (better to design with intelligent IO behaviour: req+ack, handshake, etc.)
 Implementation limited to actual CHIPit resources
   SDRAM bus width is static (cannot test larger bus than available)
   Custom extension boards required as early as algorithm exploration
HW in the loop – use of CHIPit
 SceMi : the wanna-be standard for co-simulation
 Formerly proposed by Cadence, now transferred to Accellera
 Defines a C++ API for HW-SW co-simulation
   Controlled clock / uncontrolled clock modes
   Function-based interface
   Pipe-based interface (C++ stream = hardware FIFO)
   Multi-threaded operation on software side
 CHIPit SceMi library available
   Needs a supplementary licence
   Just a wrapper over UMRBus libraries to provide clock control
   All transactors still need to be coded by hand (SystemC+VHDL)
► still a lot of work to do before getting co-simulation working
Space industry applicability
 SystemC/TLM is suitable for DSE with the use of HLS
 Specification flow needs to be sorted out
Future prospects
 Important need in development infrastructure:
 Abstraction layer (architects are not TLM2 experts)
 Interrupts and streaming modelling (TLM is currently a
memory-mapped, platform-oriented protocol)
 Build and assembly tools are needed
 Well defined modelling guidelines should be established
Thank you
Any questions ?
Open questions
 Who does the modelling? System, HW or SW
architect?
 SW validation uses paper specs => Towards
validation using HW based models in
SystemC/TLM?
 Towards a TLM3 standard? With embedded systems
industrial partners such as Airbus and Astrium?
(Business model?)