Methods for Evaluation of Embedded Systems

Simon Künzli, Alex Maxiaguine
Institute TIK, ETH Zurich
System-Level Analysis
(Figure: applications such as Web browsing, IP telephony, multimedia streaming and secure FTP mapped onto resources such as LookUp, RISC, Cipher and DSP.)
Bus load? Memory? Packet delays? Resource utilization? Clock rate?
Problems for Performance Estimation
• Distributed processing of applications on different resources
• Interaction of different applications on different resources
• Heterogeneity, HW-SW
(Figure: example architecture with SDRAM, RISC, arbiter and DSP.)
A “nice-to-have” performance model
• measuring what we want
• high accuracy
• high speed
• full coverage
• based on unified formal specification model
• composability & parameterization
• reusable across different abstraction levels
  – or at least easy to refine
Overview of Existing Approaches
(Figure: existing approaches (Ernst, Thiele, Givargis, Lahiri, SPADE, Jerraya, Benini, RTL) placed on a speed vs. accuracy chart.)
Discrete-event Simulation
Event scheduler
• Event queue: future events (e.g. signal changes) and the actions to be executed
Accuracy vs. speed: how many events are simulated?
System Model
• Architecture and Behavior
• Components/Actors/Processes
• Communication channels/Signals
© The MathWorks
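A minimal sketch of the event scheduler described above, in C. Future events sit in a priority queue ordered by time stamp. The main loop repeatedly removes the earliest event, advances the simulation time to it, and executes its action, which may in turn schedule further events. The names (`sim`, `schedule`, `run`, `clock_toggle`) and the fixed queue size are illustrative assumptions, not the API of any particular simulator.

```c
#include <stddef.h>

#define MAX_EVENTS 64 /* assumed fixed capacity of the event queue */

typedef struct sim sim;
typedef void (*action_fn)(sim *s, int arg);

typedef struct {
    double time;      /* when the action fires */
    action_fn action; /* what to execute */
    int arg;          /* payload, e.g. a new signal value */
} event;

struct sim {
    event queue[MAX_EVENTS]; /* binary min-heap keyed on time */
    size_t size;
    double now;              /* current simulation time */
    unsigned long processed; /* number of events simulated so far */
};

static void swap_ev(event *a, event *b) { event t = *a; *a = *b; *b = t; }

/* Insert a future event 'delay' time units from now; returns 0 if full. */
int schedule(sim *s, double delay, action_fn f, int arg) {
    if (s->size == MAX_EVENTS) return 0;
    size_t i = s->size++;
    s->queue[i] = (event){ s->now + delay, f, arg };
    while (i > 0 && s->queue[(i - 1) / 2].time > s->queue[i].time) {
        swap_ev(&s->queue[(i - 1) / 2], &s->queue[i]);
        i = (i - 1) / 2;
    }
    return 1;
}

/* Remove and return the earliest event (heap must be non-empty). */
static event pop_earliest(sim *s) {
    event top = s->queue[0];
    s->queue[0] = s->queue[--s->size];
    size_t i = 0;
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < s->size && s->queue[l].time < s->queue[m].time) m = l;
        if (r < s->size && s->queue[r].time < s->queue[m].time) m = r;
        if (m == i) break;
        swap_ev(&s->queue[i], &s->queue[m]);
        i = m;
    }
    return top;
}

/* Advance time event by event until the queue is empty or end_time is passed. */
void run(sim *s, double end_time) {
    while (s->size > 0 && s->queue[0].time <= end_time) {
        event e = pop_earliest(s);
        s->now = e.time;
        s->processed++;
        e.action(s, e.arg);
    }
}

/* Example process (hypothetical): a clock that toggles every half period. */
static int clock_signal = 0;
static const double half_period = 5.0;

void clock_toggle(sim *s, int value) {
    clock_signal = value;
    schedule(s, half_period, clock_toggle, !value);
}
```

Scheduling `clock_toggle` once at t = 0 and running until t = 20 simulates five events (t = 0, 5, 10, 15, 20). The `processed` counter is exactly the "how many events are simulated" quantity that trades accuracy for speed.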
Discrete-event Simulation (cont.)
“The design space”:
– Time resolution
– Modeling communication
– Modeling timing of data-dependent execution
– …
Time Resolution
(Figure: a signal x(t) with events a and c at times t1 to t7, drawn once in continuous time and once in discrete time; the continuous-time representation is the more accurate one.)
• Continuous time
  – e.g. gate-level simulation
• Discrete time or “cycle-accurate”
  – e.g. Register Transfer Level (RTL) simulation
  – system-level performance analysis
Modeling communication
• Pin-level model
  – all signals are modeled explicitly
  – often combined with RTL
  (Figure: components C1 and C2 connected by individual wires d0, d1, d2.)
• Transaction-level model
  – protocol details are abstracted
  – e.g. burst-mode transfers
  (Figure: C1 and C2 exchanging a <write> transaction that returns ready: true/false.)
• A TLM simulator of the AMBA bus is 100x faster than a pin-level model
  (Caldari et al., Transaction-Level Models for AMBA Bus Architecture Using SystemC 2.0, DATE 2003)
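The sketch below makes the speed gap concrete by counting simulator events for the same burst write in two hypothetical models. In the pin-level model, every clock edge and every toggling data wire is an event. In the transaction-level model, the whole burst is a single function call that returns ready (true/false). The bus width, the two-edges-per-beat rule and the 256-word target are assumptions. This is not an AMBA model and does not reproduce the 100x figure.

```c
#include <stdint.h>

/* Pin level (hypothetical): each beat costs two clock-edge events plus one
   event per data wire that changes value. */
unsigned long pin_level_write_events(const uint32_t *data, unsigned len, unsigned width) {
    unsigned long events = 0;
    uint32_t bus = 0; /* current value on the data wires */
    uint32_t mask = width >= 32 ? 0xFFFFFFFFu : ((1u << width) - 1u);
    for (unsigned i = 0; i < len; i++) {
        uint32_t next = data[i] & mask;
        uint32_t changed = bus ^ next;
        while (changed) {            /* one event per toggled wire */
            events += changed & 1u;
            changed >>= 1;
        }
        events += 2;                 /* rising and falling clock edge */
        bus = next;
    }
    return events;
}

/* Transaction level (hypothetical): the target is a 256-word memory. */
typedef struct {
    uint32_t mem[256];
    unsigned long events;
} tlm_target;

/* One call moves the whole burst; the return value plays the role of ready. */
int tlm_write(tlm_target *t, unsigned addr, const uint32_t *data, unsigned len) {
    if (addr > 256 || len > 256 - addr) return 0; /* ready = false */
    for (unsigned i = 0; i < len; i++) t->mem[addr + i] = data[i];
    t->events += 1;                                /* one event per transaction */
    return 1;                                      /* ready = true */
}
```

For the three-beat burst 0xF, 0x0, 0xF on a 4-bit bus, the pin-level count is 18 events (4 toggles plus 2 clock edges per beat). The transaction-level model needs one event for the same burst.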
Modeling timing of data-dependent execution
Problem:
• How to model timing of data-dependent functionality inside a component?
(Figure: component with input port in and output port out; control flow a=read(in), branch on a>b to task1() with delay d1 or task2() with delay d2, then write(out,c).)
Possible solution:
Estimate and annotate delays in the functional/behavioral model:

    a = read(in);
    if (a > b) {
        task1();
        delay(d1);
    } else {
        task2();
        delay(d2);
    }
    write(out, c);

• This approach works well for HW but may be too coarse for modeling SW.
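Here is a runnable C rendering of the delay-annotated model above. The functionality executes natively. `delay()` only advances the component's simulated clock by an estimated amount, so the timing depends on which branch the data selects. The delay values (40 ns and 15 ns) and the bodies of `task1`/`task2` are hypothetical placeholders.

```c
/* Hypothetical component state: a local simulated clock and an output port. */
typedef struct {
    unsigned long long time_ns; /* simulated time of this component */
    int out;                    /* last value written to the output port */
} component;

enum { D1_NS = 40, D2_NS = 15 }; /* assumed delay estimates for task1/task2 */

/* delay(d): no real waiting, just advance simulated time. */
static void delay(component *c, unsigned long long ns) { c->time_ns += ns; }

static int task1(int a, int b) { return a - b; } /* placeholder functionality */
static int task2(int a, int b) { return a + b; } /* placeholder functionality */

/* One activation: the branch taken determines both result and timing. */
void activate(component *comp, int in, int b) {
    int a = in;               /* a = read(in) */
    int c;
    if (a > b) {
        c = task1(a, b);
        delay(comp, D1_NS);   /* delay(d1) */
    } else {
        c = task2(a, b);
        delay(comp, D2_NS);   /* delay(d2) */
    }
    comp->out = c;            /* write(out, c) */
}
```

Activating the component with in = 10 and b = 3 takes the `task1` branch and advances the clock by 40 ns. A second activation with in = 1 takes `task2` and adds 15 ns, for 55 ns in total.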
HW/SW Co-simulation Options
Application SW...
• … is delay-annotated and executes natively on the workstation as part of the HW simulator
• … is compiled for the target processor, and its code is used as stimulus to a processor model that is part of the HW simulator
• … is not part of the HW simulator: a complete separation of application and architecture models
Processor Models: Simulation Environment
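(Figure: C/C++ application SW is compiled into an .exe (program code) that runs on a processor model (RTL, microarchitecture simulator or ISS); the processor model is connected through a wrapper to the HW simulator for the rest of the system.)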
Processor Models
• RTL model
  – cycle-accurate or continuous time
  – all the details are modeled (e.g. synthesizable)
• Microarchitecture simulator
  – cycle-accurate model
  – models pipeline effects, etc.
  – can be generated automatically (e.g. Liberty, LISA…)
• Instruction set simulator (ISS)
  – provides instruction count
  – functional models of instructions
  – e.g. SimpleScalar
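The toy interpreter below illustrates the two ISS properties listed above. Each instruction is modeled only by its functional effect on the registers, and the simulator reports how many instructions executed. It has no notion of pipeline stages or cycles. The five-opcode ISA (`LI`, `ADD`, `ADDI`, `BNEZ`, `HALT`) is hypothetical and unrelated to SimpleScalar.

```c
#include <stdint.h>

/* Hypothetical ISA: LI rd,imm | ADD rd,rs (rd += rs) | ADDI rd,imm |
   BNEZ rs,target | HALT */
enum opcode { LI, ADD, ADDI, BNEZ, HALT };

typedef struct { enum opcode op; int rd, rs, imm; } insn;

typedef struct {
    int32_t reg[8];
    unsigned pc;
    unsigned long icount; /* executed instructions */
} cpu;

/* Interpret until HALT, end of program, or max_steps; returns the count. */
unsigned long iss_run(cpu *c, const insn *prog, unsigned len, unsigned long max_steps) {
    while (c->pc < len && c->icount < max_steps) {
        const insn *i = &prog[c->pc];
        c->icount++;
        switch (i->op) { /* functional model of each instruction */
        case LI:   c->reg[i->rd] = i->imm;         c->pc++; break;
        case ADD:  c->reg[i->rd] += c->reg[i->rs]; c->pc++; break;
        case ADDI: c->reg[i->rd] += i->imm;        c->pc++; break;
        case BNEZ: c->pc = c->reg[i->rs] != 0 ? (unsigned)i->imm : c->pc + 1; break;
        case HALT: return c->icount;
        }
    }
    return c->icount;
}
```

For example, the program `LI r1,5; LI r2,0; ADD r2,r1; ADDI r1,-1; BNEZ r1,2; HALT` leaves 15 in r2 and reports 18 executed instructions. That count is two initializations, five loop iterations of three instructions each, and the final HALT.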
Multiprocessor System Simulator
(Figure: cycle-accurate ISSs connected through SystemC wrappers to a SystemC model of the rest of the system.)
L. Benini, U Bologna
Comparison of HW/SW Co-simulation Techniques
Simulator                               | Speed (instructions/sec)
Continuous time (nanosecond accurate)   | 1 – 100
Cycle-accurate                          | 50 – 1,000
Instruction level                       | 2,000 – 20,000
J. Rowson, Hardware/Software Co-Simulation, Proceedings of the 31st DAC, USA, 1994
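Dividing a workload size by each rate gives a rough feel for what these speed ranges mean. The workload of 10^8 instructions below is a hypothetical example, and the upper end of each range is used:

```latex
\begin{aligned}
t_{\text{instr}} &= \frac{10^{8}}{2\times 10^{4}\ \text{instr/s}} = 5\times 10^{3}\ \text{s} \approx 1.4\ \text{h}\\
t_{\text{cycle}} &= \frac{10^{8}}{10^{3}\ \text{instr/s}} = 10^{5}\ \text{s} \approx 27.8\ \text{h}\\
t_{\text{cont}}  &= \frac{10^{8}}{10^{2}\ \text{instr/s}} = 10^{6}\ \text{s} \approx 11.6\ \text{days}
\end{aligned}
```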
Independent Application and Architecture Models
(“Separation of Concerns”)
(Figure: the application (workload) is mapped onto the architecture (resources such as DSP, RISC and SRAM).)
Co-simulation of Application and Architecture Models
Basic principle:
– The application (or functional) simulator drives the architecture (or hardware) simulator
– The models interact via traces of actions
– The traces are produced on-line or off-line
Advantages:
– system-level view
– flexible choice of abstraction level
– the models and the mapping can be easily altered
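A minimal sketch of the trace-driven principle follows. The application model is reduced to a trace of abstract actions (execute, read, write) per process. The architecture model replays that trace and assigns time according to the mapping of processes to resources, with a shared bus for communication. The trace format, cost parameters and struct names are hypothetical and much simpler than SPADE. In particular, blocking of a reader on its writer is not modeled.

```c
/* Abstract actions emitted by the (hypothetical) application model. */
typedef enum { EXEC, READ, WRITE } action_kind;

typedef struct { int process; action_kind kind; unsigned amount; } trace_entry;

typedef struct {
    double ns_per_op;  /* processing cost on this resource */
    double busy_until; /* resources execute one action at a time */
} resource;

typedef struct {
    resource *res;          /* e.g. RISC, DSP */
    const int *mapping;     /* mapping[process] = index into res */
    double bus_ns_per_word; /* shared bus cost per transferred word */
    double bus_busy_until;
    double proc_time[8];    /* local time of each application process */
} arch_model;

/* Replay one trace entry on the architecture; returns its finish time. */
double replay(arch_model *a, const trace_entry *e) {
    double start = a->proc_time[e->process], end;
    if (e->kind == EXEC) {
        resource *r = &a->res[a->mapping[e->process]];
        if (r->busy_until > start) start = r->busy_until; /* wait for resource */
        end = start + e->amount * r->ns_per_op;
        r->busy_until = end;
    } else { /* READ and WRITE both use the shared bus */
        if (a->bus_busy_until > start) start = a->bus_busy_until;
        end = start + e->amount * a->bus_ns_per_word;
        a->bus_busy_until = end;
    }
    a->proc_time[e->process] = end;
    return end;
}
```

Because the mapping is just an array, trying a different mapping means changing `mapping` and replaying the same trace, without re-running the application model.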
Trace-driven Simulation
SPADE: System-level Performance Analysis and Design Space Exploration
(Figure: an application model and an architecture model.)
P. Lieverse et al., U Delft & Philips
Going away from discrete-event simulation…
Analysis for Communication Systems
Lahiri et al., UC San Diego
A two-step approach:
1. simulation without communication (e.g. using ISS)
2. analysis for different communication architectures
© K. Lahiri, UCSD
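The two-step idea can be illustrated with a deliberately simple stand-in for step 2. The communication requests recorded in step 1 (issue time and size) are re-timed for candidate buses under first-come-first-served arbitration, without re-simulating the components. The FCFS shared-bus model, the parameters and the names are assumptions, not the analysis used by Lahiri et al.

```c
/* One communication request recorded in step 1 (hypothetical format). */
typedef struct {
    double ready;   /* time the request is issued */
    unsigned words; /* payload size */
} comm_request;

/* A candidate communication architecture (hypothetical parameters). */
typedef struct {
    unsigned width_words; /* words transferred per bus cycle, >= 1 */
    double cycle_ns;      /* bus clock period */
    double arb_ns;        /* arbitration overhead per transfer */
} bus_arch;

/* Requests must be sorted by ready time (FCFS order).
   Returns the mean latency (completion - ready) over all requests. */
double mean_latency(const comm_request *req, unsigned n, bus_arch b) {
    double bus_free = 0.0, total = 0.0;
    for (unsigned i = 0; i < n; i++) {
        double start = req[i].ready > bus_free ? req[i].ready : bus_free;
        unsigned cycles = (req[i].words + b.width_words - 1) / b.width_words;
        double done = start + b.arb_ns + cycles * b.cycle_ns;
        total += done - req[i].ready;
        bus_free = done; /* bus is busy until this transfer completes */
    }
    return n ? total / n : 0.0;
}
```

Two simultaneous 8-word requests give a mean latency of 120 ns on a 1-word bus with a 10 ns cycle, and 30 ns on a 4-word bus. Only step 2 has to be repeated for each candidate.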
Analytical Methods for Power Estimation
• Givargis et al., UC Riverside
• Analytical models for power consumption of:
  – caches
  – buses
• Two-step approach for fast power evaluation:
  – collect intermediate data using simulation
  – use equations to rapidly predict power
  – couple with a fast bus estimation approach
Approach Overview
© Givargis, UC Riverside
• Bus equation:
• m items/second (denotes the traffic N on the bus)
• n bits/item
• k bit wide bus
• bus-invert encoding
• random data assumption
$$P = C_{bus} \cdot m \cdot \frac{n}{k} \cdot \frac{1}{2^{k}} \left[ 1\binom{k+1}{1} + 2\binom{k+1}{2} + \cdots + \frac{k}{2}\binom{k+1}{k/2} \right]$$
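The bus equation can be evaluated directly and, for small k, checked against exhaustive enumeration. The sketch assumes k even (so that k/2 is an integer) and uniformly random data. It also assumes the encoder sends whichever of the plain word (invert line 0) or the inverted word (invert line 1) toggles fewer of the k + 1 lines. `C_bus` is treated as the energy per line transition. Function names are illustrative.

```c
#include <stdint.h>

/* Binomial coefficient C(n, r) as a double (exact for the small n used here). */
static double binom(unsigned n, unsigned r) {
    double c = 1.0;
    for (unsigned i = 1; i <= r; i++) c = c * (n - r + i) / i;
    return c;
}

/* (1/2^k) * [1*C(k+1,1) + 2*C(k+1,2) + ... + (k/2)*C(k+1,k/2)], k even:
   expected line transitions per transfer. */
double expected_transitions(unsigned k) {
    double sum = 0.0;
    for (unsigned i = 1; i <= k / 2; i++) sum += i * binom(k + 1, i);
    return sum / (double)(1ull << k);
}

/* P = C_bus * m * (n/k) * expected transitions per transfer. */
double bus_power(double c_bus, double m, double n, unsigned k) {
    return c_bus * m * (n / k) * expected_transitions(k);
}

static unsigned popcount64(uint64_t x) {
    unsigned c = 0;
    while (x) { c += (unsigned)(x & 1u); x >>= 1; }
    return c;
}

/* Exhaustive check over every previous state of the k+1 lines and every
   new k-bit data word, applying bus-invert to each transfer. */
double brute_force_transitions(unsigned k) {
    uint64_t mask = (1ull << k) - 1, total = 0;
    for (uint64_t prev = 0; prev < (2ull << k); prev++)
        for (uint64_t d = 0; d <= mask; d++) {
            unsigned h_plain = popcount64(prev ^ d);                          /* invert line = 0 */
            unsigned h_inv = popcount64(prev ^ ((~d & mask) | (1ull << k))); /* invert line = 1 */
            total += h_plain < h_inv ? h_plain : h_inv;
        }
    return (double)total / (double)((2ull << k) << k);
}
```

For k = 4 the bracketed term gives 25/16 ≈ 1.56 expected transitions per transfer, and the enumeration reproduces the same value.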
Experiment Setup
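© Givargis, UC Riverside
(Figure: tool flow in which a C program runs on an ISS acting as trace generator; the traces drive a cache simulator (Dinero [Edler, Hill]) and a bus simulator. The outputs are performance plus CPU power [Tiwari96], memory power, I/D cache power and bus power, which are summed.)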
Analytical Method
Workload?
(Figure: tasks on CPU1 (scheduling discipline 1) and CPU2 (scheduling discipline 2) connected by event streams e1 to e4 whose properties are unknown.)
Event Model Interface Classification
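© Ernst, TU Braunschweig
(Figure: the event models are periodic (period T), periodic with jitter (T, jitter J), periodic with burst (T, burst length b) and sporadic (minimum interarrival distance t). Lossless EMIFs convert to a more expressive model, e.g. periodic to periodic with jitter with T = T, J = 0. EMIFs to a less expressive model lose information, e.g. periodic with jitter to sporadic with t = T − J.)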
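The two conversions from the figure can be written as small functions. The sketch pairs them with the usual upper event-count bound η⁺(Δ) for each model, which shows what the less expressive model loses. Integer time, J < T, and all names are assumptions of this sketch.

```c
/* Hypothetical event model types, integer time units. */
typedef struct { long T, J; } periodic_jitter; /* period, jitter (J < T) */
typedef struct { long t; } sporadic;           /* minimum interarrival time */

/* Lossless EMIF: a strictly periodic stream is periodic with jitter J = 0. */
periodic_jitter from_periodic(long T) { return (periodic_jitter){ T, 0 }; }

/* EMIF to a less expressive model: keep only the minimum distance T - J. */
sporadic to_sporadic(periodic_jitter p) { return (sporadic){ p.T - p.J }; }

static long ceil_div(long a, long b) { return (a + b - 1) / b; } /* a >= 0, b > 0 */

/* Max. number of events in any window of length dt. */
long eta_plus_pj(periodic_jitter p, long dt) { return dt <= 0 ? 0 : ceil_div(dt + p.J, p.T); }
long eta_plus_sp(sporadic s, long dt) { return dt <= 0 ? 0 : ceil_div(dt, s.t); }
```

With T = 10 and J = 4 the sporadic model has t = 6. For a window of 30 it allows 5 events, whereas the original periodic-with-jitter model allows only 4. The conversion is therefore safe but pessimistic.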
Example: EMIFs & EAFs
• Use standard scheduling analysis for single components.
(Figure: event streams e1 to e4 between CPU1 (scheduling discipline 1) and CPU2 (scheduling discipline 2); where the streams connect the components, an event model interface (EMIF) or an event adaptation function (EAF) is needed.)
General Framework
(Figure: the functional task model (tasks T1 to T3, load scenarios, event streams) is abstracted into an abstract task model (abstract load scenarios, abstract event streams). The architecture model (functional units and resource units, e.g. ARM9, DSP) is abstracted into an abstract architecture. Mapping relations combine both into abstract components (run-time environment) with abstract functional units and abstract resource units.)
Event & Resource Models
• use arrival curves to capture event streams
• use service curves to capture processing capacity
(Figure: upper and lower arrival curves αu(Δ) and αl(Δ) giving the maximum and minimum number of packets in any time interval of length Δ.)
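Arrival curves can be estimated from a recorded trace. With R(t) the number of events before time t, αu(Δ) and αl(Δ) are the maximum and minimum of R(t + Δ) − R(t) over all window positions. The sketch below computes this in integer time units over a fixed horizon; the `HORIZON` value is an assumption. A single trace only yields an estimate, because the true αu may be larger than what one trace shows.

```c
#define HORIZON 64 /* assumed trace length in time units */

/* times: sorted event time stamps in [0, HORIZON).
   au, al: arrays of size HORIZON + 1, filled for window lengths D = 0..HORIZON. */
void arrival_curves(const int *times, int n, int *au, int *al) {
    int R[HORIZON + 1] = {0};
    for (int i = 0; i < n; i++) R[times[i] + 1]++;
    for (int t = 1; t <= HORIZON; t++) R[t] += R[t - 1]; /* R[t] = #events with time < t */
    for (int D = 0; D <= HORIZON; D++) {
        int hi = 0, lo = n;
        for (int t = 0; t + D <= HORIZON; t++) { /* every window [t, t + D) */
            int k = R[t + D] - R[t];
            if (k > hi) hi = k;
            if (k < lo) lo = k;
        }
        au[D] = hi;
        al[D] = lo;
    }
}
```

For events every 10 time units, a window of length 10 always holds exactly one event, while a window of length 11 can hold two.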
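Analysis for a Single Component
(Figure: a component takes input arrival curves αl, αu and service curves βl, βu and produces output arrival curves α′l, α′u and remaining service curves β′l, β′u.)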
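One relation behind this component view comes from Real-Time Calculus for a greedy processing component. It gives the lower service left over for lower-priority work: β′l(Δ) = sup over 0 ≤ λ ≤ Δ of (βl(λ) − αu(λ)). The sketch evaluates it on curves sampled at integer Δ in a common unit (e.g. events). The output arrival curves and β′u are omitted, and the function name is illustrative.

```c
/* beta_l, alpha_u: curves sampled at Delta = 0..N-1, both with value 0 at 0.
   beta_l_out[D] = max over 0 <= lam <= D of (beta_l[lam] - alpha_u[lam]). */
void remaining_service_lower(const int *beta_l, const int *alpha_u, int N, int *beta_l_out) {
    int best = 0; /* lam = 0 contributes beta_l(0) - alpha_u(0) = 0 */
    for (int D = 0; D < N; D++) {
        int v = beta_l[D] - alpha_u[D];
        if (v > best) best = v; /* running supremum over lam <= D */
        beta_l_out[D] = best;
    }
}
```

With βl(Δ) = 2Δ and αu = 0, 3, 4, 5, 6, 7, the remaining service is 0, 0, 0, 1, 2, 3. The burst of three events uses up all capacity for the first two time units.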
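Analysis – Bounds on Delay & Memory
(Figure: upper arrival curve αu and lower service curve βl; the delay bound d is the maximum horizontal distance and the backlog bound b the maximum vertical distance between them.)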
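Both bounds can be computed numerically from sampled curves. The backlog bound is the largest vertical distance, max(αu(Δ) − βl(Δ)). The delay bound is the largest horizontal distance: for each Δ, the smallest τ ≥ 0 with βl(Δ + τ) ≥ αu(Δ). The sketch assumes integer Δ, a common unit for both curves, and a horizon long enough for βl to catch up; otherwise it reports -1. Names are illustrative.

```c
/* Maximum vertical distance between alpha_u and beta_l (sampled 0..N-1). */
int backlog_bound(const int *alpha_u, const int *beta_l, int N) {
    int b = 0;
    for (int D = 0; D < N; D++)
        if (alpha_u[D] - beta_l[D] > b) b = alpha_u[D] - beta_l[D];
    return b;
}

/* Maximum horizontal distance; -1 if beta_l does not catch up in time. */
int delay_bound(const int *alpha_u, const int *beta_l, int N) {
    int d = 0;
    for (int D = 0; D < N; D++) {
        int tau = 0;
        while (D + tau < N && beta_l[D + tau] < alpha_u[D]) tau++;
        if (D + tau == N) return -1; /* sampled horizon too short */
        if (tau > d) d = tau;
    }
    return d;
}
```

For αu = 0, 3, 4, 5, 6, 7 and a rate-latency service βl = 0, 0, 2, 4, 6, 8 (rate 2, latency 1), it yields b = 3 events and d = 2 time units.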
Comparison Between Different Approaches
Analytical methods:
• possibilities to answer questions are limited by the method
• restricted by the underlying models
• good coverage (worst case)
• fast
• coarse
Simulation-based:
• can answer virtually any question about performance
• can model arbitrarily complex systems
• average case (single instance)
• time-consuming
• accurate
Example: IBM Network Processor
Comparison RTC vs. Simulation
(Figure: utilization [%] of PLB read, PLB write and OPB versus line speed (100 to 400 Mbps), comparing simulation with the analytical method.)
Experiment Results
© Givargis, UC Riverside
• Diesel application’s performance
• Blue is obtained using full simulation
• Red is obtained using our equations
• 4% error, 320x faster
(Figure: execution time (sec) for configurations Conf 0 to Conf 9.)
Concluding Remarks
Backup
Metropolis Framework
Cadence Berkeley Lab & UC Berkeley