TIME-PREDICTABLE EXECUTION OF EMBEDDED SOFTWARE ON MULTI-CORE PLATFORMS
Sudipta Chattopadhyay
under the guidance of
A/P Abhik Roychoudhury
1
EMBEDDED SYSTEMS
2
REAL-TIME CONSTRAINTS
[Diagram: an embedded system subject to hard real-time and soft real-time constraints]
3
TIMING ANALYSIS
• Hard real-time systems require absolute timing guarantees
  - System-level analysis
  - Single-task analysis
• Worst-case execution time (WCET) analysis
  - An upper bound on the execution time over all possible inputs
  - A sound over-approximation is obtained by static analysis
4
WCET ANALYSIS
[Diagram: WCET analysis flow. The program's control flow graph feeds micro-architectural modeling, which produces the WCET of basic blocks; path analysis then combines these with loop bound constraints and infeasible path constraints to compute the WCET bound]
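For orientation, path analysis here is the standard implicit path enumeration technique (IPET): an integer linear program over basic-block execution counts. A generic sketch in LaTeX notation (not necessarily the exact formulation used in the dissertation):

\[
\max \sum_{B} c_B\, x_B
\quad \text{s.t.} \quad
x_B = \sum_{e \in \mathrm{in}(B)} x_e = \sum_{e \in \mathrm{out}(B)} x_e, \qquad
x_{\mathit{loop\ body}} \le \mathit{bound} \cdot x_{\mathit{loop\ entry}},
\]

where \(c_B\) is the WCET of basic block \(B\) and \(x_B\) its execution count; further linear constraints exclude infeasible paths.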
5
ARCHITECTURE
[Diagram: Core 1 through Core n, each with a private L1 cache, connected through a shared bus to a shared L2 cache and main memory; the shared cache and the shared bus are the points of resource sharing]
6
OVERVIEW
[Diagram: overview of the dissertation work (time-predictable execution in multi-core), targeting a multi-core WCET tool. Instruction and data accesses from each core flow through private L1 caches and a shared bus to a shared unified L2 cache and main memory. The analyses developed cover: shared cache conflicts between different instruction and data blocks, shared bus delays, shared scratchpad memory allocation, coherence miss modeling, and cache-related preemption delay analysis]
7
MICRO-ARCHITECTURAL MODELING
[Diagram: single-core features (branch predictor, cache, pipeline) and multi-core features (shared cache, shared bus)]
8
COMPARISON
Work                                   | Micro-arch. technique | Program-level technique    | Precision       | Scalability
---------------------------------------|-----------------------|----------------------------|-----------------|------------
Classical abstract interpretation (AI) | AI                    | AI                         | ×               | √
Classical model checking (MC)          | MC                    | MC                         | √               | ×
RTS’00 (aiT, Chronos)                  | AI                    | Integer linear programming | Can be improved | √
RTSS’10                                | AI                    | MC                         | Can be improved | –
Our approach (AI+MC)                   | AI+MC                 | Integer linear programming | > RTS’00        | = RTS’00
Our approach (AI+MC)                   | AI+MC                 | MC                         | > RTSS’10       | = RTSS’10
9
IMPRECISION IN ABSTRACT INTERPRETATION
[Diagram: two paths p1 and p2 reach a join point with different abstract cache-set contents, cache state C1 along p1 and cache state C2 along p2; abstract interpretation merges them into a single joined cache state C3]
The joined cache state loses the information about which of the paths p1 and p2 was taken.
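For intuition, a minimal sketch (not the dissertation's implementation) of the must-analysis join over an abstract LRU cache set: a block survives the join only if it is cached on both incoming paths, and it keeps the older of its two ages. This is exactly where the information about p1 and p2 is lost. The names and the one-block-per-age simplification are assumptions of this sketch.

#define WAYS 4            /* associativity of the abstract cache set */
#define EMPTY (-1)        /* no block recorded at this age position  */

/* abs_set.block[i] holds the block id of age i (0 = youngest), or EMPTY. */
typedef struct { int block[WAYS]; } abs_set;

int age_of(const abs_set *s, int blk) {
    for (int i = 0; i < WAYS; i++)
        if (s->block[i] == blk) return i;
    return -1;            /* block is not guaranteed to be cached */
}

/* Must-analysis join: keep only blocks present in both abstract states,
 * at the maximum (least favourable) of their two ages.  Dropping a block
 * that collides at the same age keeps the result sound, only less precise. */
abs_set must_join(const abs_set *a, const abs_set *b) {
    abs_set out;
    for (int i = 0; i < WAYS; i++) out.block[i] = EMPTY;
    for (int i = 0; i < WAYS; i++) {
        int blk = a->block[i];
        if (blk == EMPTY) continue;
        int j = age_of(b, blk);
        if (j >= 0) out.block[i > j ? i : j] = blk;
    }
    return out;
}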
10
MODEL CHECKING ALONE ?
• A path-sensitive search
  - Path-sensitive search is expensive – path explosion
  - Worse when combined with the possible cache states
[Diagram: paths p1 and p2 are explored separately, each with its own cache state C1 and C2]
11
MODEL CHECKING ALONE ?
• A path-sensitive search
  - Path-sensitive search is expensive – path explosion
  - Worse when combined with the possible cache states
[Diagram: enumerating the abstract LRU cache-set states along paths p1 and p2 separately leads to state explosion]
12
CACHE ANALYSIS
[Diagram: cache analysis by abstract interpretation, pipeline analysis and branch predictor modeling form the micro-architectural modeling stage, which yields the WCET of basic blocks; IPET combines these with infeasible path and loop bound constraints. Abstract-interpretation outcomes are then refined by a model checker until either all accesses are checked or a timeout expires]
• Refinement by the model checker can be terminated at any point
• Model checker refinement steps are inherently parallel
• Each model checker refinement step checks a lightweight assertion property
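A minimal sketch of this anytime refinement loop (illustrative only; the helper that invokes the model checker is a stub standing in for a CBMC run on an assertion such as assert(C_m <= 1)). The bullet points above correspond directly to the loop structure: each query is independent, so queries can run in parallel, and the loop can stop at any time without losing soundness.

#include <stdbool.h>
#include <time.h>

enum cls { UNCLASSIFIED, HIT };

#define N_ACCESSES 3
enum cls classification[N_ACCESSES] = { UNCLASSIFIED, UNCLASSIFIED, UNCLASSIFIED };

/* Stand-in for one model checking query; in the real flow this runs CBMC
 * on a small assertion property generated for the given cache access.   */
bool model_checker_proves_hit(int access) {
    return access != 1;          /* placeholder outcome for the sketch */
}

/* Anytime refinement: every completed query only tightens the WCET bound,
 * so the loop may be terminated at any point (e.g. on a timeout), and the
 * queries are mutually independent, hence trivially parallelizable.      */
void refine(double timeout_seconds) {
    clock_t start = clock();
    for (int i = 0; i < N_ACCESSES; i++) {
        if ((double)(clock() - start) / CLOCKS_PER_SEC > timeout_seconds)
            break;               /* unresolved accesses simply stay conservative */
        if (classification[i] == UNCLASSIFIED && model_checker_proves_hit(i))
            classification[i] = HIT;
    }
}

int main(void) { refine(60.0); return 0; }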
13
REFINEMENT (INTER-CORE)
[Diagram: the conflicting task, running on another core, accesses block m1 on the branch x < y and block m2 on the branch x == y; both map to the shared cache set holding block m. Abstract interpretation assumes both conflicts occur and classifies the access to m as a cache miss. Since x < y and x == y cannot hold together, no feasible path generates both conflicts; the miss classification is spurious, and the access to m is in fact a cache hit]
14
REFINEMENT (INTER-CORE)
[Diagram: the conflicting task is instrumented with a conflict counter. C_m is incremented before the access to m1 (on the branch x < y) and before the access to m2 (on the branch x == y), and the assertion assert(C_m <= 1) is checked at the exit. The model checker verifies the assertion, because the path generating both conflicts is infeasible; hence at most one conflict reaches m's cache set and the access to m is a cache hit]
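A minimal sketch of this refinement query as a C harness for CBMC (the names are illustrative, not the exact instrumentation generated by the tool). Running, for example, cbmc --function conflicting_task harness.c lets CBMC explore all values of x and y, and the assertion is verified precisely because no feasible path increments the conflict counter twice.

#include <assert.h>

/* Conflict counter for memory block m: incremented once per access of the
 * conflicting task that can evict the contents of m's shared cache set.  */
int C_m = 0;

void conflicting_task(int x, int y) {
    if (x < y)
        C_m++;                 /* access to m1: one conflict to m     */
    if (x == y)
        C_m++;                 /* access to m2: another conflict to m */

    /* m tolerates one conflict without being evicted.  CBMC verifies this,
     * since x < y and x == y cannot both hold for the same x and y.       */
    assert(C_m <= 1);
}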
15
REFINEMENT (WHY IT WORKS?)
[Diagram: the conflict counter C_m is incremented only at accesses that conflict with block m; a path through an unrelated block m' (Path 2) does not affect the value of C_m. The checked property, here assert(C_m <= 0) (whose violation would indicate a cache miss), therefore depends only on the few statements that touch C_m, which is what keeps each refinement query lightweight for the model checker]
16
EXPERIMENTAL SETUP (CHRONOS TOOLKIT)
[Diagram: the C source is compiled with GCC to SimpleScalar binary code; the CFG is extracted and fed to micro-architectural modeling (cache, pipeline, branch prediction); the resulting micro-architectural constraints and the flow constraints are solved by ILP to obtain the WCET. CBMC (C bounded model checking) refines the micro-architectural constraints]
17
EXPERIMENTAL RESULT
18
EXPERIMENTAL RESULT
[Chart: WCET estimates for the tasks cnt, jfdctint, edn, fir, fdct and ndes, with a direct-mapped, 256-byte L1 cache and a 4-way associative, 8 KB shared L2 cache. Average analysis time = 70 seconds]
19
EXTENSION USING SYMBOLIC EXECUTION
[Diagram: the conflicting task is instrumented with the conflict counter C_m as before, but the refinement is now driven by symbolic execution. When the search reaches the assertion assert(C_m <= 1) along a path, the accumulated path condition (e.g. x < y ∧ x = y) is handed to a constraint solver; paths with unsatisfiable conditions are pruned, and a path aborts only if its condition is satisfiable and the assertion is violated]
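The same query phrased for a symbolic execution engine such as KLEE (a sketch; the variable names and the harness shape are assumptions of this example). KLEE only follows paths whose accumulated path condition is satisfiable, so the combination x < y ∧ x == y is pruned by the constraint solver and the assertion holds on every explored path:

#include <assert.h>
#include <klee/klee.h>

int C_m = 0;                   /* conflict counter for memory block m */

int main(void) {
    int x, y;
    klee_make_symbolic(&x, sizeof(x), "x");
    klee_make_symbolic(&y, sizeof(y), "y");

    if (x < y)
        C_m++;                 /* conflict from the access to m1 */
    if (x == y)
        C_m++;                 /* conflict from the access to m2 */

    /* The path taking both branches has the unsatisfiable condition
     * x < y && x == y, so KLEE never reaches this assertion with C_m == 2. */
    assert(C_m <= 1);
    return 0;
}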
20
EXTENSION USING KLEE
[Diagram: the same Chronos tool flow, i.e. C source compiled with GCC to SimpleScalar binary code, CFG extraction, micro-architectural modeling (cache, pipeline, branch prediction), flow constraints and ILP, with the refinement of micro-architectural constraints performed by CBMC or KLEE]
21
A GENERIC FRAMEWORK
• Three different architectural/application settings
  - Intra-task (WCET in a single core): cache conflicts within one task in its private cache
  - Inter-task (cache-related preemption delay analysis): cache conflicts between a high-priority and a low-priority task sharing a cache
  - Inter-core (WCET in multi-core): cache conflicts between tasks on different cores, through private L1 caches and a shared L2 cache
22
MICRO-ARCHITECTURAL MODELING
[Diagram: single-core features (branch predictor, cache, pipeline) and multi-core features (shared cache, shared bus)]
23
TASK-LEVEL INTERFERENCE
[Diagram: tasks T1, T2 and T3 are mapped to cores with private L1 caches, a shared bus and a shared L2 cache; from the tasks' positions on the timeline, a task interference graph is derived, connecting only tasks whose lifetimes may overlap]
24
SHARED CACHE + TDMA SHARED BUS
[Diagram: task graphs T1→T2 and T3→T4 are mapped to Core 1 and Core 2, each with a private L1 cache, connected by a Time Division Multiple Access (TDMA) shared bus (alternating Core 1 and Core 2 slots) to a shared L2 cache. An L2 miss generates a bus access whose delay depends on the TDMA slot in effect; tasks with disjoint lifetimes (e.g. T2 and T4) cannot conflict in the shared cache]
25
OVERVIEW OF THE FRAMEWORK
[Diagram: analysis framework. Per-core L1 cache analyses, followed by a filter and the shared L2 cache analysis; an L2 conflict analysis and a bus-aware analysis feed the WCRT computation. Starting from an initial interference assumption, the loop repeats while the task interference changes; since the interference monotonically decreases, the iteration terminates and yields the estimated WCRT]
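A toy sketch of the shape of this fixed-point iteration (the numbers and helper bodies are placeholders, not the real analyses): the interference assumption is recomputed until it stops changing, and because it can only decrease, termination is guaranteed.

/* Toy stand-in: `interference` abstracts the set of task pairs assumed to
 * conflict in the shared L2 cache as a single count.                      */
int recompute_interference(int assumed) {
    /* Re-running the lifetime and L2 conflict analyses under the current
     * assumption can only rule pairs out, never add new ones.             */
    return assumed > 2 ? assumed - 1 : assumed;
}

long wcrt_from(int interference) {
    return 1000 + 50L * interference;    /* placeholder response-time model */
}

/* Iterate until the interference assumption reaches a fixed point. */
long estimate_wcrt(int initial_interference) {
    int assumed = initial_interference;
    for (;;) {
        int refined = recompute_interference(assumed);
        if (refined == assumed)
            return wcrt_from(assumed);   /* estimated WCRT */
        assumed = refined;               /* interference monotonically decreases */
    }
}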
26
EVALUATION (2-CORE)
One core runs statemate; the other core runs the program under evaluation.
27
EVALUATION (4-CORE)
The four cores run either (edn, adpcm, compress, statemate) or (matmult, fir, jfdctint, statemate).
28
MICRO-ARCHITECTURAL MODELING
[Diagram: single-core features (branch predictor, cache, pipeline) and multi-core features (shared cache, shared bus), now highlighting the interactions among these components]
29
TIMING ANOMALY (SHARED CACHE)
[Diagram: alternative paths with different sequences of cache hits and misses for their memory accesses]
The path with the locally worse cache behaviour may not be the worst-case path.
30
BASELINE ABSTRACTION – TIMING INTERVAL
• Represent each pipeline stage by a timing interval
[Diagram: instructions R1 := R2 + 5, R5 := R1 * R7 and R3 := R5 * 5 flow through the pipeline stages IF, ID, EX, WB and CM, each annotated with an interval such as [1,3], [3,7] or [4,10]. The finish time of a stage is its start interval plus the stage latency (e.g. a cache miss latency); structural dependencies and contention between instructions further constrain the intervals]
A fixed-point analysis derives the timing of each stage as an interval.
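A minimal sketch of the interval arithmetic behind such an analysis (simplified, with assumed names; the dissertation's analysis models contention and structural hazards in more detail):

/* A timing interval: earliest and latest time at which an event,
 * such as a pipeline stage finishing, can occur.                 */
typedef struct { int lo, hi; } interval;

static int imax(int a, int b) { return a > b ? a : b; }

/* A stage may start only after the previous stage of the same instruction
 * and the same stage of the previous instruction have both finished.      */
interval stage_start(interval prev_stage, interval same_stage_prev_instr) {
    interval s = { imax(prev_stage.lo, same_stage_prev_instr.lo),
                   imax(prev_stage.hi, same_stage_prev_instr.hi) };
    return s;
}

/* Finish = start + stage latency; an unclassified cache access contributes
 * the latency interval [hit_latency, miss_latency], keeping both bounds sound. */
interval stage_finish(interval start, int lat_lo, int lat_hi) {
    interval f = { start.lo + lat_lo, start.hi + lat_hi };
    return f;
}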
31
TDMA SHARED BUS ANALYSIS
• Time Division Multiple Access (TDMA)
• Offset abstraction
[Diagram: the TDMA round alternates Core 0 and Core 1 slots. A request from core 1 arriving at offset T outside its own slot is delayed until the next Core 1 slot; a request from core 0 arriving at offset T' inside its own slot incurs delay = 0. The analysis tracks a set of possible offsets rather than exact arrival times]
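For intuition, a minimal sketch of the bus delay seen by a single request under a TDMA schedule with equal slot lengths (assumptions of this sketch: one slot per core per round and an access that fits within one slot; the dissertation's offset abstraction propagates sets of offsets rather than single values):

/* Bus delay for a request issued by `core` at round offset `offset`,
 * occupying the bus for `access_len` cycles.
 * Assumes ncores slots of equal length and access_len <= slot_len.   */
int tdma_delay(int core, int offset, int ncores, int slot_len, int access_len) {
    int round      = ncores * slot_len;
    int slot_begin = core * slot_len;
    int slot_end   = slot_begin + slot_len;

    /* Request fits in the remainder of the core's current slot: no delay. */
    if (offset >= slot_begin && offset + access_len <= slot_end)
        return 0;

    /* Otherwise wait until the start of the core's next slot. */
    return (slot_begin - offset + round) % round;
}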
32
LOOP CONSTRUCT
[Diagram: pipeline stages (IF, ID, EX, WB, CM) of instructions from the previous and the current loop iteration overlap across the back edge]
How do we define a bus context?
Property: if the bus offsets at the cross-iteration edges do not change, the WCET of the loop iteration cannot change.
33
LOOP CONSTRUCT
Ci = bus context of the loop body at the i-th iteration
[Diagram: bus context flow graph C1 → C2 → C3 → C4 → C5, with C5 ≡ C3]
Property: if Ci ≡ Cj, then Ci+k ≡ Cj+k for any k > 0.
34
LOOP CONSTRUCT
[Diagram: the bus context flow graph C1 → C2 → C3 → C4 plays the role of the control flow graph in the usual WCET computation, combining the WCET of basic blocks with loop bound and infeasible path constraints in the path analysis, solved by an ILP solver]
Compute the WCET for each bus context.
E(Ci) = number of times context Ci is executed
Generate linear constraints, e.g.:
E(C1) + E(C2) + E(C3) + E(C4) ≤ loop bound
E(C1) ≥ E(C2)
ILP = Integer Linear Programming
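Putting the pieces together, in LaTeX notation (a sketch; the tool generates the exact constraint set per loop):

\[
\max \sum_i w(C_i)\, E(C_i)
\quad \text{s.t.} \quad
\sum_i E(C_i) \le \textit{loop bound}, \qquad E(C_1) \ge E(C_2),
\]

where \(w(C_i)\) is the WCET of one loop iteration under bus context \(C_i\) and \(E(C_i)\) is its execution count.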
35
BRANCH PREDICTION + CACHE
[Diagram: at a branch location, speculatively fetched instructions (bounded by the maximum number of speculated instructions) may access block m' and conflict with block m in the cache; the cache contents of the speculative and the correct path are joined, so a later cache access to m becomes unclear]
36
EXPERIMENTAL SETUP (CHRONOS TOOLKIT)
[Diagram: the same Chronos tool flow, i.e. C source compiled with GCC to SimpleScalar binary code, CFG extraction, micro-architectural modeling (private cache, shared cache, pipeline, branch prediction, shared bus), flow constraints and ILP, producing the WCET]
37
EVALUATION (CACHE + PIPELINE)
[Chart: WCET over-estimation due to the imprecision of the shared cache analysis, for jfdctint and statemate running on Core 1 and Core 2]
38
EVALUATION (CACHE + PIPELINE + SPECULATION)
[Chart: imprecision attributable to the modeling of speculation]
39
EVALUATION (BUS + PIPELINE)
[Chart: imprecision attributable to path analysis and to the shared bus analysis]
40
RECAP
[Diagram: recap of the dissertation work (time-predictable execution in multi-core), targeting a multi-core WCET tool. Covered analyses: shared cache conflicts (including conflicts between high- and low-priority tasks), shared bus, shared scratchpad memory allocation, coherence miss modeling (stale data and coherence miss traffic), and cache-related preemption delay analysis, for architectures with per-core L1 caches, a shared L2 cache, shared on-chip scratchpads (SPM-0 … SPM-N) and off-chip memory]
41
PERSPECTIVE
[Diagram: roadmap from time-predictable execution in single-core to time-predictable execution in multi-core, addressing resource sharing (shared cache, shared bus, shared scratchpad) and data sharing (cache coherence), via static analysis, testing and customized hardware. Example platforms: ARM Cortex A9 MPCore, Samsung Exynos, Nvidia Tegra II (smart phones), Time Division Multiple Access buses, Sony PSP, IBM Cell, and the Aethereal Network-on-Chip]
42
PERSPECTIVE
[Diagram: functionality verification versus quantitative verification, both over a concrete domain. Functionality verification: abstraction, a property verifier whose counterexamples may be spurious, and abstraction refinement until the property is verified. Quantitative verification: abstract interpretation over an abstract domain generates a quantitative property (e.g. a cache conflict bound), which is then refined by path-sensitive verification of the possibly spurious analysis results]
43
FUTURE WORK
• Static performance analysis + testing
  - Symbolic execution drives performance testing: inputs are generated from path conditions (e.g. x < y ∧ x ≠ y) that violate a quantitative property such as the cache conflict assertion assert(C_m <= 1)
• Performance testing for mobile devices
• Energy analysis of software: battery life, energy-aware software testing
44
THANK YOU
My sincere thanks to all the examiners, and especially to the anonymous Examiner 1 for his comment on symbolic execution.
45