Predictability and Utilisation Trade-off
in the Dynamic Management of
Multiple Video Stream Decoding on
Network-on-Chip based Homogeneous
Embedded Multi-cores
Hashan Roshantha Mendis (Rosh),
Leandro Soares Indrusiak,
Neil Audsley
(Real-time Systems Group, University of York)
RheonMedia
09/10/2014
RTNS 2014 - Versailles, France
Outline
• Motivation
• System overview
– Application model
– Platform model
– Task mapping
• Admission control tests
– Deterministic
– Heuristic based
• Evaluation method
• Results
• Conclusion and future work
Motivation
Context
• NoC based soft real-time embedded systems
• Heuristic based admission control of multiple
video stream decoding tasks.
Objectives
• Make predictable admission control decisions
of video stream decoding requests, while
maintaining high system utilisation.
Motivation
Metrics definition
• Improving predictability:
– Minimising the maximum and mean lateness of
the admitted video stream tasks
– Reducing the number of dropped tasks (due to
platform global input buffer overflow)
• Improving utilisation:
– Reducing idle time of processing elements
System overview
System overview - Application model
Assumptions
• Frame-rate = 25 fps
• Fixed GoP structure - each job has the
same predefined dependency pattern
• End-to-end deadline of each job (De2e)
is known
• Aperiodic tasks
• Fixed priority
• All tasks of a job arrive at the same
time.
• Video stream start/end times are not known in advance
• Task execution times are unknown; worst-case/average-case times can be estimated
System overview - Platform model
Processing Elements
• Homogeneous
• Separate individual kernel
instances.
• Same application code
• Fixed priority-preemptive
local task scheduler
Network-on-Chip
• 2D Mesh
• XY routing
• Fixed priority-preemptive
arbitration
Tasks are held until their dependencies are met and the target PE task queue is not full.
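The release rule above can be sketched as a simple gate. This is a minimal Python sketch, not the authors' implementation: the `Task` record, the `can_release` and `dispatch` helpers, and the representation of queue occupancy as a dict from PE id to number of queued tasks are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    pe: int                                   # PE the task is mapped to
    deps: list = field(default_factory=list)  # names of predecessor tasks


def can_release(task, completed, queue_len, queue_capacity):
    """True when all predecessors have completed and the target PE task queue has room."""
    deps_met = all(d in completed for d in task.deps)
    return deps_met and queue_len[task.pe] < queue_capacity


def dispatch(held, completed, queue_len, queue_capacity):
    """Release every held task that passes `can_release`; the rest stay held."""
    released, still_held = [], []
    for task in held:
        if can_release(task, completed, queue_len, queue_capacity):
            queue_len[task.pe] += 1  # the task now occupies a slot in its PE queue
            released.append(task)
        else:
            still_held.append(task)
    return released, still_held
```

Because `queue_len` is updated inside the loop, tasks later in the held list see the slots already taken by tasks released earlier in the same pass.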
System overview – Task mapping
Semi-dynamic mapping
• The same task mapping
pattern is used for every
subsequent job in a given
video stream.
• This job-level task-to-core
mapping is established by
the resource manager (RM) when
a new video stream is admitted.
• Mapping heuristic: shortest
task queue first (STQF)
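A minimal sketch of semi-dynamic STQF mapping. It assumes queue occupancy is counted in tasks and that each task assigned while building the mapping counts towards its PE's queue, so that the tasks of one job spread out; ties go to the lowest PE id. `stqf_mapping` and `VideoStream` are hypothetical names, not taken from the paper.

```python
def stqf_mapping(task_names, queue_len):
    """Shortest task queue first: give each task the PE with the fewest queued tasks.

    queue_len: dict PE id -> current task-queue occupancy. Each assignment is counted
    (an assumption) so that tasks of the same job spread across PEs; ties go to the
    lowest PE id.
    """
    occupancy = dict(queue_len)  # copy; the live queues are not modified
    mapping = {}
    for name in task_names:
        pe = min(occupancy, key=lambda p: (occupancy[p], p))
        mapping[name] = pe
        occupancy[pe] += 1
    return mapping


class VideoStream:
    """Semi-dynamic mapping: computed once on admission, reused for every job."""

    def __init__(self, task_names, queue_len):
        self.mapping = stqf_mapping(task_names, queue_len)

    def map_job(self):
        # Every subsequent job of this stream follows the same task-to-PE pattern.
        return dict(self.mapping)
```

Computing the mapping once per stream keeps the run-time cost of mapping off the per-job path, at the price of not reacting to queue changes after admission.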
Admission control tests
• Deterministic
– Using classical schedulability tests
• Heuristic based
– Based on subtask lateness
Admission control tests - Deterministic
Schedulability analysis
Worst-case response time of a task $\tau_i$ [1]:
Eq 1: $r_i^{n+1} = c_i + \sum_{\forall \tau_j \in hp_i} \left\lceil \frac{r_i^{n}}{t_j} \right\rceil c_j$
Worst-case response time of a flow $Msg_i$ [2][3]:
Eq 2: $R_i^{n+1} = C_i + \sum_{\forall Msg_j \in S_i^D} \left\lceil \frac{R_i^{n} + r_j + J_j^I}{T_j} \right\rceil C_j$
$r_i$: response time of task $\tau_i$
$c_i$: worst-case execution time of task $\tau_i$
$t_i$: period of task $\tau_i$
$hp_i$: higher-priority tasks interfering with $\tau_i$
$R_i$: response time of flow $Msg_i$
$J_i^I$: interference jitter of flow $Msg_i$
$C_i$: basic latency of flow $Msg_i$
$S_i^D$: direct interferers of flow $Msg_i$
$T_j$: period of flow $Msg_j$
$r_j$ (in Eq 2): response time of the task that sends $Msg_j$, acting as its release jitter
A job is considered schedulable if the end-to-end
response time of its critical path is less than or equal
to the end-to-end deadline (De2e)
[1] N. Audsley, A. Burns, M. Richardson, K. Tindell, and A. J. Wellings. Applying new scheduling theory to static priority pre-emptive scheduling. Software Eng. Journal, Sept. 1993
[2] Z. Shi, A. Burns, and L. S. Indrusiak. Schedulability analysis for real time on-chip communication with wormhole switching. Int. Journal of Embedded and Real-Time Comms. Systems, 2010.
[3] L. S. Indrusiak. End-to-end schedulability tests for multiprocessor embedded systems based on networks-on-chip with priority-preemptive arbitration. Journal of Sys. Arch., May 2014.
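Eq 1 and Eq 2 are fixed-point recurrences: start from the execution time (or basic latency) and iterate until the value stops changing. The Python sketch below assumes integer time units, represents $hp_i$ as `(c_j, t_j)` pairs and $S_i^D$ as `(C_j, T_j, r_j, J_j^I)` tuples and, as a simplifying assumption, treats the end-to-end response time of the critical path as the plain sum of the task and flow response times along it. All function names are hypothetical.

```python
import math


def task_wcrt(c_i, hp, max_iter=1000):
    """Eq 1: r = c_i + sum(ceil(r / t_j) * c_j) over higher-priority tasks hp = [(c_j, t_j)]."""
    r = c_i
    for _ in range(max_iter):
        r_next = c_i + sum(math.ceil(r / t_j) * c_j for c_j, t_j in hp)
        if r_next == r:
            return r
        r = r_next
    return math.inf  # no fixed point found: treat as unschedulable


def flow_wcrt(C_i, direct, max_iter=1000):
    """Eq 2: R = C_i + sum(ceil((R + r_j + J_j) / T_j) * C_j) over direct interferers.

    direct: list of (C_j, T_j, r_j, J_j) tuples, one per flow in S_i^D.
    """
    R = C_i
    for _ in range(max_iter):
        R_next = C_i + sum(math.ceil((R + r_j + J_j) / T_j) * C_j
                           for C_j, T_j, r_j, J_j in direct)
        if R_next == R:
            return R
        R = R_next
    return math.inf


def job_schedulable(critical_path_times, d_e2e):
    """Sum task and flow response times along the critical path and compare with De2e."""
    return sum(critical_path_times) <= d_e2e
```

For example, a task with $c_i = 2$ and higher-priority tasks $(1, 4)$ and $(2, 6)$ iterates 2, 5, 6, 6 and so has a worst-case response time of 6.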
Admission control tests - Deterministic
Schedulability analysis – Non-interferers
• We assume there is no overlap between executions
and invocations of different jobs within the same
video stream.
• Precedence constraints are taken into account when
determining interfering tasks
• Exclusion of possible non-interferers makes analysis
tighter but still safe for deterministic AC decisions.
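The precedence-based exclusion can be sketched as a reachability check on the job's dependency graph: a task that is an ancestor or descendant of $\tau_i$ cannot run at the same time as $\tau_i$, so it is dropped from the interferer set, while tasks on parallel branches are kept. This sketch assumes a larger number means higher priority and that `edges` holds the precedence edges of the current job; `_reachable` and `interferers` are hypothetical names.

```python
from collections import defaultdict


def _reachable(start, adj):
    """Nodes reachable from `start` by following the adjacency map `adj`."""
    seen, stack = set(), [start]
    while stack:
        for nxt in adj.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def interferers(task, candidates, priority, pe_of, edges):
    """Higher-priority tasks on the same PE as `task`, minus its ancestors/descendants.

    Assumption: a larger priority value means higher priority.
    edges: (predecessor, successor) pairs of the current job's dependency graph.
    """
    succ, pred = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        succ[src].add(dst)
        pred[dst].add(src)
    related = _reachable(task, succ) | _reachable(task, pred)
    return {
        t for t in candidates
        if t != task
        and pe_of[t] == pe_of[task]
        and priority[t] > priority[task]
        and t not in related  # precedence-related tasks never run concurrently with `task`
    }
```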
Admission control tests – Heuristic based
Instantaneous task lateness
• We assume the individual task deadline is a fraction of the overall end-to-end deadline of the job.
• We make admission decisions based on the instantaneous lateness ($l_i$) of
the tasks in the global input buffers and the PE task queues.
Eq 3: $l_i^{\mathit{InputBuffer}} = t_c - a_i - D_{e2e} \times IBL_{\alpha}$
Eq 4: $l_i^{\mathit{TaskQueue}} = t_c - a_i - D_{e2e} \times TQL_{\alpha}$
$t_c$: current time
$a_i$: task dispatch time
$IBL_{\alpha} \in \mathbb{R},\ 0 \le IBL_{\alpha} \le 1$
$TQL_{\alpha} \in \mathbb{R},\ 0 \le TQL_{\alpha} \le 1$
• If any task in the global input buffers or the PE task queues is
late ($l_i > 0$), the new video stream admission request is rejected.
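Eq 3, Eq 4 and the rejection rule can be sketched as below. Assumptions: each buffered task is described by its dispatch time $a_i$ and its job's $D_{e2e}$, all times share one unit, and a task counts as late when $l_i > 0$. `lateness` and `admit_new_stream` are hypothetical names.

```python
def lateness(t_c, a_i, d_e2e, ratio):
    """Eq 3 / Eq 4: l_i = t_c - a_i - D_e2e * ratio, where ratio is IBL_alpha or TQL_alpha."""
    return t_c - a_i - d_e2e * ratio


def admit_new_stream(t_c, input_buffer, task_queues, ibl_alpha, tql_alpha):
    """Return False (reject) if any waiting task is late, i.e. has l_i > 0.

    input_buffer: list of (a_i, D_e2e) for tasks in the global input buffers
    task_queues:  dict PE id -> list of (a_i, D_e2e) for tasks in that PE task queue
    """
    if any(lateness(t_c, a, d, ibl_alpha) > 0 for a, d in input_buffer):
        return False
    for queue in task_queues.values():
        if any(lateness(t_c, a, d, tql_alpha) > 0 for a, d in queue):
            return False
    return True
```

A smaller ratio gives each waiting task a tighter budget, so tasks are flagged as late sooner and more streams are rejected; larger ratios admit more streams.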
Admission control tests – Heuristic based
• Instantaneous task lateness
• Previous work on determining subtask
deadlines by Kao and Garcia-Molina [4], termed the
Deadline Equal Flexibility scheme (D-EQF)
• Total remaining slack is divided among the
subtasks in proportion to their estimated
execution times (WCET).
[4] B. Kao and H. Garcia-Molina. Deadline assignment in a distributed soft real-time system. IEEE Trans. on Parallel and Distributed Systems, Dec. 1997
Admission control tests – buffer overflow
• Free space in buffers and task queues
– If the global input buffers or task queues do not have
sufficient space, the incoming taskset is dropped without
being received.
– Hence, buffer overflow is also a reason to reject
new video streams.
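A hypothetical free-space check. The slides do not state how much space a new stream needs, so this sketch assumes buffer capacities are counted in tasks and that one job of the new stream, mapped as given in `mapping`, must fit in both the global input buffer and the target PE task queues.

```python
from collections import Counter


def buffers_have_space(ib_used, ib_capacity, queue_used, queue_capacity, mapping):
    """Hypothetical overflow check for one job of a new video stream.

    ib_used, ib_capacity: occupancy and capacity of the global input buffer (in tasks)
    queue_used: dict PE id -> current occupancy of that PE task queue
    queue_capacity: capacity of each PE task queue (in tasks)
    mapping: dict task name -> PE id for the new stream's job
    """
    # Every task of the job must fit into the global input buffer.
    if ib_used + len(mapping) > ib_capacity:
        return False
    # Each PE task queue must have room for the tasks mapped onto it.
    per_pe = Counter(mapping.values())
    return all(queue_used[pe] + n <= queue_capacity for pe, n in per_pe.items())
```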
Evaluation method
• Admission control (AC) tests compared:
– Baselines: No AC test, Deterministic AC test
– Heuristic based (0.1 ≤ IBLα, TQLα ≤ 1.0)
• Measurements:
– Number of video streams admitted, rejected, and admitted
but missing deadlines
– Job lateness
– Overall average PE busy time
• High and Low workloads
• Abstract system-level simulation with lightweight NoC
simulation component [5].
• 35 simulation runs
[5] L. S. Indrusiak and O. M. dos Santos. Fast and accurate transaction-level model of a wormhole network-on-chip with priority preemptive virtual channel arbitration. In DATE 2011, IEEE, 2011.
Experiment results
Video stream admission summary
As the heuristic ratios increase, the number of admitted video streams increases,
but at the cost of more late streams and more dropped jobs.
Heu (D-EQF) offers the best predictability guarantees: no late streams, and a higher
admission rate than the deterministic AC test.
Experiment results
Completed job (i.e. GoP) lateness
As the heuristic ratios increase, the spread, mean and maximum of the job
lateness distribution increase.
With no AC test, job lateness is very high.
Experiment results
Percentage system busy time (system utilisation)
As the heuristic ratios increase, average system utilisation increases, because
more video streams are admitted.
Utilisation is very low when the deterministic AC test is used.
Experiment results - summary
Heuristic based AC tests (IBLα, TQLα):
A high value of IBLα combined with a mid/high value of TQLα provides the best
trade-off.
Conclusion and future work
• Application specific task model
• Heuristic based admission control approach.
– Subtask lateness evaluated as a ratio of the end-to-end
deadline of the overall taskset.
• Improved utilisation over the deterministic AC test and
better predictability guarantees than when No-AC is
used.
• The IBLα and TQLα thresholds can be chosen according
to the specific requirements.
• Future work: explore better dynamic/semi-dynamic
mapping approaches and compare with other AC tests.
Thank you!
Questions?
EXTRA SLIDES
Evaluation method
Simulation experiment parameters
• Number of workflows: High (16), Low (8)
• Heuristic ratios (combinations): 0.1 ≤ IBLα, TQLα ≤ 1.0
• Simulation runs: 35
• Number of PEs: 9 (3x3)
• PE task queue size: 10
• Dependency buffer size: 10
• Global input buffer size: 12
• NoC frequency: 1000 MHz
• Videos per workflow (min/max): (6, 7)
• GoPs per video stream (min/max): (7, 8)
Admission control tests - Deterministic
Example task and flow timeline
Admission control tests – Heuristic based
• Instantaneous task lateness
• Previous work on determining the subtask deadline ($d_i$) by Kao and
Garcia-Molina [4], termed the Equal Flexibility scheme (EQF)
Eq 5: $d_i = a_i + c_i + \left( D_{e2e} - a_i - \sum_{j=1}^{m} c_j \right) \times \frac{c_i}{\sum_{j=1}^{m} c_j}$
$m$: number of tasks in the taskset
$d_i$: absolute deadline of subtask $\tau_i$
• Total remaining slack is divided among the subtasks in proportion to
their estimated execution times (WCET).
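Eq 5 translates directly into code. The sketch follows Eq 5 as written, summing over all $m$ tasks of the taskset, treats $D_{e2e}$ as an absolute deadline on the same time axis as $a_i$, and uses a hypothetical function name.

```python
def eqf_deadline(a_i, c_i, d_e2e, wcets):
    """Eq 5: d_i = a_i + c_i + (D_e2e - a_i - sum_j c_j) * c_i / sum_j c_j.

    wcets: estimated execution times c_1..c_m of all m tasks in the taskset.
    The remaining slack is shared among the tasks in proportion to c_i.
    """
    total = sum(wcets)
    slack = d_e2e - a_i - total
    return a_i + c_i + slack * c_i / total
```

With $c = (2, 3, 5)$, $a_i = 0$ and $D_{e2e} = 40$, the slack of 30 is split 6 : 9 : 15, so the task with $c_i = 2$ gets $d_i = 8$ and the task with $c_i = 5$ gets $d_i = 20$.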