Predictability and Utilisation Trade-off in the Dynamic Management of Multiple Video Stream Decoding on Network-on-Chip based Homogeneous Embedded Multi-cores

Hashan Roshantha Mendis (Rosh), Leandro Soares Indrusiak, Neil Audsley
(Real-time Systems Group, University of York)
RheonMedia
09/10/2014 RTNS 2014 - Versailles, France

Outline
• Motivation
• System overview
  – Application model
  – Platform model
  – Task mapping
• Admission control tests
  – Deterministic
  – Heuristic based
• Evaluation method
• Results
• Conclusion and future work

Motivation
Context
• NoC based soft real-time embedded systems
• Heuristic based admission control of multiple video stream decoding tasks
Objectives
• Make predictable admission control decisions on video stream decoding requests, while maintaining high system utilisation.

Motivation
Metrics definition
• Improving predictability:
  – Minimising the maximum and mean lateness of the admitted video stream tasks
  – Reducing the number of dropped tasks (due to overflow of the platform's global input buffer)
• Improving utilisation:
  – Reducing the idle time of processing elements

System overview

System overview - Application model
Assumptions
• Frame-rate = 25 fps
• Fixed GoP structure: each job has the same predefined dependency pattern
• End-to-end deadline of a job (De2e) is known
• Aperiodic tasks
• Fixed priority
• All tasks of a job arrive at the same time.
• Video stream start/end times are not known
• Task execution times are unknown; worst-case/average-case values can be estimated

System overview - Platform model
Processing Elements
• Homogeneous
• Separate individual kernel instances.
• Same application code
• Fixed priority-preemptive local task scheduler
Tasks are held until their dependencies are met and the PE task queue is not full.
Network-on-Chip
• 2D Mesh
• XY routing
• Fixed priority-preemptive arbitration

System overview - Task mapping
Semi-dynamic mapping
• The same task mapping pattern is used for every subsequent job in a given video stream.
• This job-level task-to-core mapping is established by the RM upon admission of a new video stream.
• Shortest task queue first (STQF)

Admission control tests
• Deterministic
  – Using classical schedulability tests
• Heuristic based
  – Based on subtask lateness

Admission control tests - Deterministic
Schedulability analysis
Worst-case response time of a task $\tau_i$ [1]:
Eq 1: $r_i^{n+1} = c_i + \sum_{\forall \tau_j \in hp(i)} \left\lceil \frac{r_i^{n}}{t_j} \right\rceil c_j$
Worst-case response time of a flow $Msg_i$ [2][3]:
Eq 2: $R_i^{n+1} = C_i + \sum_{\forall Msg_j \in S_i^D} \left\lceil \frac{R_i^{n} + r_j + J_j^I}{t_j} \right\rceil C_j$
• $r_i$: response time of task $\tau_i$
• $c_i$: worst-case execution time of task $\tau_i$
• $t_i$: period of task $\tau_i$
• $hp(i)$: higher priority tasks blocking $\tau_i$
• $R_i$: response time of flow $Msg_i$
• $J_i^I$: interference jitter of flow $Msg_i$
• $C_i$: basic latency of flow $Msg_i$
• $S_i^D$: direct interferers of flow $Msg_i$
A job is considered schedulable if the end-to-end response time of its critical path is less than or equal to the end-to-end deadline (De2e).
[1] N. Audsley, A. Burns, M. Richardson, K. Tindell, and A. J. Wellings. Applying new scheduling theory to static priority pre-emptive scheduling. Software Eng. Journal, Sept. 1993.
[2] Z. Shi, A. Burns, and L. S. Indrusiak.
Schedulability analysis for real time on-chip communication with wormhole switching. Int. Journal of Embedded and Real-Time Comms. Systems, 2010.
[3] L. S. Indrusiak. End-to-end schedulability tests for multiprocessor embedded systems based on networks-on-chip with priority-preemptive arbitration. Journal of Sys. Arch., May 2014.

Admission control tests - Deterministic
Schedulability analysis - Non-interferers
• We assume there is no overlap between executions and invocations of different jobs within the same video stream.
• Precedence constraints are taken into account when determining interfering tasks.
• Excluding tasks that cannot interfere makes the analysis tighter, while keeping it safe for deterministic AC decisions.

Admission control tests - Heuristic based
Instantaneous task lateness
• We assume the individual task deadline is a ratio of the overall end-to-end deadline of the job.
• We make admission decisions based on the instantaneous lateness ($l_i$) of the tasks in the global input buffers and the PE task queues.
Eq 3: $l_i^{InputBuffer} = t_c - a_i - (D_{e2e} \times IBL_\alpha)$
Eq 4: $l_i^{TaskQueue} = t_c - a_i - (D_{e2e} \times TQL_\alpha)$
• $t_c$: current time
• $a_i$: task dispatch time
• $IBL_\alpha, TQL_\alpha \in \{x \in \mathbb{R} \mid 0 \le x \le 1\}$
• If any of the tasks in the global input buffers or the PE task queues are late, the new video stream admission request is rejected.

Admission control tests - Heuristic based
• Instantaneous task lateness
• Previous work on determining the subtask deadline by Kao and Garcia-Molina [4], termed the Deadline Equal Flexibility scheme (D-EQF)
• Total remaining slack is divided among the subtasks in proportion to their estimated execution times (WCET).
[4] B. Kao and H. Garcia-Molina. Deadline assignment in a distributed soft real-time system. IEEE Trans. on Parallel and Distributed Systems, Dec. 1997.

Admission control tests - buffer overflow
• Free space in buffers and task queues
  – If the global input buffers or task queues do not have sufficient space, the taskset is dropped without being received.
  – Hence buffer overflow is also a cause for rejecting new video streams.
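
The heuristic admission test described above (the lateness checks of Eq 3 and Eq 4, plus the free-space check) could be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the names `QueuedTask`, `lateness` and `admit_stream`, the single `free_input_slots` count, and the rule that a task is "late" when its lateness is strictly positive are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueuedTask:
    """Hypothetical record for a task waiting in an input buffer or PE task queue."""
    dispatch_time: float  # time at which the task was dispatched
    e2e_deadline: float   # end-to-end deadline (De2e) of the job the task belongs to


def lateness(task: QueuedTask, now: float, alpha: float) -> float:
    """Eq 3 / Eq 4: time elapsed since dispatch minus the allowed share of De2e."""
    return (now - task.dispatch_time) - task.e2e_deadline * alpha


def admit_stream(input_buffer: List[QueuedTask],
                 pe_queues: List[List[QueuedTask]],
                 now: float,
                 ibl_alpha: float,
                 tql_alpha: float,
                 free_input_slots: int,
                 tasks_per_job: int) -> bool:
    """Return True if a new video stream request would be admitted."""
    # Buffer overflow check (simplified to one free-slot count, an assumption).
    if free_input_slots < tasks_per_job:
        return False
    # Eq 3: any late task in the global input buffer rejects the request.
    if any(lateness(t, now, ibl_alpha) > 0 for t in input_buffer):
        return False
    # Eq 4: any late task in any PE task queue rejects the request.
    for queue in pe_queues:
        if any(lateness(t, now, tql_alpha) > 0 for t in queue):
            return False
    return True


# Illustrative values only; they are not taken from the experiments.
waiting = [QueuedTask(dispatch_time=0.0, e2e_deadline=0.5)]
running = [[QueuedTask(dispatch_time=0.1, e2e_deadline=0.5)], []]
strict = admit_stream(waiting, running, now=0.3, ibl_alpha=0.5, tql_alpha=0.5,
                      free_input_slots=12, tasks_per_job=8)
relaxed = admit_stream(waiting, running, now=0.3, ibl_alpha=1.0, tql_alpha=1.0,
                       free_input_slots=12, tasks_per_job=8)
print(strict, relaxed)
```

With the low ratios the task in the input buffer already counts as late, so the request is rejected; with both ratios at 1.0 the same state admits the stream, which mirrors the trade-off between predictability and admission rate.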

Evaluation method
• Admission control (AC) tests compared:
  – Baselines: No AC test, Deterministic AC
  – Heuristic based (0.1 ≤ (IBLα, TQLα) ≤ 1.0)
• Measurements:
  – Number of video streams admitted, rejected, and admitted but with missed deadlines
  – Job lateness
  – Overall average PE busy time
• High and low workloads
• Abstract system-level simulation with a lightweight NoC simulation component [5]
• 35 simulation runs
[5] L. S. Indrusiak and O. M. dos Santos. Fast and accurate transaction-level model of a wormhole network-on-chip with priority preemptive virtual channel arbitration. In DATE 2011, IEEE, 2011.

Experiment results
Video stream admission summary
As the heuristic ratios increase, the number of admitted video streams increases, but at the cost of more late streams and many dropped jobs.
Heu (D-EQF) offers the best predictability guarantees: no late streams, and a higher admission rate than the Deterministic AC test.

Experiment results
Completed job (i.e. GoP) lateness
As the heuristic ratios increase, the distribution, mean and maximum of the job lateness all increase.
The No-AC test shows very high job lateness levels.

Experiment results
Percentage system busy time (system utilisation)
As the heuristic ratios increase, the average system utilisation increases, because more video streams are admitted.
Utilisation is very low when the deterministic AC test is used.

Experiment results - summary
Heuristic based AC tests (IBLα, TQLα)
A higher value of IBLα combined with a mid/high value of TQLα provides the best trade-off.

Conclusion and future work
• Application specific task model
• Heuristic based admission control approach.
  – Subtask lateness evaluated as a ratio of the end-to-end deadline of the overall taskset.
• Improved utilisation over the deterministic AC test, and better predictability guarantees than when no AC test is used.
• The IBLα and TQLα thresholds can be chosen depending on the specific requirements.
• Future work: explore better dynamic/semi-dynamic mapping approaches, compare against other AC tests

Thank You! Questions?

EXTRA SLIDES

Evaluation method
Simulation experiment parameters
Parameter                          Value
Number of workflows                High (=16), Low (=8)
Heuristic ratios (combinations)    0.1 ≤ (IBLα, TQLα) ≤ 1.0
Simulation runs                    35
Number of PEs                      9 (3x3)
PE task queue size                 10
Dependency buffer size             10
Global input buffer size           12
NoC frequency                      1000 MHz
Videos per workflow (min/max)      (6,7)
GoPs per video stream (min/max)    (7,8)

Admission control tests - Deterministic
Example task and flow timeline

Admission control tests - Heuristic based
• Instantaneous task lateness
• Previous work on determining the subtask deadline ($d_i$) by Kao and Garcia-Molina [4], termed the Equal Flexibility scheme (EQF)
Eq 5: $d_i = a_i + c_i + \left( D_{e2e} - a_i - \sum_{j=1}^{n} c_j \right) \times \frac{c_i}{\sum_{j=1}^{n} c_j}$
• $n$: number of tasks in the taskset
• $d_i$: absolute deadline of subtask
• Total remaining slack is divided among the subtasks in proportion to their estimated execution times (WCET).
[4] B. Kao and H. Garcia-Molina. Deadline assignment in a distributed soft real-time system. IEEE Trans. on Parallel and Distributed Systems, Dec. 1997.
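
As a rough illustration of the EQF idea in Eq 5, the sketch below gives each subtask its own execution time plus a share of the remaining slack proportional to that execution time. It is an assumption-laden example rather than the scheme as implemented in the paper: the function name `eqf_deadlines` is hypothetical, both sums are taken over all n subtasks of the taskset, and all times are measured from the job's dispatch so that De2e can be used directly as the job's deadline.

```python
from typing import List


def eqf_deadlines(dispatch_times: List[float],
                  wcets: List[float],
                  e2e_deadline: float) -> List[float]:
    """Assign subtask deadlines by sharing slack in proportion to estimated WCET.

    dispatch_times[i] and wcets[i] describe subtask i; e2e_deadline is De2e.
    All names are illustrative assumptions, not taken from the source.
    """
    if len(dispatch_times) != len(wcets):
        raise ValueError("need one dispatch time per subtask")
    total_wcet = sum(wcets)
    if total_wcet <= 0:
        raise ValueError("total estimated execution time must be positive")

    deadlines = []
    for dispatch, wcet in zip(dispatch_times, wcets):
        # Slack left once all estimated execution time is accounted for.
        slack = e2e_deadline - dispatch - total_wcet
        # Subtask deadline: dispatch + own WCET + proportional share of slack.
        deadlines.append(dispatch + wcet + slack * (wcet / total_wcet))
    return deadlines


# Three subtasks dispatched together at time 0 (illustrative numbers only).
example = eqf_deadlines([0.0, 0.0, 0.0], [1.0, 2.0, 1.0], e2e_deadline=8.0)
print(example)
```

In this example the subtask with twice the estimated execution time also receives twice the slack, so its deadline is twice as far from the dispatch time as those of the other two.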