A Fully Preemptive Multiprocessor Semaphore Protocol for Latency-Sensitive Real-Time Applications ECRTS’13 July 12, 2013 Björn B. Brandenburg bbb@mpi-sws.org A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications A Rhetorical Question On uniprocessors, why do we use the priority inheritance protocol (PIP) or the priority ceiling protocol (PCP) instead of simple non-preemptive sections? AUTOSAR Non-Preemptive Critical Section: SuspendAllInterrupts(…); // critical section ResumeAllInterrupts(…); MPI-SWS Brandenburg 2 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications RT 101: Preemptive Synchronization Matters uniprocessor, non-preemptive critical sections release deadline unrelated, latency-sensitive high-priority task less time-critical lower-priority tasks long lower-priority critical section (CS) time MPI-SWS Brandenburg 3 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications RT 101: Preemptive Synchronization Matters Deadline miss due to latency increase! uniprocessor, non-preemptive critical sections release deadline unrelated, latency-sensitive high-priority task less time-critical lower-priority tasks long lower-priority critical section (CS) time Long non-preemptive critical section. MPI-SWS Brandenburg 4 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications RT 101: Preemptive Synchronization Matters uniprocessor, with PIP release deadline unrelated, latency-sensitive high-priority task long lowerpriority CS less time-critical lower-priority tasks time MPI-SWS Brandenburg 5 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications RT 101: Preemptive Synchronization Matters Latency-sensitive task isolated from unrelated critical section! uniprocessor, with PIP release deadline unrelated, latency-sensitive high-priority task long lowerpriority CS less time-critical lower-priority tasks time Lower-priority critical section: fully preemptive execution. MPI-SWS Brandenburg 6 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications The Multiprocessor Case What if we host the same workload on a multiprocessor? partitioned multiprocessor scheduling unrelated, latency-sensitive high-priority task less time-critical lower-priority tasks (on same core) time MPI-SWS Brandenburg 7 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications No existing real-time semaphore protocol The Multiprocessor Case for partitioned or clustered scheduling isolates high-priority tasks from unrelated CSs. partitioned multiprocessor scheduling release deadline unrelated, latency-sensitive high-priority task less time-critical lower-priority tasks (on same core) long lower-priority critical section (CS) time MPCP, FMLP, MPI-SWS + FMLP , Brandenburg OMLP, … 8 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications This Paper Independence preservation formalizes the idea that “tasks should never be delayed by unrelated critical sections.” MPI-SWS Brandenburg 9 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications This Paper Independence preservation formalizes the idea that “tasks should never be delayed by unrelated critical sections.” Independence preservation is impossible without (limited) job migrations. MPI-SWS Brandenburg 10 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications This Paper Independence preservation formalizes the idea that “tasks should never be delayed by unrelated critical sections.” Independence preservation is impossible without (limited) job migrations. First independence-preserving semaphore protocol for clustered/partitioned scheduling; the protocol also has asymptotically optimal blocking bounds. MPI-SWS Brandenburg 11 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Clustered JLFP Scheduling Job-Level Fixed-Priority Scheduling (JLFP) c … number of processors per cluster m … number of processors (total) J4 J3 J2 J1 J4 J3 Q4 Core 4 L2 Cache L2 Cache J1 J3 J1 Core 1 Core 2 Core 3 L2 Cache Q1 Q3 Core 3 J2 Q2 Q2 Core 2 J4 Q1 Q1 Core 1 J2 Core 4 L2 Cache Core 1 Core 2 Core 3 L2 Cache Core 4 L2 Cache Main Memory Main Memory partitioned scheduling clustered scheduling global scheduling c=1 1≤c≤m c=m MPI-SWS Brandenburg Main Memory 12 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications This talk: Partitioned Fixed-Priority (P-FP) Scheduling Clustered JLFP Scheduling Job-Level Fixed-Priority Scheduling (JLFP) c … number of processors per cluster m … number of processors (total) J4 J3 J2 J1 J4 J3 Q4 Core 4 L2 Cache L2 Cache J1 J3 J1 Core 1 Core 2 Core 3 L2 Cache Q1 Q3 Core 3 J2 Q2 Q2 Core 2 J4 Q1 Q1 Core 1 J2 Core 4 L2 Cache Core 1 Core 2 Core 3 L2 Cache Core 4 L2 Cache Main Memory Main Memory partitioned scheduling clustered scheduling global scheduling c=1 1≤c≤m c=m MPI-SWS Brandenburg Main Memory 13 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Clustered JLFP Scheduling Job-Level Fixed-Priority Scheduling (JLFP) c … number of processors per cluster m … number of processors (total) J4 J3 J2 J1 J4 J3 Q4 Core 4 L2 Cache L2 Cache J1 J3 J1 Core 1 Core 2 Core 3 L2 Cache Q1 Q3 Core 3 J2 Q2 Q2 Core 2 J4 Q1 Q1 Core 1 J2 Core 4 L2 Cache Main Memory Main Memory partitioned scheduling clustered scheduling Core 1 Core 2 Core 3 L2 Cache Core 4 L2 Cache Main Memory global scheduling c=1 1≤c≤m c=m Task model: implicit-deadline sporadic tasks (choice of deadline constraint irrelevant to results) MPI-SWS Brandenburg 14 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Real-Time Semaphore Protocols Binary Semaphores in POSIX pthread_mutex_lock(…) // critical section pthread_mutex_unlock(…) A blocked task suspends & yields the processor. MPI-SWS Brandenburg 15 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Real-Time Semaphore Protocols Binary Semaphores in POSIX Priority Inversion A job should be scheduled, but is not. pthread_mutex_lock(…) // critical section pthread_mutex_unlock(…) A blocked task suspends & yields the processor. MPI-SWS PI-Blocking: increase in worst-case response time due to priority inversions. Brandenburg 16 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Real-Time Semaphore Protocols Binary Semaphores in POSIX Priority Inversion A job should be scheduled, but is not. pthread_mutex_lock(…) // critical section pthread_mutex_unlock(…) A blocked task suspends & yields the processor. PI-Blocking: increase in worst-case response time due to priority inversions. Goal: bounded pi-blocking. Bounded in terms of critical section lengths only! MPI-SWS Brandenburg 17 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Real-Time Semaphore Protocols Binary Semaphores in POSIX Priority Inversion A job should be scheduled, but is not. pthread_mutex_lock(…) // critical section pthread_mutex_unlock(…) A blocked task suspends & yields the processor. PI-Blocking: increase in worst-case response time due to priority inversions. Assumptions ➡ Unnested critical sections. ➡ Suspension-oblivious schedulability analysis. MPI-SWS Brandenburg 18 Part 1 Avoiding Delays due to Unrelated Critical Sections A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Independence Preservation (specific to s-oblivious analysis) “Tasks should never be delayed by unrelated critical sections.” MPI-SWS Brandenburg 20 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Independence Preservation (specific to s-oblivious analysis) Let bi,q denote the maximum pi-blocking incurred by task Ti due to requests for resource q. Let Ni,q denote the maximum number of times that any job of Ti accesses resource q. Under an independence-preserving locking protocol, if Ni,q = 0, then bi,q = 0. “You only pay for what you use.” MPI-SWS Brandenburg 21 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Independence Preservation (specific to s-oblivious analysis) Let bi,q denote the maximum pi-blocking incurred by task Ti due to requests for resource q. Let Ni,q denote the maximum number of times that any job of Ti accesses resource q. Under an independence-preserving locking protocol, if Ni,q = 0, then bi,q = 0. Isolation useful for: latency-sensitive (if no delay be tolerated) or “You workloads only pay for what youcan use.” if low-priority tasks contain unknown or untrusted critical sections. MPI-SWS Brandenburg 22 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Real-Time Semaphore Protocols real-time locking protocol MPI-SWS = progress mechanism Brandenburg + queue structure 23 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Real-Time Semaphore Protocols real-time locking protocol = progress mechanism + queue structure Ensure that a lock holder is scheduled (while waiting tasks incur pi-blocking). How to order conflicting critical sections (e.g., priority queue, FIFO queues). MPI-SWS Brandenburg 24 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Real-Time Semaphore Protocols real-time locking protocol = partitioned scheduling priority boosting MPI-SWS progress mechanism + clustered scheduling priority donation Brandenburg queue structure global scheduling priority inheritance 25 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Priority boosting and Priority Donation: lock-holding jobs have higher priority than non-lock-holding jobs Real-Time Semaphore Protocols ➔ effectively non-preemptive ➔ not independence preserving real-time locking protocol = partitioned scheduling priority boosting MPI-SWS progress mechanism + clustered scheduling priority donation Brandenburg queue structure global scheduling priority inheritance 26 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Real-Time Semaphore Protocols real-time locking protocol = partitioned scheduling priority boosting progress mechanism queue structure + clustered scheduling priority donation global scheduling priority inheritance Existing independence-preserving locking protocols: Global PIP, Global FMLP, Global OMLP, … MPI-SWS Brandenburg 27 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Observation Independence preservation + bounded priority inversion requires intra-cluster job migrations. J2 J1 Q3 Q4 Core 3 Core 4 J1 J3 Core 1 Q2 Q2 Core 2 J4 Q1 Q1 Core 1 L2 Cache MPI-SWS J4 J3 J2 Core 2 Core 3 L2 Cache L2 Cache Core 4 L2 Cache Main Memory Main Memory partitioned scheduling clustered scheduling Brandenburg 28 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Intra-cluster: (temporarily) execute jobs on processors/clusters they have not been assigned to. Observation Independence preservation + bounded priority inversion requires intra-cluster job migrations. J2 J1 Q3 Q4 Core 3 Core 4 J1 J3 Core 1 Q2 Q2 Core 2 J4 Q1 Q1 Core 1 L2 Cache MPI-SWS J4 J3 J2 Core 2 Core 3 L2 Cache L2 Cache Core 4 L2 Cache Main Memory Main Memory partitioned scheduling clustered scheduling Brandenburg 29 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Example: Job Migration is Necessary Core 1 Core 2 three tasks, two cores, one resource, P-FP scheduling T3 T2 T1 time MPI-SWS Brandenburg 30 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Example: Job Migration is Necessary Core 1 Core 2 three tasks, two cores, one resource, P-FP scheduling T3 T2 T1 time T2 starts executing critical section… MPI-SWS Brandenburg 31 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Example: Job Migration is Necessary Core 1 Core 2 three tasks, two cores, one resource, P-FP scheduling Job of T1 is released. T3 What to do with in-progress critical section? T2 T1 time MPI-SWS Brandenburg 32 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Example: Job Migration is Necessary Case 1: priority boosting (=let T2 continue). Core 1 Core 2 three tasks, two cores, one resource, P-FP scheduling T3 T2 critical section T1 time MPI-SWS Brandenburg 33 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Example: Job Migration is Necessary Case 1: priority boosting (=let T2 continue). Core 1 Core 2 three tasks, two cores, one resource, P-FP scheduling T3 T2 CS pi-blocked critical section T1 time MPI-SWS Brandenburg 34 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Example: Job Migration is Necessary Case 1: priority boosting (=let T2 continue). Core 2 three tasks, two cores, one resource, P-FP scheduling T3 CS pi-blocked Core 1 Benefit: T3 incurs only bounded pi-blocking, meets deadline. T2 critical section T1 time MPI-SWS Brandenburg 35 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Example: Job Migration is Necessary Case 1: priority boosting (=let T2 continue). Core 1 Core 2 three tasks, two cores, one resource, P-FP scheduling T3 T2 CS pi-blocked critical section T1 time Problem: T1 misses its deadline. MPI-SWS Brandenburg 36 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Example: Job Migration is Necessary Core 1 Core 2 three tasks, two cores, one resource, P-FP scheduling Job of T1 is released. T3 What to do with in-progress critical section? T2 T1 time MPI-SWS Brandenburg 37 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Example: Job Migration is Necessary Case 2: independence preservation (= preempt T2). Core 1 Core 2 three tasks, two cores, one resource, P-FP scheduling T3 T2 T1 time MPI-SWS Brandenburg 38 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Example: Job Migration is Necessary Case 2: independence preservation (= preempt T2). Core 1 Core 2 three tasks, two cores, one resource, P-FP scheduling T3 T2 T1 time Independence preservation: T1 meets its deadline. MPI-SWS Brandenburg 39 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Example: Job Migration is Necessary Case 2: independence preservation (= preempt T2). Core 1 Core 2 three tasks, two cores, one resource, P-FP scheduling T3 pi-blocked CS T2 T1 time MPI-SWS Brandenburg 40 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Example: Job Migration is Necessary Case 2: independence preservation (= preempt T2). Core 2 three tasks, two cores, one resource, P-FP scheduling T3 pi-blocked CS Core 1 Problem: T3 incurs “unbounded” pi-blocking, misses deadline! T2 T1 time MPI-SWS Brandenburg 41 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Partitioned Scheduling with Migrations? MPI-SWS Brandenburg 42 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Partitioned Scheduling with Migrations? Partitioned By Necessity J1 J2 J3 J4 Q1 Q2 Q3 Q4 Core 1 Local Memory Core 2 Local Memory Core 3 Local Memory Core 4 E.g., SoC with heterogeneous cores (ARM, PowerPC, x86, MIPS). Local Memory Shared Memory migrations infeasible for lack of technical capability MPI-SWS Brandenburg 43 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Partitioned Scheduling with Migrations? Partitioned By Necessity J1 J2 J3 J4 Q1 Q2 Q3 Q4 Core 1 Local Memory Core 2 Local Memory Core 3 Local Memory Core 4 E.g., SoC with heterogeneous cores (ARM, PowerPC, x86, MIPS). Local Memory Shared Memory migrations infeasible for lack of technical capability ➔ independence preservation and bounded priority inversion impossible to achieve! MPI-SWS Brandenburg 44 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Partitioned Scheduling with Migrations? Partitioned By Choice Partitioned By Necessity J1 J4 J3 Q4 Q1 Q2 Q3 Q4 Core 2 Core 3 Core 4 Core 1 Core 2 Core 3 Core 4 Local Memory Local Memory Local Memory L2 Cache L2 Cache Main Memory Shared Memory migrations disallowed but technically feasible migrations infeasible for lack of technical capability MPI-SWS J2 J1 J4 Q3 Local Memory J3 Q2 Q1 Core 1 J2 Brandenburg 45 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Occasional migrations not desirable, but possible! Partitioned Scheduling with Migrations? (Focus of this work.) Partitioned By Choice Partitioned By Necessity J1 J4 J3 Q4 Q1 Q2 Q3 Q4 Core 2 Core 3 Core 4 Core 1 Core 2 Core 3 Core 4 Local Memory Local Memory Local Memory L2 Cache L2 Cache Main Memory Shared Memory migrations disallowed but technically feasible migrations infeasible for lack of technical capability MPI-SWS J2 J1 J4 Q3 Local Memory J3 Q2 Q1 Core 1 J2 Brandenburg 46 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Example: Job Migration is Necessary Core 1 Core 2 three tasks, two cores, one resource, P-FP scheduling Job of T1 is released. T3 What to do with in-progress critical section? T2 T1 time MPI-SWS Brandenburg 47 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Example: Job Migration is Necessary 1) Ensure independence preservation (= preempt T2). Core 1 Core 2 three tasks, two cores, one resource, P-FP scheduling T3 T2 T1 time Independence preservation: T1 meets its deadline. MPI-SWS Brandenburg 48 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Example: Job Migration is Necessary 2) Ensure bounded pi-blocking (= schedule T2). Core 1 Core 2 three tasks, two cores, one resource, P-FP scheduling T3 pi-blocked CS T2 T1 time Easy fix: migrate T2 when T3 suspends. MPI-SWS Brandenburg 49 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Example: Job Migration is Necessary Temporarily move T3 to Core 2… Core 1 Core 2 three tasks, two cores, one resource, P-FP scheduling T3 pi-blocked CS T2 T1 time Easy fix: migrate T2 when T3 suspends. MPI-SWS Brandenburg 50 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Example: Job Migration is Necessary Core 1 Core 2 three tasks, one resource, Benefit: T3 two incurscores, only bounded pi-blocking,P-FP meetsscheduling deadline. T3 pi-blocked CS T2 T1 time Easy fix: migrate T2 when T3 suspends. MPI-SWS Brandenburg 51 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Theorem Under non-global scheduling (c ≠ m), it is impossible for a semaphore protocol to simultaneously (i) prevent unbounded pi-blocking, (ii) be independence-preserving, and (iii) avoid inter-cluster job migrations. Pick any two… MPI-SWS Brandenburg 52 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Combinations of Properties Under non-global scheduling (c ≠ m), it is impossible for a semaphore protocol to simultaneously (i) prevent unbounded pi-blocking, (ii) be independence-preserving, and (iii) avoid inter-cluster job migrations. (i) & (iii) ➡ MPCP, Part. FMLP, FMLP+, OMLP, … (ii) & (iii) ➡ Applying PIP to partitioned scheduling (not sound!) (i) & (ii) ➡ no such protocol known! MPI-SWS Brandenburg 53 Part 2 Independence Preservation + Asymptotically Optimal PI-Blocking A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications High-Level Overview real-time locking protocol MPI-SWS = progress mechanism Brandenburg + queue structure 55 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications High-Level Overview real-time locking protocol = progress mechanism Must be independence-preserving. MPI-SWS Brandenburg + queue structure Must ensure asymptotic optimality. 56 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications High-Level Overview real-time locking protocol = progress mechanism + Must be independence-preserving. queue structure Must ensure asymptotic optimality. Adopt intuition from example: when lock holder is preempted, migrate to blocked task’s processor. MPI-SWS Brandenburg 57 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Migratory Priority Inheritance classic priority inheritance inherit priority of blocked jobs MPI-SWS Brandenburg 58 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Migratory Priority Inheritance classic priority inheritance inherit priority of blocked jobs + “cluster inheritance” inherit eligibility to execute on assigned clusters from blocked jobs MPI-SWS Brandenburg 59 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Jobs remain fully preemptive even in critical sections. Migratory Priority Inheritance ➔ enables independence preservation classic priority inheritance inherit priority of blocked jobs + “cluster inheritance” inherit eligibility to execute on assigned clusters from blocked jobs MPI-SWS Brandenburg 60 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications High-Level Overview real-time locking protocol = progress mechanism Must be independence-preserving. MPI-SWS Brandenburg + queue structure Must ensure asymptotic optimality. 61 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications High-Level Overview real-time locking protocol = progress mechanism Must be independence-preserving. + queue structure Must ensure asymptotic optimality. Resolve (most) contention within clusters: use a multi-level queue. MPI-SWS Brandenburg 62 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications A 3-Level FIFO/FIFO/PRIO Queue one 3-level queue for each resource Cluster 1 … shared resource Cluster 2 Cluster K MPI-SWS Brandenburg 63 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications A 3-Level FIFO/FIFO/PRIO Queue one 3-level queue for each resource Cluster 1 Cluster 2 priority PQ2 queue … shared resource priority PQ1 queue Cluster K MPI-SWS Brandenburg priority PQK queue 64 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications A 3-Level FIFO/FIFO/PRIO Queue one 3-level queue for each resource Cluster 1 priority PQ1 queue FIFO Queue FQ1 priority PQ2 queue FIFO Queue FQ2 … shared resource Cluster 2 Cluster K FIFO Queue FQK MPI-SWS Brandenburg priority PQK queue 65 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications A 3-Level FIFO/FIFO/PRIO Queue Bounded length: at most c jobs (in each cluster). one 3-level queue for each resource (c = number of cores in cluster) Cluster 1 priority PQ1 queue FIFO Queue FQ1 priority PQ2 queue FIFO Queue FQ2 … shared resource Cluster 2 Cluster K FIFO Queue FQK MPI-SWS Brandenburg priority PQK queue 66 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Priority queue used only if more than c jobs contend. A 3-Level FIFO/FIFO/PRIO Queue (c = number of cores in cluster) one 3-level queue for each resource Cluster 1 priority PQ1 queue FIFO Queue FQ1 priority PQ2 queue FIFO Queue FQ2 … shared resource Cluster 2 Cluster K FIFO Queue FQK MPI-SWS Brandenburg priority PQK queue 67 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications A 3-Level FIFO/FIFO/PRIO Queue one 3-level queue for each resource Cluster 1 priority PQ1 queue FIFO Queue FQ1 FIFO Queue GQ Cluster 2 priority PQ2 queue … FIFO Queue FQ2 Cluster K FIFO Queue FQK MPI-SWS Brandenburg priority PQK queue 68 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Global FIFO Queue resolves inter-cluster contention. A 3-Level FIFO/FIFO/PRIO Queue Bounded length: at most K = m / c jobs (one per cluster). one 3-level queue for each resource (m = total number of cores, c = number of cores per cluster) Cluster 1 priority PQ1 queue FIFO Queue FQ1 FIFO Queue GQ Cluster 2 priority PQ2 queue … FIFO Queue FQ2 Cluster K FIFO Queue FQK MPI-SWS Brandenburg priority PQK queue 69 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications The O(m) Independence-Preserving Locking Protocol (OMIP) The OMIP = migratory priority inheritance independence-preserving MPI-SWS Brandenburg + 3-level F/F/P queue O(m) s-oblivious pi-blocking 70 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications The O(m) Independence-Preserving Locking Protocol (OMIP) The OMIP = migratory priority inheritance independence-preserving + 3-level F/F/P queue O(m) s-oblivious pi-blocking Ω(m) lower bound on s-oblivious pi-blocking (— & Anderson, 2010) ➔ The OMIP ensures asymptotically optimal s-oblivious pi-blocking. MPI-SWS Brandenburg 71 Part 3 Evaluation A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Prototype Implementation Linux Testbed for Multiprocessor Scheduling in Real-Time Systems www.litmus-rt.org 3-level queues ➡ easy (reuse Linux wait queues) ➡ cheap compared to syscall Migratory priority inheritance ➡ more tricky (need to avoid global locks) ➡ store bitmap of cores “offering” to schedule lock holder in each lock MPI-SWS Brandenburg 73 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Response Times Experiment on an 8-core, 2-Ghz Xeon X7550 System Core 1 Core 8 T1 T8 T9 … T16 T17 T24 T25 T32 Setup ➡ 4 tasks on each core (one independent & latency-sensitive) ➡ one shared resource ➡ max. critical section length: ~1ms MPI-SWS Brandenburg 74 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications One latency-sensitive, independent task with period = 1ms. on an 8-core, 2-Ghz Xeon X7550 System Response Times Experiment Core 1 Core 8 T1 T8 T9 … T16 T17 T24 T25 T32 Setup ➡ 4 tasks on each core (one independent & latency-sensitive) ➡ one shared resource ➡ max. critical section length: ~1ms MPI-SWS Brandenburg 75 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Response Times Experiment Three tasks (on each core) with periods 25 ms, 100 ms, and 1000 ms. on an 8-core, Xeontasks X7550 System Each 2-Ghz job of these locks the resource once. Core 1 Core 8 T1 T8 T9 … T16 T17 T24 T25 T32 Setup ➡ 4 tasks on each core (one independent & latency-sensitive) ➡ one shared resource ➡ max. critical section length: ~1ms MPI-SWS Brandenburg 76 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Response Times Experiment on an 8-core, 2-Ghz Xeon X7550 System Core 1 Core 8 T1 T8 T9 … T16 T17 T24 T25 T32 MPI-SWS Three Configurations ➡ No locks (unsound!) ‣ no blocking (baseline) ➡ Clustered OMLP ‣ priority donation ➡ OMIP ‣ migratory priority inheritance Experiment ➡ Measured response times with sched_trace ➡ 30-minute traces ➡ more than 45 million jobs Brandenburg 77 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Response Time CDF Fraction of jobs with response time at most X 100.00%$ 90.00%$ P(response)*me))≤)X)) 80.00%$ 70.00%$ 60.00%$ 50.00%$ 40.00%$ 30.00%$ 20.00%$ 10.00%$ NONE$CDF$ higher is better OMIP$CDF$ OMLP$CDF$ increasing response time 0.00%$ 0$ MPI-SWS 0.1$ 0.2$ 0.3$ 0.4$ 0.5$ 0.6$ 0.7$ response)*me)(in)ms))) Brandenburg 0.8$ 0.9$ 1$ 78 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Response Time CDF of 1-ms Tasks 100.00%$ 90.00%$ P(response)*me))≤)X)) 80.00%$ 70.00%$ 60.00%$ OMLP 50.00%$ NONE$CDF$ OMIP & NONE (overlapping) 40.00%$ 30.00%$ OMIP$CDF$ OMLP$CDF$ 20.00%$ 10.00%$ 0.00%$ 0$ MPI-SWS 0.1$ 0.2$ 0.3$ 0.4$ 0.5$ 0.6$ 0.7$ response)*me)(in)ms))) Brandenburg 0.8$ 0.9$ 1$ 79 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Response Time CDF of 1-ms Tasks 100.00%$ 90.00%$ P(response)*me))≤)X)) 80.00%$ 70.00%$ 60.00%$ NONE$CDF$ 50.00%$ With priority donation (or priority boosting), ~20% of the jobs of the 1ms-tasks miss their deadline. 40.00%$ 30.00%$ OMIP$CDF$ OMLP$CDF$ 20.00%$ 10.00%$ 0.00%$ 0$ MPI-SWS 0.1$ 0.2$ 0.3$ 0.4$ 0.5$ 0.6$ 0.7$ response)*me)(in)ms))) Brandenburg 0.8$ 0.9$ 1$ 80 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Response Time CDF of 1-ms Tasks 100.00%$ 90.00%$ P(response)*me))≤)X)) 80.00%$ 70.00%$ 60.00%$ NONE$CDF$ 50.00%$ Response time distribution under OMIP equivalent to case without locks. 40.00%$ 30.00%$ 20.00%$ OMIP$CDF$ OMLP$CDF$ (OMIP & NONE curves overlap) 10.00%$ 0.00%$ 0$ MPI-SWS 0.1$ 0.2$ 0.3$ 0.4$ 0.5$ 0.6$ 0.7$ response)*me)(in)ms))) Brandenburg 0.8$ 0.9$ 1$ 81 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Response Time CDF of 100-ms Tasks 100.00%$ 90.00%$ P(response)*me))≤)X)) 80.00%$ 70.00%$ 60.00%$ NONE$CDF$ 50.00%$ OMIP$CDF$ 40.00%$ OMLP$CDF$ 30.00%$ 20.00%$ 10.00%$ 0.00%$ 0$ MPI-SWS 10$ 20$ 30$ 40$ 50$ 60$ 70$ response)*me)(in)ms)) Brandenburg 80$ 90$ 100$ 82 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications OMLP Response Time CDF of 100-ms Tasks NONE 100.00%$ 90.00%$ P(response)*me))≤)X)) 80.00%$ OMIP 70.00%$ 60.00%$ NONE$CDF$ 50.00%$ OMIP$CDF$ 40.00%$ OMLP$CDF$ 30.00%$ 20.00%$ 10.00%$ 0.00%$ 0$ MPI-SWS 10$ 20$ 30$ 40$ 50$ 60$ 70$ response)*me)(in)ms)) Brandenburg 80$ 90$ 100$ 83 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Response Time CDF of 100-ms Tasks 100.00%$ 90.00%$ P(response)*me))≤)X)) 80.00%$ 70.00%$ 60.00%$ NONE$CDF$ 50.00%$ OMIP$CDF$ 40.00%$ OMIP: blocking shifted to lowerpriority (= later-deadline) jobs. 30.00%$ 20.00%$ OMLP$CDF$ 10.00%$ 0.00%$ 0$ MPI-SWS 10$ 20$ 30$ 40$ 50$ 60$ 70$ response)*me)(in)ms)) Brandenburg 80$ 90$ 100$ 84 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Analytical Blocking/Latency Tradeoff Large-scale schedulability experiments ➡ Varied #tasks, #cores, #resources, max. critical section lengths, etc. ➡ >150,000,000 task sets ➡ 678 schedulability plots, available in online appendix MPI-SWS Brandenburg 85 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Analytical Blocking/Latency Tradeoff Large-scale schedulability experiments ➡ Varied #tasks, #cores, #resources, max. critical section lengths, etc. ➡ >150,000,000 task sets ➡ 678 schedulability plots, available in online appendix In the presence of latency-sensitive tasks, the OMIP is generally the only viable option. MPI-SWS Brandenburg 86 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Analytical Blocking/Latency Tradeoff Large-scale schedulability experiments ➡ Varied #tasks, #cores, #resources, max. critical section lengths, etc. ➡ >150,000,000 task sets ➡ 678 schedulability plots, available in online appendix In the presence of latency-sensitive tasks, the OMIP is generally the only viable option. Without latency-sensitive tasks, the OMIP does not offer substantial improvements. MPI-SWS Brandenburg 87 Conclusion A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Summary Independence preservation formalizes the idea that “tasks should not be delayed by unrelated critical sections.” Independence preservation is impossible without (limited) job migrations. The OMIP is the first independence-preserving semaphore protocol for clustered scheduling. It ensures asymptotically optimal s-oblivious pi-blocking. MPI-SWS Brandenburg 89 Future Work A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Budget Overruns Nesting Suspension-Aware Analysis MPI-SWS Brandenburg 90 Thanks! A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications SchedCAT Schedulability test Collection And Toolkit Linux Testbed for Multiprocessor Scheduling in Real-Time Systems www.mpi-sws.org/~bbb/ projects/schedcat www.litmus-rt.org MPI-SWS Brandenburg Appendix A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Design Inspirations Migrate to Blocked Task’s CPU ➡ “Local helping” in TU Dresden’s Fiasco/L4 ‣ Hohmuth & Peter (2001) ➡ Multiprocessor bandwidth inheritance (MBWI) ‣ Faggioli, Lipari, & Cucinotta (2010) Queue Design ➡ Intra-cluster queues adopted from global OMLP ‣ — & Anderson (2010) ➡ Inter-cluster queues similar to clustered OMLP ‣ — & Anderson (2011) MPI-SWS Brandenburg 93 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications 100.00%$ 100.00%$ 90.00%$ 90.00%$ 80.00%$ 80.00%$ 70.00%$ 60.00%$ NONE$CDF$ 50.00%$ OMIP$CDF$ 40.00%$ OMLP$CDF$ 30.00%$ P(response)*me))≤)X)) P(response)*me))≤)X)) What about overheads? 70.00%$ 60.00%$ 10.00%$ 10.00%$ 0.00%$ 0.00%$ 0.2$ 0.3$ 0.4$ 0.5$ 0.6$ 0.7$ response)*me)(in)ms))) 0.8$ 0.9$ OMLP$CDF$ 30.00%$ 20.00%$ 0.1$ OMIP$CDF$ 40.00%$ 20.00%$ 0$ NONE$CDF$ 50.00%$ 1$ 0$ 10$ 20$ 30$ 40$ 50$ 60$ 70$ response)*me)(in)ms)) 80$ 90$ 100$ Aren’t job migrations expensive? ➡ response time experiments reflect all overheads in real system ➡ latency-sensitive tasks do not migrate, only lower-priority tasks do ➡ only working set of critical section migrates (likely small), not entire task working set (likely much larger) ➡ the critical section would have been preempted anyway MPI-SWS Brandenburg 94 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Blocking Analysis FIFO Queue GQ m … number of processors (total) MPI-SWS FIFO Queue FQ PQ priority queue c … number of processors per cluster Brandenburg 95 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Blocking Analysis FIFO Queue GQ FIFO Queue FQ at most K-1=m/c-1 queued jobs at most c-1 queued jobs m … number of processors (total) MPI-SWS PQ priority queue c … number of processors per cluster Brandenburg 96 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications Blocking Analysis FIFO Queue GQ FIFO Queue FQ at most K-1=m/c-1 queued jobs at most c-1 queued jobs at most m/c-1 blocking CS at most (c - 1) · (m / c) blocking CS m … number of processors (total) MPI-SWS PQ priority queue c … number of processors per cluster Brandenburg 97 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications At most m / c - 1 + (c - 1) · (m / c) = m - 1 = O(m) blocking critical sections. Blocking Analysis FIFO Queue GQ FIFO Queue FQ at most K-1=m/c-1 queued jobs at most c-1 queued jobs at most m/c-1 blocking CS at most (c - 1) · (m / c) blocking CS m … number of processors (total) MPI-SWS PQ priority queue c … number of processors per cluster Brandenburg 98 A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications At most m / c - 1 + (c - 1) · (m / c) = m - 1 = O(m) blocking critical sections. Blocking Analysis FIFO Queue GQ at most K-1=m/c-1 queued jobs FIFO Queue FQ PQ priority queue at most c-1 queued jobs Under s-oblivious analysis: at most O(m) critical sections cause pi-blocking. at most m/c-1 blocking CS m … number of processors (total) MPI-SWS at most (c - 1) · (m / c) blocking CS c … number of processors per cluster Brandenburg 99