Multi-core Real-Time Scheduling for Generalized Parallel Task Models

Abusayeed Saifullah, Kunal Agrawal, Chenyang Lu, Christopher Gill
Real-Time Systems on Multi-core
• Traditional multiprocessor scheduling
  - Focuses on inter-task parallelism
  - Mostly restricted to sequential task models
• Computation-intensive complex real-time tasks are growing
  - Video surveillance
  - Radar tracking
  - Hybrid real-time structural testing
• Multi-core processors provide an opportunity to schedule computation-intensive tasks in real time
  - Most of the tasks exhibit intra-task parallelism
  - Real-time systems need to be developed to exploit intra-task parallelism
Parallel Task Model
• Synchronous task model
  [Figure: a synchronous parallel task with five segments (Segment 1 to Segment 5). Each horizontal bar indicates a thread of execution (a sequence of instructions). Parallel threads form a segment, and the threads of each segment synchronize at the end of the segment.]
• Lakshmanan et al. (RTSS ’10) have addressed a restricted synchronous model where
  - A task is an alternating sequence of parallel and sequential segments
  - All parallel segments have an equal number of threads
  - The total number of threads in each segment ≤ number of cores
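The three restrictions listed above can be made concrete with a small check. This is only an illustrative sketch, not code from the authors: the `struct segment` layout and the function name `is_restricted_model` are hypothetical, and it assumes that a "sequential" segment is one with exactly one thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* One segment of a synchronous parallel task (hypothetical layout). */
struct segment {
    unsigned threads;   /* number of parallel threads in the segment */
    double   exec_req;  /* execution requirement of each thread */
};

/* Returns true if the task satisfies the three restrictions:
 * segments alternate between sequential (assumed: 1 thread) and parallel,
 * all parallel segments have the same number of threads, and no segment
 * has more threads than there are cores. */
bool is_restricted_model(const struct segment *segs, size_t n, unsigned cores)
{
    unsigned parallel_width = 0;  /* thread count shared by parallel segments */

    for (size_t i = 0; i < n; i++) {
        unsigned t = segs[i].threads;
        if (t == 0 || t > cores)
            return false;
        bool is_parallel = t > 1;
        if (i > 0 && is_parallel == (segs[i - 1].threads > 1))
            return false;         /* two consecutive segments of the same kind */
        if (is_parallel) {
            if (parallel_width == 0)
                parallel_width = t;
            else if (t != parallel_width)
                return false;     /* parallel segments differ in width */
        }
    }
    return true;
}
```

A task whose consecutive segments are all parallel, or whose parallel segments have different widths, fails this check; such tasks are the ones the general model is meant to cover.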
Our Contributions
• We address a general synchronous parallel task model
  - Different segments may have different numbers of threads
  - Each segment can have an arbitrary number of threads
• Example: such tasks are generated by
  - Parallel for loops in OpenMP, CilkPlus
  - Barrier primitives in thread libraries
• This model is more portable
  - The same program can execute on machines with different numbers of cores
A Task Example
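void parallel_task(float *a, float *b, float *c, float *d)
{
    /* Segment 1: 7 independent iterations, modeled as 7 parallel threads. */
    int n = 7;
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
    /* The implicit barrier after the loop is the synchronization point. */

    /* Segment 2: 4 independent iterations, modeled as 4 parallel threads. */
    n = 4;
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        d[i] = a[i] - b[i];
}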
Our Contributions (contd.)
• We propose a task decomposition for the general synchronous parallel task model
  - Decomposes each parallel task into a set of sequential subtasks
  - Subtasks are scheduled like traditional tasks
• Why decomposition?
  - We can exploit the rich literature on multiprocessor scheduling
  - The proposed decomposition ensures that if the decomposed tasks are schedulable, the original task set is also schedulable
Our Contributions (contd.)
• We analyze schedulability in terms of a processor speed augmentation bound
  - Speed augmentation bound ν for an algorithm A: if an optimal algorithm can schedule a synchronous parallel task set on unit-speed processor cores, then A can schedule the decomposed tasks on ν-speed processor cores
• We prove that the proposed decomposition requires a speed augmentation of at most
  - 4 for Global Earliest Deadline First (G-EDF) scheduling
  - 5 for Partitioned Deadline Monotonic (P-DM) scheduling
Overview of a Task Decomposition
• Each thread of the task becomes an individual task with
  - An intermediate subdeadline
  - A release offset to retain precedence relations in the original task
• Deadlines are assigned by distributing slack among segments
• Deadline of a thread = execution requirement + assigned slack
Slack Distribution
• How much slack a segment demands depends on
  - Available slack of the task
  - Execution requirement of the segment
• Execution requirement of a segment is the product of
  - Total number of parallel threads in the segment and
  - Execution requirement of each thread in the segment
• Larger execution requirement implies more demand for slack
  - In the figure, Segment 1 requires more slack than Segment 2
Slack Distribution (contd.)
• We use the following principle to distribute slack
  - All segments that receive slack will achieve an equal density
  - $\text{Density of a task} = \dfrac{\text{execution requirement}}{\text{deadline}}$
  - $\text{Density of a segment } S = \dfrac{(\text{total threads in } S) \times (\text{exec. req. of a thread})}{\text{assigned deadline}}$
• Reasons to equalize the density among segments
  - Fairness: deadline of each segment becomes proportional to its execution requirement
  - We can bound the density of the decomposed tasks
  - We can exploit existing density-based analyses for multiprocessor scheduling
Slack Distribution (contd.)
• Slack of each segment is determined by solving the equalities
  - Sum of subdeadlines = task deadline (total assigned slack = task slack)
  - Density of Segment 1 = density of Segment 2 = … and so on
• All threads in a segment have the same deadline and offset
  - Deadline = execution requirement of the thread + segment slack
  - Release offset = sum of deadlines of the preceding segments
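As a worked form of these equalities, the following sketch solves them in closed form under the simplifying assumption that every segment receives slack (so every segment ends up with the same density). The symbols are introduced here for illustration only: segment $i$ has $m_i$ threads, each with execution requirement $e_i$, and is assigned subdeadline $d_i$; the task has $k$ segments and deadline $D$.

```latex
% Assumed notation: m_i threads in segment i, e_i per-thread execution
% requirement, d_i assigned subdeadline, D task deadline, k segments.
\begin{align*}
\frac{m_1 e_1}{d_1} = \frac{m_2 e_2}{d_2} = \dots = \frac{m_k e_k}{d_k} &= \delta,
  \qquad \sum_{i=1}^{k} d_i = D \\
\Rightarrow\quad \delta = \frac{\sum_{j=1}^{k} m_j e_j}{D}, \qquad
d_i &= \frac{m_i e_i}{\delta} = \frac{m_i e_i}{\sum_{j=1}^{k} m_j e_j}\, D \\
\text{slack}_i = d_i - e_i, \qquad
\text{offset}_i &= \sum_{j < i} d_j
\end{align*}
```

The closed form is only meaningful when $d_i \ge e_i$ for every segment; a segment for which that fails cannot receive slack under this simplification and would need separate handling.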
An Example of Task Decomposition
Segment    Deadline    Density = (threads * exec. req. of a thread) / deadline
1          20          (5*4)/20 = 1
2          4           (2*2)/4 = 1
3          9           (3*3)/9 = 1
4          16          (4*4)/16 = 1
5          3           (1*3)/3 = 1
All segments have an equal density!
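The five-segment example can be reproduced with a short routine. This is a minimal sketch under stated assumptions rather than the authors' algorithm: it assumes every segment receives slack, the names `struct segment`, `struct subtask_params`, and `decompose_equal_density` are hypothetical, and the task deadline of 52 used in the usage note is inferred from the sum of the five subdeadlines.

```c
#include <stddef.h>

/* One segment of a synchronous parallel task (hypothetical layout). */
struct segment {
    unsigned threads;   /* number of parallel threads in the segment */
    double   exec_req;  /* execution requirement of each thread */
};

/* Parameters shared by all threads (subtasks) of one segment. */
struct subtask_params {
    double deadline;    /* subdeadline = exec_req + segment slack */
    double offset;      /* release offset = sum of preceding subdeadlines */
};

/* Assigns equal-density subdeadlines and release offsets.
 * Assumption: every segment receives slack. Returns 0 on success, or -1 if
 * the input is degenerate or some segment would get a subdeadline shorter
 * than its per-thread execution requirement (negative slack). */
int decompose_equal_density(const struct segment *segs, size_t n,
                            double task_deadline, struct subtask_params *out)
{
    double total_work = 0.0;
    for (size_t i = 0; i < n; i++)
        total_work += segs[i].threads * segs[i].exec_req;
    if (n == 0 || total_work <= 0.0 || task_deadline <= 0.0)
        return -1;

    double offset = 0.0;
    for (size_t i = 0; i < n; i++) {
        double work = segs[i].threads * segs[i].exec_req;
        /* Equal density for all segments: d_i = work_i * D / total_work. */
        double deadline = work * task_deadline / total_work;
        if (deadline < segs[i].exec_req)
            return -1;
        out[i].deadline = deadline;
        out[i].offset = offset;
        offset += deadline;
    }
    return 0;
}
```

Called with segments (threads, exec. req.) = (5, 4), (2, 2), (3, 3), (4, 4), (1, 3) and a task deadline of 52, the routine yields subdeadlines 20, 4, 9, 16, 3 and release offsets 0, 20, 24, 33, 49.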
Global EDF (G-EDF) Schedulability
• A sufficient condition for G-EDF scheduling on m unit-speed cores [Baruah RTSS ’07]
  - $\delta_{sum} \le m - (m - 1)\,\delta_{max}$, where $\delta_{sum}$ is the total density and $\delta_{max}$ is the maximum density
• A necessary condition for any task set under any scheduler
  - $u_{sum} \le m$, where $u_{sum}$ is the total utilization
• Using the density bounds for decomposed tasks
  - If the original task set is schedulable by some algorithm on m unit-speed cores, the decomposed tasks are schedulable under G-EDF on 4-speed cores
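The density-based sufficient condition above is straightforward to evaluate once the densities of the (decomposed) tasks are known. The following is a minimal sketch; the function name `gedf_density_test` and its parameters are hypothetical, and the densities are assumed to be already computed for the core speed of interest.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sufficient (not necessary) G-EDF test on m unit-speed cores:
 * total density <= m - (m - 1) * max density.
 * density[i] is the density of task i (hypothetical input layout). */
bool gedf_density_test(const double *density, size_t n, unsigned m)
{
    if (m == 0)
        return false;

    double sum = 0.0;
    double max = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += density[i];
        if (density[i] > max)
            max = density[i];
    }
    return sum <= (double)m - ((double)m - 1.0) * max;
}
```

A `false` result means only that this particular test is inconclusive, not that the task set is unschedulable under G-EDF.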
Partitioned DM (P-DM) Schedulability
• FBB-FFD (Fisher-Baruah-Baker First-Fit Decreasing) is a well-known P-DM scheduler [ECRTS ’06]
• A sufficient condition for FBB-FFD scheduling on m unit-speed cores
  - $\dfrac{\text{load} + u_{sum} - \delta_{max}}{1 - \delta_{max}} \le m$
  - load: maximum cumulative execution requirement of tasks over a time interval, divided by the length of the interval
  - $\delta_{max}$: maximum density
• A necessary condition for any scheduler
  - $u_{sum} \le m$, where $u_{sum}$ is the total utilization
• Using load and density bounds for decomposed tasks
  - If the original task set is schedulable by some algorithm on m unit-speed cores, the decomposed tasks are FBB-FFD schedulable on 5-speed cores
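The FBB-FFD condition above can be checked the same way once load, total utilization, and maximum density are known. This is a minimal sketch with a hypothetical function name and parameters; it multiplies the inequality through by $(1 - \delta_{max})$ to avoid a division, and treats $\delta_{max} \ge 1$ as inconclusive because the bound is undefined there.

```c
#include <stdbool.h>

/* Sufficient (not necessary) FBB-FFD test on m unit-speed cores:
 * (load + u_sum - delta_max) / (1 - delta_max) <= m.
 * All three quantities are assumed to be precomputed by the caller. */
bool fbb_ffd_sufficient(double load, double u_sum, double delta_max, unsigned m)
{
    if (delta_max >= 1.0)
        return false;  /* denominator is zero or negative: test inconclusive */

    /* Equivalent form with the positive denominator multiplied out. */
    return load + u_sum - delta_max <= (double)m * (1.0 - delta_max);
}
```

For example, with load = 1.0, total utilization 1.0, and maximum density 0.5, the left-hand side of the original inequality is 3, so the test passes for m = 3 and fails for m = 2.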
Conclusion
• Multi-core processors provide opportunities to schedule computation-intensive tasks in real time
  - Real-time systems need to exploit intra-task parallelism
• We have addressed real-time scheduling for a generalized synchronous parallel task model
  - Different segments may have different numbers of threads
  - Each segment can have an arbitrary number of threads
• We have proposed a task decomposition that achieves
  - A processor-speed augmentation bound of 4 for Global EDF
  - A processor-speed augmentation bound of 5 for Partitioned DM