EMERALDS
Landon Cox
March 30, 2016
Real-time systems
• Typical hardware constraints
• Slow, low-power processors
• Small memories
• Little to no persistent storage
• Typical workload
• Periodic sensing and actuation
• Task periodicity → deadlines
EMERALDS
• OS for embedded systems
• Due to hardware constraints
• Want to minimize overhead everywhere
• Use resources on real work, not management
• Focus on three system services
• Task scheduling
• Synchronization
• Communication
Rate-monotonic scheduling
• Each task has
• A periodicity (e.g., must run every 10 ms)
• A worst-case execution time (e.g., 5 ms)
• A static priority (i.e., does not change over time)
• Basic idea
• Assign task priorities based on periodicity
• Tasks with smaller period get higher priority
• Can use pre-emption to interrupt a task
Rate-monotonic scheduling
T1 (Period 100ms, execution time 50ms)
T2 (Period 200ms, execution time 80ms)
Pre-empt and schedule every 50ms
[Timeline: T1 runs 0-50, T2 runs 50-100, T1 runs 100-150, T2 runs 150-180]
What to run next at each scheduling point?
Did we meet all of our deadlines?
What was our utilization?
Rate-monotonic scheduling
• Rate-monotonic is optimal for fixed-priority scheduling
• Maximizes task-set “schedulability”
• Ensures that the max number of tasks meet their deadlines
• Schedulability test for RM (Liu and Layland), for m tasks with completion times c_i and periods P_i:

  c_1/P_1 + ... + c_m/P_m ≤ m(2^(1/m) − 1)

• If utilization is below this bound, a feasible schedule exists
• For m = 2, utilization < 0.83
• For m → ∞, utilization < ln(2) ≈ 0.69
• As long as utilization is below 0.69, RM will meet all deadlines
• This leaves roughly 31% of the CPU for the scheduler and other stalls
Rate-monotonic scheduling
T1 (Period 100ms, execution time 50ms)
T2 (Period 200ms, execution time 80ms)
Pre-empt and schedule every 50ms
[Timeline as above: T1 runs 0-50, T2 runs 50-100, T1 runs 100-150, T2 runs 150-180]
How is RM different from earliest-deadline first (EDF)?
RM priorities are static; EDF prioritizes the task closest to its deadline.
Scheduling overheads
• Runtime overhead
• Time to decide what to run next
• Want fast access to TCB queues
• Schedulability overhead
• For a given task set
• Can all tasks in set be processed in time?
Runtime overhead
• EDF
• Tasks are kept in an unsorted list
• Walk the whole list to find the one with the earliest deadline
• O(1) to block/unblock a task
• O(n) to schedule the next task
• RM
• Tasks are kept in a sorted list
• Keep a pointer to the highest-priority ready task
• O(n) to block
• O(1) to unblock
• O(1) to schedule the next task
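As a rough illustration of this trade-off, here is a C sketch (the tcb_t layout is invented for this example) of EDF's O(n) pick-next over an unsorted ready list. RM's pick-next, by contrast, is just a dereference of the pointer to the highest-priority ready task.

  #include <stddef.h>

  /* Illustrative TCB: absolute deadline plus a singly linked ready list. */
  typedef struct tcb {
      unsigned long deadline;
      struct tcb *next;
  } tcb_t;

  /* EDF: blocking/unblocking is O(1) (push/pop at the head), but
   * choosing the next task requires scanning the whole list. */
  tcb_t *edf_pick_next(tcb_t *ready_head) {
      tcb_t *best = ready_head;
      for (tcb_t *t = ready_head; t != NULL; t = t->next)
          if (t->deadline < best->deadline)
              best = t;
      return best;   /* NULL if nothing is ready */
  }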
Schedulability overhead
• EDF
• Can schedule all workloads where c_1/P_1 + ... + c_m/P_m ≤ 1
• Zero schedulability overhead
• RM
• Observed utilizations of 0.88 on average for RM
Rate-monotonic scheduling

  i    1    2    3    4    5    6    7    8    9    10
  Pi   4    5    6    7    8    20   30   50   100  130
  ci   1.0  1.0  1.0  1.0  0.5  0.5  0.5  0.5  0.5  0.5

[Timeline: T1 runs 0-1, T2 1-2, T3 2-3, T4 3-4, T1 4-5, T2 5-6, T3 6-7, T4 7-8]

What to run next at each scheduling point?
What’s wrong? T5 never gets to run and misses its deadline at t = 8.
When would T5 run under EDF?
Combined static/dynamic (CSD)
• Hybrid approach
• Very useful for solving many kinds of problems
• Use one approach when it does well
• Use another approach when it does well
• Main idea: two queues
• Dynamic queue (sorted, used by EDF)
• Fixed queue (unsorted, used by RM)
• Key is figuring out which queue to put tasks on
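A minimal sketch of the CSD dispatch decision, reusing tcb_t and edf_pick_next from the earlier sketch: serve the dynamic queue first and fall back to the fixed queue only when it is empty. (This rescans the DPQ for simplicity; since EMERALDS keeps the dynamic queue sorted by deadline, the real pick-next is cheaper.)

  /* DPQ tasks always beat FPQ tasks; the FPQ head is assumed to be
   * the highest-priority ready fixed-priority task. */
  tcb_t *csd_pick_next(tcb_t *dpq_head, tcb_t *fpq_head) {
      tcb_t *t = edf_pick_next(dpq_head);
      return (t != NULL) ? t : fpq_head;
  }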
CSD scheduling
(same task set as above)
Need to identify the longest-period task that fails under RM
DPQ: T1, T2, T3, T4, T5
FPQ: T6, T7, T8, T9, T10
Which tasks have higher priority? When do FPQ tasks run?
CSD scheduling
(same task set as above; DPQ: T1–T5, FPQ: T6–T10)

[Timeline: DPQ scheduled by EDF: T1 0-1, T2 1-2, T3 2-3, T4 3-4, T5 4-4.5, T1 4.5-5.5, T2 5.5-6.5, T3 6.5-7.5]

When would FPQ tasks run? We need a DPQ task to block.
CSD scheduling
(same task set and timeline as above)
What if we have a lot of tasks? The time to walk the DPQ could grow.
Why might this be a problem? We may start missing deadlines.
What’s the solution? Split the DPQ into two queues.
CSD3 scheduling
(same task set as above)
DPQ1: T1, T2, T3
DPQ2: T4, T5
FPQ: T6, T7, T8, T9, T10
Which tasks go on which queue? More frequent tasks go in DPQ1.
CSD3 scheduling
(same queues as above)
Why is DPQ1 helpful? Lower search time for the most frequent tasks; longer searches occur less frequently.
When do DPQ2 tasks run? When DPQ1 is empty.
When do FPQ tasks run? When DPQ1 and DPQ2 are empty.
What is the downside of DPQ2? Schedulability suffers.
What if DPQ2 only has T5? We’re back to missing its deadline.
What’s the solution? Exhaustive offline search!
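The search itself can be sketched as follows (a rough illustration, not EMERALDS' published algorithm): with tasks sorted by period, an assignment is a pair of split points, so we can enumerate them all and keep one that a schedulability check accepts. The feasible() predicate below is hypothetical and stands in for a full analysis of a candidate split.

  /* Tasks 0..r1-1 go to DPQ1, r1..r2-1 to DPQ2, r2..n-1 to FPQ. */
  extern int feasible(int r1, int r2, int n);   /* hypothetical check */

  int csd3_search(int n, int *best_r1, int *best_r2) {
      for (int r1 = 0; r1 <= n; r1++)
          for (int r2 = r1; r2 <= n; r2++)
              if (feasible(r1, r2, n)) {
                  *best_r1 = r1;
                  *best_r2 = r2;
                  return 1;   /* found a split that meets all deadlines */
              }
      return 0;               /* no contiguous split works */
  }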
Synchronization
• General approach
• Integrate synchronization with scheduling
• Same approach as 310 thread library
• Main primitive: semaphores
• Initialize w/ value one to use as a lock
• Initialize w/ value zero to use for ordering (e.g., CV)
Synchronization
• Semaphores w/ priority inheritance for locking

  if (sem locked) {
      do priority inheritance;
      add calling thread to wait queue;
      block;
  }
  lock sem;
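Fleshing that pseudocode out, here is a hedged C sketch of acquire with priority inheritance. All types and helpers here (sem_t, current, block_on(), and friends) are invented for illustration; they are not EMERALDS' actual API.

  typedef struct thread thread_t;

  typedef struct {
      int locked;
      thread_t *holder;       /* current lock holder */
      thread_t *wait_queue;   /* threads blocked on this semaphore */
  } sem_t;

  extern thread_t *current;                      /* running thread */
  extern int  get_priority(thread_t *t);         /* hypothetical helpers */
  extern void set_priority(thread_t *t, int p);
  extern void enqueue_waiter(sem_t *s, thread_t *t);
  extern void block_on(sem_t *s);                /* yields until woken */

  void sem_acquire(sem_t *s) {
      if (s->locked) {
          /* Priority inheritance: boost the holder so a middle-priority
           * task cannot preempt it while we wait. */
          if (get_priority(current) > get_priority(s->holder))
              set_priority(s->holder, get_priority(current));
          enqueue_waiter(s, current);
          block_on(s);   /* resumes after the holder releases */
      }
      s->locked = 1;
      s->holder = current;
  }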
Synchronization

[Figure: Tx finishes; T2 is scheduled, calls lock on a semaphore held by T1, and blocks; T1 runs and unlocks; T2 then acquires the lock]

How do we get rid of this context switch? Schedule T1 before T2.
What extra info must be passed to the blocking call? The ID of the semaphore to be acquired.
What does the scheduler do before it tries to run T2? It checks whether the semaphore is held; if so, it transfers T2’s priority to T1 and blocks T2 on the semaphore’s wait list.
When is T2 unblocked? When T1 releases the lock, which also transfers T2’s priority back to T2.
Communication
• Typical access pattern
• One task reading a sensor
• Many tasks processing sensor values
• Single writer, multiple readers
• Could use a producer-consumer queue
• Writer acquires lock on shared queue
• Readers block acquiring the lock to read the queue
• Writer adds new value to queue
• Readers take turns acquiring the lock, reading the value
• Too slow for an embedded system
Communication

[Figure: a state message: a six-slot circular buffer R0–R5 with Index = 0; one writer, multiple readers]

Note this is for a uniprocessor. The system is concurrent, but not parallel.
Communication
What is the difference between concurrency and parallelism?
Parallelism is when threads run at the same time. Concurrency is when threads’ execution overlaps in time.
Communication
What operations are atomic? Loads and stores that access B bytes.
Communication
Do I need this circular buffer if my data is ≤ B bytes?
No, we can just update with a load or store; the value won’t be partially written. We need buffers when data is > B bytes.
Communication

To update SM:
(1) Read new sensor value
(2) Read index into i
(3) Write value to (i+1)%6
(4) Update index to (i+1)%6

To read SM:
(1) Read index into i
(2) Read value at buffer[i]

What happens if the reader runs between update steps 1 and 2? It sees the old value.
Communication
(same update/read steps as above)
What happens if the reader runs between update steps 3 and 4? It sees the old value.
What happens if the writer runs between read steps 1 and 2? The reader sees the old value.
What happens if the writer runs 6 times between the start and end of read step 2? The reader will see a garbled value.
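Putting the steps together, here is a minimal C sketch of the single-writer state message. The slot count matches the figure; word-sized slots are used for brevity (a value that fits in B bytes would not need the buffer at all; the buffer matters when a value takes several stores to write).

  #define NSLOTS 6   /* R0..R5, as in the figure */

  typedef struct {
      volatile int  index;          /* slot holding the newest value */
      volatile long buf[NSLOTS];
  } state_msg_t;

  /* Writer: fill the next slot first, then publish it with a single
   * atomic store to index, so readers never see a half-written slot. */
  void sm_write(state_msg_t *sm, long value) {
      int i = sm->index;                  /* (2) read index */
      sm->buf[(i + 1) % NSLOTS] = value;  /* (3) write new value */
      sm->index = (i + 1) % NSLOTS;       /* (4) publish */
  }

  /* Reader: lock-free; may return a value that is one update stale. */
  long sm_read(const state_msg_t *sm) {
      int i = sm->index;      /* (1) read index */
      return sm->buf[i];      /* (2) read value */
  }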
Communication
• Could make the buffer longer
• Making the buffer too long will waste memory
• Idea
• We know how long read and write tasks take
• Set buffer length based on the longest a read could take
d is reader’s period (deadline)
c is reader’s compute time
cr is time to read value
maxReadTime = d – (c – cr)
Communication
(same idea as above)
xmax is the max number of writes that could garble values
Pw is the writer’s period
dw is the writer’s deadline
maxReadTime = d – (c – cr)
xmax > FLOOR((maxReadTime – (Pw – dw)) / Pw)

[Figure: writer invocations W spaced Pw apart across maxReadTime, each finishing within its deadline dw]
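Under the same definitions, here is a small sketch that turns the formula into a slot count. The two +1 terms are assumptions of this sketch (round FLOOR’s strict inequality up, and reserve the slot currently being read), not the paper’s exact sizing rule.

  #include <math.h>

  /* All times in the same units (e.g., ms): d, c, cr describe the
   * reader; pw, dw the writer, as defined above. */
  int sm_slots_needed(double d, double c, double cr,
                      double pw, double dw) {
      double max_read_time = d - (c - cr);
      /* Smallest integer satisfying xmax > FLOOR(...). */
      int xmax = (int)floor((max_read_time - (pw - dw)) / pw) + 1;
      return xmax + 1;   /* intervening writes + the slot being read */
  }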
Course administration
• Research projects
• Study use of native code in Android apps
• Build a FUSE file system for Linux