Predictable Scheduling for a
Soft Modem
Michael B. Jones – Microsoft Research
Stefan Saroiu – University of Washington
1
Consumer Real-Time
• General-purpose Operating Systems,
such as Windows 2000:
– maximize aggregate throughput
– approximate fair sharing of the resources
• Increasing use of time-dependent tasks
– signal processing, audio, video
• Need support for:
– predictable scheduling for independently
developed applications
– low latency responses
– explicit resource allocation mechanisms
2
Why Study Soft Modems?
• Signal processing is done on the host CPU:
– requires predictable scheduling
– requires low-latency responses
• While coexisting with other system
activities
– The soft modem is a background real-time task
• Successful in the home computer market:
– Low cost
– Easy to update via software upgrades
3
Methodology
• Instrumented Windows 2000 performance
kernel:
– Logs predefined and custom events
– Writes them to a memory buffer
– Dumps buffers to disk at end of trace
• Driver Software:
– No source code available for the signal-processing routines
• Measurement Environment:
– All experiments run with a normal-priority spinning
competitor thread
• System:
– Windows 2000 Professional
– Pentium II 450 MHz (uniprocessor)
– 384 MB ECC SDRAM (100 MB allocated to logging)
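The tracing scheme above (record events into a preallocated memory buffer, write them out only after the trace ends) keeps disk I/O out of the measured interval. The following is a minimal user-level Python sketch of that pattern, not the instrumented kernel's interface: the class `TraceBuffer`, its event layout, and its drop-when-full policy are all hypothetical.

```python
import io
import time
from typing import List, Optional, TextIO, Tuple

Event = Tuple[int, str, int]  # (timestamp in ns, event name, payload)


class TraceBuffer:
    """Hypothetical fixed-size in-memory event log (illustrative sketch)."""

    def __init__(self, capacity: int) -> None:
        # Allocate every slot up front so logging never grows the buffer.
        self._slots: List[Optional[Event]] = [None] * capacity
        self.count = 0
        self.dropped = 0  # events discarded after the buffer filled up

    def log(self, name: str, payload: int = 0) -> None:
        """Record a timestamped event; no I/O happens on this path."""
        if self.count < len(self._slots):
            self._slots[self.count] = (time.perf_counter_ns(), name, payload)
            self.count += 1
        else:
            self.dropped += 1

    def events(self) -> List[Event]:
        return [e for e in self._slots[: self.count] if e is not None]

    def dump(self, out: TextIO) -> int:
        """Write all recorded events once tracing is over; returns the count."""
        for ts, name, payload in self.events():
            out.write(f"{ts}\t{name}\t{payload}\n")
        return self.count


if __name__ == "__main__":
    trace = TraceBuffer(capacity=4)
    for i in range(6):
        trace.log("isr_enter", i)
    sink = io.StringIO()  # in practice, a file opened after tracing stops
    print(trace.dump(sink), trace.dropped)  # 4 2
```

Dropping events when the buffer is full (rather than flushing mid-trace) is one possible policy; it trades completeness for never perturbing the timing being measured.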
4
Vendor Driver: Signal Processing in Interrupt (INT)
• Operation of the modem:
– 1. DMA transfers data between the A/D and D/A
converters and physical memory
– 2. When enough data samples have accumulated, the modem
raises an interrupt
– 3. Inside the ISR, the driver processes incoming data and
provides outgoing samples before the buffers are exhausted
• Uses input and output data buffers holding
512 16-bit samples (1024 bytes/buffer)
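The buffer size above follows from 512 samples × 2 bytes = 1024 bytes. How much time one buffer represents also depends on the modem's sample rate, which the slides do not state; the 9600 Hz used below is a hypothetical value for illustration only.

```python
def buffer_bytes(samples: int, bits_per_sample: int) -> int:
    """Size of one sample buffer in bytes."""
    return samples * bits_per_sample // 8


def buffer_duration_ms(samples: int, sample_rate_hz: float) -> float:
    """Time the converter needs to fill (or drain) one buffer."""
    return 1000.0 * samples / sample_rate_hz


if __name__ == "__main__":
    print(buffer_bytes(512, 16))  # 1024
    # HYPOTHETICAL sample rate; the actual modem rate is not given.
    print(buffer_duration_ms(512, 9600.0))  # ~53.3
```

Under the simplifying assumption that processing must keep pace with the converter, this duration is the kind of budget the ISR's outgoing samples must meet before the buffers are exhausted.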
5
Three Additional Versions
• DPC Version (DPC)
– The ISR queues a DPC
– DPC performs signal processing
• Thread Version (THR)
– The ISR queues a DPC that signals a thread via
a semaphore
– Thread performs signal processing
– Experimented with several different priorities
• Rialto/NT Version (RES)
– Same as THR, but the thread is scheduled using a
Rialto/NT real-time periodic CPU Reservation
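The THR hand-off (the ISR queues a DPC, the DPC signals a semaphore, and a waiting thread does the signal processing) can be mimicked at user level with Python's `threading.Semaphore`. This is only an analogy for the control flow: `fake_isr`, `fake_dpc`, and `process_buffer` are hypothetical stand-ins, and nothing here runs at interrupt or DPC level.

```python
import queue
import threading
from typing import List, Optional

STOP = None  # sentinel telling the worker thread to exit


def run_demo(num_interrupts: int) -> List[int]:
    pending: queue.Queue = queue.Queue()   # buffers awaiting processing
    work_ready = threading.Semaphore(0)    # the semaphore the DPC signals
    processed: List[int] = []

    def process_buffer(buf_id: int) -> None:
        # Hypothetical placeholder for the signal-processing code.
        processed.append(buf_id)

    def fake_dpc() -> None:
        # Deferred work: wake the thread instead of processing here.
        work_ready.release()

    def fake_isr(buf_id: Optional[int]) -> None:
        # Minimal ISR: note which buffer is ready, then "queue" the DPC.
        # (In the kernel the DPC runs later; here it is called directly.)
        pending.put(buf_id)
        fake_dpc()

    def modem_thread() -> None:
        while True:
            work_ready.acquire()  # sleep until the DPC signals
            buf_id = pending.get()
            if buf_id is STOP:
                return
            process_buffer(buf_id)

    worker = threading.Thread(target=modem_thread)
    worker.start()
    for i in range(num_interrupts):
        fake_isr(i)
    fake_isr(STOP)
    worker.join()
    return processed


if __name__ == "__main__":
    print(run_demo(5))  # [0, 1, 2, 3, 4]
```

The point of the structure is that the ISR and DPC do constant, tiny amounts of work, so how long the signal processing takes becomes a question of when the scheduler runs the thread.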
6
Interrupt Rate
3 different phases, interrupts very regular
[Figure: Rate of Interrupts (INT): interval in milliseconds (0-35) vs. time in seconds (0-30); phases On-hook, Dialing, Training, Connected]
Falls within the PC 99 recommended interrupt intervals of 3-16ms
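Checking a trace against the PC 99 range amounts to computing the gaps between successive interrupt timestamps. A minimal sketch, assuming the timestamps are already in milliseconds; the function names and the sample traces are made up for illustration.

```python
from typing import List, Sequence

PC99_MIN_MS = 3.0   # PC 99 recommended interrupt interval range (from the slide)
PC99_MAX_MS = 16.0


def intervals_ms(timestamps_ms: Sequence[float]) -> List[float]:
    """Gaps between consecutive interrupt timestamps, in ms."""
    return [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]


def within_pc99(timestamps_ms: Sequence[float]) -> bool:
    """True if every inter-interrupt gap lies in the 3-16 ms range."""
    return all(PC99_MIN_MS <= g <= PC99_MAX_MS for g in intervals_ms(timestamps_ms))


if __name__ == "__main__":
    print(within_pc99([0.0, 10.0, 20.0, 30.0]))  # True (made-up trace)
    print(within_pc99([0.0, 20.0]))              # False: 20 ms gap
```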
7
Elapsed Times in ISR (INT)
1.8 ms on average, with a repeatable worst case of 3.3 ms
[Figure: Elapsed Times in Interrupt Handler (INT): milliseconds (0-3.5) vs. time in seconds (0-30); phases On-hook, Dialing, Training, Connected]
PC 99 recommends that the time during which a driver-based
modem disables interrupts should not exceed 100 µs
8
CPU Utilization
14.7% sustained load on 450MHz Pentium II
[Figure: CPU Load: percentage (0-35%) vs. time in seconds (0-30); phases On-hook, Dialing, Training, Connected]
9
Elapsed Times in ISR (DPC)
ISR times now small, typically < 6µs
[Figure: Elapsed Times In Interrupt Handler (DPC): microseconds (0-16) vs. time in seconds (0-30); phases On-hook, Dialing, Training, Connected]
10
Elapsed Times in Queued DPC
But DPC times are now long: 1.8ms average, 3.3ms maximum
(the same as the ISR times for INT)
[Figure: Elapsed Times In Queued DPC (DPC): milliseconds (0-3.5) vs. time in seconds (0-30); phases On-hook, Dialing, Training, Connected]
PC 99 recommends that the total execution time required
for all queued DPCs should not exceed 500 µs
11
Samples Pending to be Processed
(INT & THR 24)
Small relative to the 512-sample buffer size
[Figure: Samples Pending to be Processed (INT): unprocessed samples (0-35) vs. time in seconds (0-30); phases On-hook, Dialing, Training, Connected]
[Figure: Samples Pending to be Processed (THR 24): unprocessed samples (0-35) vs. time in seconds (0-30); phases On-hook, Dialing, Training, Connected]
12
Samples Pending to be
Processed (THR 8)
Unsurprisingly, contention kills the modem
[Figure: Samples Pending to be Processed (THR 8): unprocessed samples (0-600) vs. time in seconds (0-35); phases On-hook, Dialing; annotation: "Please hang up and try your call again"]
13
Latency Results
• Set the multimedia timers to fire once
every millisecond
• Register a routine to be called every
millisecond
• Routine does very little work
– Stores cycle counter value and sleeps again
• Histograms show differences between
recorded times and ideal times
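The stored cycle-counter values can be post-processed along these lines: convert them to microseconds, take the differences between successive wakeups (ideally 1000 µs), and bin them. The sketch assumes the 450 MHz clock of the test machine and 50 µs bins (read off the histogram axis labels); the function names and sample stamps are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

CPU_HZ = 450_000_000  # cycle-counter rate of the 450 MHz test machine
BIN_US = 50           # assumed histogram bin width in microseconds


def wakeup_gaps_us(cycle_stamps: Sequence[int]) -> List[float]:
    """Microseconds between successive timer callbacks."""
    return [(b - a) * 1_000_000 / CPU_HZ
            for a, b in zip(cycle_stamps, cycle_stamps[1:])]


def histogram_percent(gaps_us: Sequence[float]) -> Dict[int, float]:
    """Percentage of callbacks per bin, keyed by the bin's lower edge in µs."""
    counts = Counter(int(g // BIN_US) * BIN_US for g in gaps_us)
    total = len(gaps_us)
    return {edge: 100.0 * n / total for edge, n in sorted(counts.items())}


if __name__ == "__main__":
    # Made-up stamps: two on-time 1 ms wakeups, then one delayed to 2 ms.
    stamps = [0, 450_000, 900_000, 1_800_000]
    gaps = wakeup_gaps_us(stamps)
    print(gaps)       # [1000.0, 1000.0, 2000.0]
    print(max(gaps))  # 2000.0
    print(histogram_percent(gaps))
```

The "Maximum ... between wakeups" figure quoted on each histogram slide corresponds to `max(gaps)` here.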
14
Coexisting Thread Latencies
(Control Case - No Modem)
Maximum 1978µs between wakeups
[Figure: Control Case - No Modem: histogram of Percentage of Callbacks (axis 0-3.0%) vs. Latency (microseconds, 50-2000); dominant bin labeled 96.8%]
15
Coexisting Thread Latencies
(INT)
Maximum 5313µs between wakeups
[Figure: INT Version: histogram of Percentage of Callbacks (axis 0-3.0%) vs. Latency (microseconds, 50-5350); dominant bin labeled 83.1%]
16
Coexisting Thread Latencies
(DPC)
Maximum 4396µs between wakeups
[Figure: DPC Version: histogram of Percentage of Callbacks (axis 0-3.0%) vs. Latency (microseconds, 50-3950); dominant bin labeled 82.6%]
17
Coexisting Thread Latencies
(THR 24)
Maximum 2239µs between wakeups
[Figure: THR Version (24): histogram of Percentage of Callbacks (axis 0-3.0%) vs. Latency (microseconds, 50-2100); dominant bin labeled 93.8%]
18
What Have We Learned So Far?
• Signal processing in the context of the
interrupt handler is:
– unnecessary
– detrimental to the latencies and predictability of
coexisting activities
• The vendor's choice is understandable
– For any priority there is a potentially unbounded
delay between the interrupt and the thread
running
• In practice
– Delays are reasonable for well-configured systems
[Intel OSDI ’99]
– Using interrupts is an extreme form of priority inflation
19
Two Possible Solutions
• Rate Monotonic Analysis – determine the
“right” priority assignments among all
threads. Two problems:
– Assumes cooperative priority assignment among
all threads, which is unrealistic
– A working priority assignment depends on the
timing requirements of all threads
+ Changes in application mix may require changes in
priority assignments
• Use a time-based real-time scheduler
– Such as Rialto/NT
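For reference, the classic sufficient test for rate-monotonic priority assignment (Liu and Layland) accepts n periodic tasks if their total utilization U = Σ Cᵢ/Tᵢ satisfies U ≤ n(2^(1/n) − 1). The sketch below illustrates the second problem listed above: the test covers the whole task set, so adding one application can push a previously accepted set past the bound. The task parameters are hypothetical.

```python
from typing import List, Tuple


def rm_utilization_bound(n: int) -> float:
    """Liu-Layland bound n * (2**(1/n) - 1) for n periodic tasks."""
    return n * (2 ** (1.0 / n) - 1)


def rm_schedulable(tasks: List[Tuple[float, float]]) -> bool:
    """Sufficient (not necessary) rate-monotonic test.

    Each task is (compute_time, period) in the same time unit. Failing
    this test does not prove that the task set will miss deadlines.
    """
    if not tasks:
        return True
    utilization = sum(c / t for c, t in tasks)
    return utilization <= rm_utilization_bound(len(tasks))


if __name__ == "__main__":
    # Hypothetical task sets, (compute_ms, period_ms).
    print(rm_schedulable([(1, 4), (1, 5)]))          # 0.45 <= 0.828 -> True
    print(rm_schedulable([(1, 4), (1, 5), (2, 5)]))  # 0.85 > 0.780 -> False
```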
20
Rialto/NT Abstractions
• Two real-time software abstractions:
– CPU Reservations – ongoing reservation
for at least X time units out of every Y
units for a thread
– Time Constraints – one-shot time
reservation for specified amount of
work between start time and deadline
• The Soft Modem work only uses CPU
Reservations
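The two abstractions can be written down as plain data. This is a hypothetical sketch, not Rialto/NT's API (the names `CpuReservation`, `TimeConstraint`, and their methods are invented here): a reservation guarantees `amount_ms` out of every `period_ms`, and a time constraint can only be met if the requested work at least fits between its start time and deadline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuReservation:
    """Ongoing: at least amount_ms of CPU time out of every period_ms."""
    amount_ms: float
    period_ms: float

    def cpu_fraction(self) -> float:
        return self.amount_ms / self.period_ms


@dataclass(frozen=True)
class TimeConstraint:
    """One-shot: work_ms of CPU time between start_ms and deadline_ms."""
    start_ms: float
    deadline_ms: float
    work_ms: float

    def fits_window(self) -> bool:
        # Necessary condition only; a real scheduler must also account
        # for reservations and constraints it has already granted.
        return 0 <= self.work_ms <= self.deadline_ms - self.start_ms


if __name__ == "__main__":
    print(CpuReservation(2, 8).cpu_fraction())     # 0.25, the RES 2ms/8ms case
    print(TimeConstraint(0, 10, 3).fits_window())  # True
    print(TimeConstraint(0, 2, 3).fits_window())   # False
```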
21
Rialto/NT Implementation
• Rialto/NT developed on top of
Windows 2000 priority scheduler
• Limitations:
– CPU Reservations must be integer
multiples of milliseconds
– Frequency of reservations must be
power-of-two multiple of 1ms
22
Samples Pending to be Processed
(RES 2ms/8ms – 25%)
Fits well within the 512-sample buffer size
[Figure: Samples Pending to be Processed (RES 2ms/8ms): unprocessed samples (0-160) vs. time in seconds (0-35); phases On-hook, Dialing, Training, Connected]
23
Coexisting Thread Latencies
(RES 2ms/8ms – 25%)
Maximum 1971µs between wakeups
[Figure: RES Version (2ms/8ms): histogram of Percentage of Callbacks (axis 0-7.0%) vs. Latency (microseconds, 100-2000); dominant bin labeled 85.5%]
24
File Transfer Times
Results for 10 copies of 200,000 bytes each
Version        Min      Max      Mean     Std Dev  Passed
INT            36.334   36.398   36.367   0.029    10
DPC            36.272   36.447   36.396   0.048    10
THR Pri 24     36.319   36.475   36.384   0.056    10
RES 1ms/7ms    36.333   36.724   36.426   0.112    10
RES 2ms/13ms   36.288   36.975   36.547   0.232    10
RES 2ms/14ms   38.631   91.713   65.172   37.535   2
RES 3ms/15ms   36.275   36.586   36.387   0.108    10
RES 3ms/16ms   97.289   180.415  110.523  26.408   9
RES 4ms/16ms   36.255   37.116   36.415   0.256    10
RES 8ms/20ms   36.347   36.476   36.394   0.039    10
For 1/8, 2/15, 3/17, 4/17, and 7/20, no test passed.
25
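The RES 2ms/14ms row is consistent with Min, Max, Mean, and Std Dev being computed over the passed runs only: with just two passing runs, its mean and sample standard deviation follow directly from the Min and Max columns. A quick check (this is a reading of the table, not something the slides state):

```python
import statistics

passed_runs = [38.631, 91.713]          # Min and Max of RES 2ms/14ms (2 passed)
mean = statistics.mean(passed_runs)
stdev = statistics.stdev(passed_runs)   # sample standard deviation (n - 1)

print(f"{mean:.3f} {stdev:.3f}")  # 65.172 37.535
```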
Modem Reservation Ranges
Sensitivity to both percentage and gaps
[Figure: Modem Reservation Operating Ranges: reservation amount in ms (0-10) vs. reservation period in ms (0-22); regions labeled "Sufficient CPU Percentage and Frequency", "Gaps Too Long", and "Insufficient Percentage"; boundary lines at 14.7% of CPU and 12.5ms gaps; legend: Sufficient, Marginal, Insufficient, Actual]
If period < 12.5ms, the reservation must provide 14.7% of the CPU to work
If period > 12.5ms, (period – amount) <= 12.5ms
must also hold
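The two rules above can be turned into a rough classifier. This sketch simply encodes the stated thresholds (14.7% of the CPU, and a gap of at most 12.5 ms, taking the gap as period − amount as the slide does); the function name is hypothetical, and the boundaries are approximate: RES 1ms/7ms (14.3%) still passed all ten file-transfer runs.

```python
MIN_FRACTION = 0.147  # modem's sustained CPU load (CPU Utilization slide)
MAX_GAP_MS = 12.5     # longest tolerable gap between reserved intervals


def reservation_sufficient(amount_ms: float, period_ms: float) -> bool:
    """Rough check of an 'amount out of every period' CPU reservation."""
    enough_cpu = amount_ms / period_ms >= MIN_FRACTION
    short_gaps = period_ms - amount_ms <= MAX_GAP_MS
    return enough_cpu and short_gaps


if __name__ == "__main__":
    # Configurations taken from the file-transfer table and its footnote.
    for amount, period in [(2, 13), (2, 14), (3, 16), (4, 17), (8, 20)]:
        print(f"{amount}ms/{period}ms:", reservation_sufficient(amount, period))
```

For periods under 12.5 ms the gap condition holds automatically, which is why only the percentage matters there.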
26
Conclusions
• Signal Processing in interrupt context is:
– Unnecessary
– Detrimental to the predictability and latencies of
the coexisting activities
• The DPC version has similar problems
• Threads help alleviate these problems
– Modem runs well with real-time priorities and non-real-time competition
– However, modem threads may interfere with other
threads
• A real-time scheduler allows
– Control over the modem’s degree of interference with
other time-sensitive activities
– Performance isolation for threads using reservations
27
Industry Perspective
• The vendor did build its own THR version
– Worked fine under normal load
– However, the modem was starved when
+ copying data between two IDE devices
+ using a USB scanner (Intel 440BX chipset) that turned
off interrupts for 30-50 ms
– Therefore, it shipped the INT version
• The vendor is willing to be a “good citizen”
– if assured that others will be as well
• Systematic latency timing verification of
components is needed to enforce good
behavior
28
Soft DSL is Coming
• More demanding than soft modems
– 4ms processing period
• G.lite
– 1.531Mbps downstream and 512Kbps upstream
– ~ 25% of a 600 MHz Pentium III
• Full rate DSL
– 3.062Mbps downstream and 512Kbps upstream
– Nearly 50% of a 600 MHz Pentium III
• Soft Bluetooth period 312.5µs
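Some back-of-envelope numbers for these workloads: a CPU share applied to a fixed period gives the per-period compute budget, and the period gives the activation rate. The helper names are hypothetical; the inputs are the figures on this slide (shares are of a 600 MHz Pentium III). A G.lite-style 1ms/4ms budget is expressible within Rialto/NT's stated limits, whereas the 312.5µs Bluetooth period is below its 1 ms granularity.

```python
def budget_ms(period_ms: float, cpu_share: float) -> float:
    """CPU time needed in each period for a given sustained share."""
    return period_ms * cpu_share


def activations_per_second(period_ms: float) -> float:
    return 1000.0 / period_ms


if __name__ == "__main__":
    print(budget_ms(4.0, 0.25))            # 1.0 ms every 4 ms (G.lite, ~25%)
    print(budget_ms(4.0, 0.50))            # 2.0 ms upper figure (full rate, nearly 50%)
    print(activations_per_second(4.0))     # 250.0
    print(activations_per_second(0.3125))  # 3200.0 (Bluetooth)
```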
29
Further Research Possibilities
• Soft DSL studies
• Multiple soft devices within the
same machine
• Similar studies on multiprocessors
30
For More Information
• See the authors:
– Mike Jones
+ mbj@microsoft.com
+ http://research.microsoft.com/~mbj/
– Stefan Saroiu
+ tzoompy@cs.washington.edu
+ http://www.cs.washington.edu/homes/tzoompy/
• See related papers at Mike’s web site
31