Chapter 8

Interfacing Processors and
Peripherals
Performance Analysis of
Synchronous vs. Asynchronous
• Compare the maximum bandwidth for a
synchronous and an asynchronous bus:
– synchronous bus: clock cycle=50ns, each bus
transmission takes 1 clock cycle.
– asynchronous bus: 40ns per handshake
– bus width = 32 bits
• Find the bandwidth for each bus when
performing one-word reads from a 200ns
memory.
Synchronous Bus
• Send the address to memory: 50ns
• Read the memory: 200ns
• Send the data to the device: 50ns
• Total time = 300ns
• Bandwidth = 4bytes/300ns = 13.3 MB/sec
Asynchronous Bus
• Step 1: 40ns
• Step 2,3,4: max(3x40ns, 200ns) = 200ns
(steps 2,3,4 can be overlapped with memory
access)
• Step 5,6,7: 3x40ns = 120ns
• Total=360ns
• Bandwidth = 4bytes/360ns = 11.1 MB/sec
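The two bandwidth figures above can be reproduced with a short calculation. This is a minimal sketch, assuming 1 MB = 10^6 bytes; the function and variable names are illustrative and not part of the original slides.

```python
# Sketch: one-word read bandwidth on the synchronous and asynchronous buses.
# Assumes 1 MB = 10**6 bytes; all names here are illustrative.

WORD_BYTES = 4      # 32-bit bus, one word per read
MEMORY_NS = 200     # memory access time


def bandwidth_mb_per_s(num_bytes, time_ns):
    """Convert bytes per nanosecond to MB/s (x 1e9 ns/s, / 1e6 bytes/MB)."""
    return num_bytes / time_ns * 1e3


# Synchronous bus: address cycle + memory read + data cycle.
sync_cycle_ns = 50
sync_total_ns = sync_cycle_ns + MEMORY_NS + sync_cycle_ns

# Asynchronous bus: step 1, then steps 2-4 overlapped with the memory
# access, then steps 5-7, each handshake step taking 40 ns.
handshake_ns = 40
async_total_ns = (handshake_ns
                  + max(3 * handshake_ns, MEMORY_NS)
                  + 3 * handshake_ns)

sync_bw = bandwidth_mb_per_s(WORD_BYTES, sync_total_ns)
async_bw = bandwidth_mb_per_s(WORD_BYTES, async_total_ns)

print(f"synchronous:  {sync_total_ns} ns, {sync_bw:.1f} MB/s")
print(f"asynchronous: {async_total_ns} ns, {async_bw:.1f} MB/s")
```

Changing `MEMORY_NS` shows how the comparison shifts: once the memory is faster than three handshake steps, the handshakes rather than the memory dominate the asynchronous total.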
Performance Analysis of Two
Bus Schemes
• Given a system with
– a memory and bus system supporting block
access of 4 to 16 words
– a 64-bit synchronous bus clocked at 200MHz,
with each 64-bit transfer taking 1 clock cycle,
and 1 clock cycle to send an address to memory
– two clock cycles needed between each bus
operation
– memory access for first 4 words takes 200ns,
each additional set of 4 words requires 20ns
Question
• Find the sustained bandwidth and latency
for a read of 256 words for transfers using
4-word blocks and 16-word blocks.
• Find the effective number of bus
transactions for each case.
4-Word Block Transfer
• 1 clock cycle to send address to memory
• 200ns/(5ns/cycle) = 40 cycles to read
memory
• 2 cycles to send data from memory
• 2 idle cycles
• Total = 45 cycles
• 256 words require 256/4 = 64 transactions, so 45x64 = 2880 cycles
4-Word Block Transfer
• Latency = 2880 cycles x 5ns/cycle = 14400
ns
• Bus transaction rate = 64 transactions x
1s/14400ns = 4.44M transactions/s
• Bandwidth = (256x4 bytes)x 1/14400ns =
71.11 MB/s
16-Word Block Transfer
• 1 clock cycle to send address to memory
• 40 cycles to read first 4 words from memory
• 2 cycles to send data, during which the read of
the next 4 words is started.
• 2 idle cycles between transfers, during which
the read of the next block is completed.
• The last two steps are performed 4 times in total
(3 repeats) to read all 16 words.
16-Word Block Transfer
• Total cycles required = 1 + 40 + 4x(2+2) =57
cycles
• 256/16=16 transactions are required
• Total number of cycles required for 256 word
= 16x57 = 912 cycles, latency = 4560 ns
• Bus transaction rate = 16 transactions x 1s/4560ns
= 3.51M transactions/s
• Bandwidth = (256x4 bytes)x 1/4560ns =
224.56 MB/s
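The 4-word and 16-word cases follow the same pattern, so both can be checked with one function. This is a sketch under the assumptions used in the slides: 4-byte words, a 5ns cycle (200MHz), 1 MB = 10^6 bytes, and the access for each additional 4-word group fully hidden behind the preceding transfer and idle cycles. The function name and dictionary keys are illustrative.

```python
# Sketch: latency, transaction rate and bandwidth for a 256-word read.
# Assumes 4-byte words, 5 ns cycles, 1 MB = 10**6 bytes; names are illustrative.

CYCLE_NS = 5          # 200 MHz bus clock
WORD_BYTES = 4
TOTAL_WORDS = 256
FIRST_ACCESS_NS = 200


def block_read(block_words):
    address_cycles = 1
    first_access_cycles = FIRST_ACCESS_NS // CYCLE_NS   # 40 cycles
    groups = block_words // 4                           # 4-word groups per block
    # Each group costs 2 transfer cycles plus 2 idle cycles; the 20 ns
    # access for the next group overlaps with those 4 cycles.
    cycles_per_transaction = address_cycles + first_access_cycles + groups * (2 + 2)
    transactions = TOTAL_WORDS // block_words
    total_cycles = transactions * cycles_per_transaction
    latency_ns = total_cycles * CYCLE_NS
    return {
        "cycles_per_transaction": cycles_per_transaction,
        "transactions": transactions,
        "total_cycles": total_cycles,
        "latency_ns": latency_ns,
        "m_transactions_per_s": transactions / latency_ns * 1e3,
        "bandwidth_mb_per_s": TOTAL_WORDS * WORD_BYTES / latency_ns * 1e3,
    }


four_word = block_read(4)
sixteen_word = block_read(16)

for name, result in (("4-word", four_word), ("16-word", sixteen_word)):
    print(name, result)
```

The larger block pays the 41-cycle address and first-access cost 16 times instead of 64, which is where most of the bandwidth gain comes from.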
Bus Standards
• PCI (a general purpose backplane bus)
• SCSI (Small Computer System Interface)
• IEEE 1394 (Firewire)
• USB 2.0
Interfacing I/O Devices
• How is a user I/O request transformed into a
device command and communicated to the
device?
• How is data actually transferred to or from a
memory location?
• What is the role of the operating system?
Role of the OS
• The OS plays a major role in handling I/O,
in that:
– the I/O system is shared by multiple programs
using the processor
– the I/O system often uses interrupts, which cause a
transfer to supervisor mode
– low-level control of I/O is complex
Communications between OS
and I/O Devices
• The OS must be able to give commands to
I/O.
• The I/O must be able to notify the OS when an
operation has completed or an error has occurred.
• Data must be transferred between memory
and an I/O device.
Giving Commands to I/O
• To give a command, the processor must be
able to address the device and to supply
command words:
– memory-mapped I/O: portions of the address
space are assigned to I/O devices
– special I/O: dedicated I/O instructions in the
processor.
Communicating with the Processor
• Polling: processor periodically checks the
status of I/O.
• Overhead of polling in an I/O system
– Example 1: mouse
– Example 2: floppy disk
– Example 3: hard disk
Mouse
• Assume the number of clock cycles for a
polling operation, including transferring to
the polling routine, accessing the device,
and restarting the user program, is 400, with
a 500 MHz clock.
• The mouse must be polled 30 times a
second to ensure that no user movement is
missed.
• Fraction of CPU time = 30x400/(500x10^6)
= 0.002%
Floppy Disk
• The floppy disk transfers data to the processor
in 16-bit units and has a data rate of 50KB/s.
• Polling rate = (50KB/s)/(2 Bytes/polling)
= 25K polling/sec
• Fraction of CPU time = 25Kx400/(500x10^6)
= 2%
Hard Disk
• Transfer in 4-word blocks
• transfer rate: 4MB/s
• Polling rate = (4MB/s)/(4x4 Bytes/polling)
= 250K polling/sec
• Fraction of CPU time = 250Kx400/(500x10^6)
= 20%
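The three polling examples use the same formula: polls per second times cycles per poll, divided by the clock rate. This is a minimal sketch, assuming K = 10^3 and M = 10^6 as the slide arithmetic does; the names are illustrative.

```python
# Sketch: fraction of CPU time spent polling each device.
# Assumes 1 KB = 10**3 bytes and 1 MB = 10**6 bytes; names are illustrative.

CLOCK_HZ = 500e6      # 500 MHz processor clock
POLL_CYCLES = 400     # cycles per polling operation


def polling_fraction(polls_per_second):
    """Fraction of processor cycles consumed by polling at the given rate."""
    return polls_per_second * POLL_CYCLES / CLOCK_HZ


mouse = polling_fraction(30)               # 30 polls per second
floppy = polling_fraction(50e3 / 2)        # 50 KB/s in 2-byte units
hard_disk = polling_fraction(4e6 / 16)     # 4 MB/s in 16-byte (4-word) blocks

for name, fraction in (("mouse", mouse), ("floppy", floppy), ("hard disk", hard_disk)):
    print(f"{name}: {fraction * 100:.4f}% of CPU time")
```

The mouse result is 0.0024%, which the slide rounds to 0.002%.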
Overhead of Polling
• Can do the polling only when the device is
active, thus reducing the overhead.
• However, the overhead is still significant, which
motivates another design called interrupt-driven I/O.
Overhead of Interrupt-Driven I/O
• Assume the overhead for each transfer,
including the interrupt, is 500 cycles.
• Cycles per second for disk = 250Kx500
= 125x10^6 cycles/sec
• Fraction of processor consumed =
125x10^6/(500x10^6) = 25%
• Assuming disk is transferring data 5% of
the time, fraction of CPU on average =
25%x5%=1.25%
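The interrupt-driven figures can be checked the same way. This sketch reuses the hard disk's 250K transfers per second from the polling example and assumes the 5% duty cycle stated above; the names are illustrative.

```python
# Sketch: CPU overhead of interrupt-driven I/O for the hard disk.
# Names are illustrative; numbers come from the example above.

CLOCK_HZ = 500e6              # 500 MHz processor clock
INTERRUPT_CYCLES = 500        # overhead per transfer, including the interrupt
TRANSFERS_PER_SECOND = 250e3  # 4 MB/s in 16-byte blocks
DISK_BUSY_FRACTION = 0.05     # disk transfers data 5% of the time

cycles_per_second = TRANSFERS_PER_SECOND * INTERRUPT_CYCLES
busy_fraction = cycles_per_second / CLOCK_HZ           # while the disk is transferring
average_fraction = busy_fraction * DISK_BUSY_FRACTION  # averaged over all time

print(f"while transferring: {busy_fraction * 100:.2f}%")
print(f"on average:         {average_fraction * 100:.2f}%")
```

Per transfer, interrupts cost more than polling (500 cycles against 400). The saving comes from paying nothing while the disk is idle.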
Direct Memory Access(DMA)
• If disk is transferring data most of the time,
the overhead for interrupt-driven I/O is still
high.
• For a high-bandwidth device, let the device
controller transfer data directly to or from
memory without involving the processor,
known as direct memory access.
• An interrupt is used to signal the completion of
an I/O transfer or an error.
Overhead of I/O Using DMA
• Assume initial setup of DMA transfer takes
1000 cycles, handling of interrupt at DMA
completion takes 500 cycles, average transfer
from disk is 8KB
• Each DMA transfer takes 8KB/(4MB/s) =
2x10^-3s
• If the disk is constantly transferring data, it
requires: (1000+500)/(2x10^-3) = 750x10^3
cycles/sec
• Fraction of CPU time= 750x10^3/(500x10^6) = 0.15%
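The DMA overhead follows from the setup and completion costs spread over the duration of one transfer. This is a minimal sketch, assuming 1 KB = 10^3 bytes and 1 MB = 10^6 bytes as in the slide arithmetic; the names are illustrative.

```python
# Sketch: CPU overhead of DMA when the disk transfers continuously.
# Assumes 1 KB = 10**3 bytes and 1 MB = 10**6 bytes; names are illustrative.

CLOCK_HZ = 500e6          # 500 MHz processor clock
SETUP_CYCLES = 1000       # initial DMA setup
COMPLETION_CYCLES = 500   # interrupt handling at DMA completion
TRANSFER_BYTES = 8e3      # average transfer of 8 KB
DISK_RATE = 4e6           # 4 MB/s

transfer_seconds = TRANSFER_BYTES / DISK_RATE                    # 2 ms per transfer
cpu_cycles_per_second = (SETUP_CYCLES + COMPLETION_CYCLES) / transfer_seconds
cpu_fraction = cpu_cycles_per_second / CLOCK_HZ

print(f"each transfer takes {transfer_seconds * 1e3:.1f} ms")
print(f"CPU cycles per second spent on DMA: {cpu_cycles_per_second:.0f}")
print(f"fraction of CPU time: {cpu_fraction * 100:.2f}%")
```

Larger average transfers reduce the fraction further, because the fixed 1500-cycle cost is paid less often.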
I/O System Design
• Latency constraints: ensuring the latency to
complete an I/O operation is bounded.
• Bandwidth constraints
• Performance Analysis techniques:
— queuing theory
— simulation
— analysis
I/O System Design- Example
• CPU: 300 MIPS, average 5000 instructions
in the OS per I/O operation
• backplane bus transfer rate: 100 MB/s
• SCSI-2 controller with transfer rate = 20
MB/s, accommodating up to 7 disks
• Disk bandwidth = 5MB/s, seek+rotational
latency=10ms
• Workload: 64-KB reads, user program needs
100000 instructions per I/O
Example
• Find
– the maximum sustainable I/O rate
– the number of disks and SCSI controllers
required.
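The slides do not work this example through, so the following is only one possible approach, not the original solution. It assumes 1 KB = 10^3 bytes and 1 MB = 10^6 bytes, that the slower of the CPU and the backplane bus sets the maximum I/O rate, that every disk access pays the full 10ms seek and rotational latency, and that disks are kept fully busy. All names are illustrative.

```python
# Sketch: maximum sustainable I/O rate and the disks/controllers needed.
# Assumptions (not from the slides): KB = 10**3 bytes, MB = 10**6 bytes,
# every access pays the full 10 ms latency, disks are 100% utilised.
import math

KB = 1000
MB = 1000 * KB

CPU_INSTR_PER_S = 300e6
INSTR_PER_IO = 100_000 + 5_000     # user program + OS instructions
IO_BYTES = 64 * KB
BUS_RATE = 100 * MB
SCSI_RATE = 20 * MB
DISKS_PER_CONTROLLER = 7
DISK_RATE = 5 * MB
DISK_LATENCY_S = 10e-3

cpu_io_rate = CPU_INSTR_PER_S / INSTR_PER_IO    # I/Os per second the CPU can issue
bus_io_rate = BUS_RATE / IO_BYTES               # I/Os per second the bus can carry
max_io_rate = min(cpu_io_rate, bus_io_rate)     # the slower component is the limit

disk_time_s = DISK_LATENCY_S + IO_BYTES / DISK_RATE   # time per 64-KB read
disk_io_rate = 1 / disk_time_s
disks_needed = math.ceil(max_io_rate / disk_io_rate)
controllers_needed = math.ceil(disks_needed / DISKS_PER_CONTROLLER)

# Check that 7 busy disks do not exceed one controller's transfer rate.
controller_io_rate = SCSI_RATE / IO_BYTES
controller_ok = DISKS_PER_CONTROLLER * disk_io_rate <= controller_io_rate

print(f"CPU limit: {cpu_io_rate:.0f} I/Os per second")
print(f"bus limit: {bus_io_rate:.0f} I/Os per second")
print(f"disks: {disks_needed}, SCSI controllers: {controllers_needed}")
print(f"controller bandwidth sufficient: {controller_ok}")
```

Under these assumptions the backplane bus, not the CPU, is the bottleneck, and the disk count follows from dividing the bus-limited I/O rate by the per-disk rate.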