Operating System Requirements(§8.5) °Provide protection to shared I/O resources ECE468 Computer Organization and Architecture • Guarantees that a user’s program can only access the portions of an I/O device to which the user has rights °Provides abstraction for accessing devices: OS’s Responsibilities • Supply routines that handle low-level device operation °Handles the interrupts generated by I/O devices °Provide equitable access to the shared I/O resources • All user programs must have equal access to the I/O resources °Schedule accesses in order to enhance system throughput ECE4680 buses.1 April 5, 2003 April 5, 2003 OS and I/O Systems Communication Requirements Recap: Summary of Bus Options: °Option High performance Low cost °Bus width Separate address & data lines Multiplex address & data lines °Data width Wider is faster (e.g., 32 bits) Narrower is cheaper (e.g., 8 bits) °Transfer size Multiple words has less bus overhead Single-word transfer is simpler °Bus masters Multiple (requires arbitration) Single master (no arbitration) °Clocking Synchronous Asynchronous °Protocol pipelined Serial °The Operating System must be able to prevent: ECE4680 buses.2 • The user program from communicating with the I/O device directly °If user programs could perform I/O directly: • Protection to the shared I/O resources could not be provided °Three types of communication are required: • The OS must be able to give commands to the I/O devices • The I/O device must be able to notify the OS when the I/O device has completed an operation or has encountered an error • Data must be transferred between memory and an I/O device April 5, 2003 I/O System Design Issues Processor ECE4680 buses.4 ECE4680 buses.5 April 5, 2003 Giving Commands to I/O Devices °Two methods are used to address the device: interrupts • Special I/O instructions • Memory-mapped I/O °Special I/O instructions specify: • Both the device number and the command word Cache Memory - I/O Bus Main Memory I/O Controller Disk Disk I/O Controller I/O Controller Graphics Network °Memory-mapped I/O: • Portions of the address space are assigned to I/O device • Read and writes to those addresses are interpreted as commands to the I/O devices • User programs are prevented from issuing I/O operations directly: - ECE4680 buses.3 Device number: the processor communicates this via a set of wires normally included as part of the I/O bus Command word: this is usually send on the bus’s data lines April 5, 2003 ECE4680 buses.6 The I/O address space is protected by the address translation April 5, 2003 I/O Device Notifying the OS Interrupt Driven Data Transfer add sub and or nop CPU (1) I/O interrupt °The OS needs to know when: • The I/O device has completed an operation • The I/O operation has encountered an error (2) save PC Memory °This can be accomplished in two different ways: • Polling: - The I/O device put information in a status register IOC (3) interrupt service addr read store ... : rti device - The OS periodically check the status register • I/O Interrupt: - user program (4) Whenever an I/O device needs attention from the processor, it interrupts the processor from what it is currently doing. interrupt service routine memory °Advantage: • User program progress is only halted during actual transfer °Disadvantage, special hardware is needed to: • Cause an interrupt (I/O device) • Detect an interrupt (processor) • Save the proper states to resume after the interrupt (processor) ECE4680 buses.7 April 5, 2003 ECE4680 buses.10 Polling: Programmed I/O April 5, 2003 I/O Interrupt(page 678) CPU °An I/O interrupt is just like the exceptions except: Is the data ready? Memory IOC no yes read data busy wait loop not an efficient way to use the CPU unless the device is very fast! but checks for I/O completion can be dispersed among computation intensive code device store data done? • An I/O interrupt is asynchronous • Further information needs to be conveyed °An I/O interrupt is asynchronous with respect to instruction execution: • I/O interrupt is not associated with any instruction • I/O interrupt does not prevent any instruction from completion - You can pick your own convenient point to take an interrupt °I/O interrupt is more complicated than exception: • Needs to convey the identity of the device generating the interrupt • Interrupt requests can have different urgencies: no yes °Advantage: • Simple: the processor is totally in control and does all the work - Interrupt request needs to be prioritized °Disadvantage: • Polling overhead can consume a lot of CPU time ECE4680 buses.8 April 5, 2003 ECE4680 buses.11 Example 1: Overhead of polling(page 676) April 5, 2003 Interrupt Logic °Assume that the number of clock cycles for a polling operation, including transferring to the polling routine, accessing the device, and restarting the user program, is 400 and that the processor executes with a 500MHz clock °Detect and synchronize interrupt requests • Ignore interrupts that are disabled (masked off) • Rank the pending interrupt requests • Create interrupt microsequence address • Provide select signals for interrupt microsequence °Determine the fraction of CPU time consumed for the following three cases, assuming that you poll often enough so that no data is ever lost and assuming that the devices are potentially always busy: • mouse must be polled 30 times per second • floppy disk transfers data to CPU in 16-bit units and has a data rate of 50 KB/sec. • hard disk transfers data four-word chunks and can transfer at 4MB/sec. : uSeq. addr & select logic Interrupt Priority Network Synchronizer Circuits : Async interrupt requests Interrupt Mask Reg • For mouse, fraction = 30× ×400÷ ÷500MHz = 0.002% • For floppy disk, polling rate = 50KB/s ÷2=25K/s, fraction = 25k ×400÷ ÷500MHz =2% • For hard disk, polling rate = 4MB/s ÷16=250K/s, fraction = 250k ×400÷ ÷500MHz =20% Sync. Inputs Q • So for quicker I/O devices, polling wastes more CUP time. What is worse, even when I/O is not busy. CPU still keeps polling. ECE4680 buses.9 April 5, 2003 D Q Clk ECE4680 buses.12 Async. D Inputs Clk April 5, 2003 Program Interrupt/Exception Hardware Delegating I/O Responsibility from the CPU: DMA CPU sends a starting address, direction, and length count to DMAC. Then issues "start". °Hardware interrupt services: • Save the PC (or PCs in a pipelined machine) • Inhibit the interrupt that is being handled °Direct Memory Access (DMA): - CPU • External to the CPU • Act as a maser on the bus • Branch to interrupt service routine • Options: Save status, save registers, save interrupt information Change status, change operating modes, get interrupt info. • Transfer blocks of data to or from memory without CPU intervention Memory °A “good thing” about interrupt: • Asynchronous: not associated with a particular instruction • Pick the most convenient place in the pipeline to handle it DMAC IOC device DMAC provides handshake signals for Peripheral Controller, and Memory Addresses and handshake signals for Memory. ECE4680 buses.13 April 5, 2003 Programmer’s View main program Add Div Sub ECE4680 buses.16 April 5, 2003 Example 3: Overhead of DMA(page 681) interrupts request (e.g., from keyboard) (1) (2) Save PC and “branch” to interrupt target address Save processor status/state Service the (keyboard) interrupt °Consider the same hard disk and processor in Examples 1 and 2: • hard disk transfers at 4MB/sec. • the processor executes with a 500MHz clock °Assume the initial setup of a DMA transfer takes 1000 clock cycles °the handling of the interrupt at DMA completion requires 500 cycles °Determine the fraction of CPU time consumed if the hard disk is actively transferring 100% of the time and the average transfer from the disk is 8KB Restore processor status/state (3) get PC • DMA rate is = 4MB/s ÷ 8kB= 500/s fraction when 100% busy = (1000+500) ×500÷ ÷500MHz =0.2% °Interrupt target address options: • General: Branch to a common address for all interrupts Software then decode the cause and figure out what to do • Specific: Automatically branch to different addresses based on interrupt type and/or level--vectored interrupt ECE4680 buses.14 • So MDA is even better than interrupt-driven. Reason? April 5, 2003 Example 2: Overhead of Interrupt (page 679) ECE4680 buses.17 April 5, 2003 Delegating I/O Responsibility from the CPU: IOP °Consider the same hard disk and processor in Example 1: CPU • hard disk transfers data four-word chunks and can transfer at 4MB/sec. • the processor executes with a 500MHz clock Mem D2 main memory bus °Assume that the overhead for each transfer, including the interrupt, is 500 clock cycles . . . Dn I/O bus °Determine the fraction of CPU time consumed if the hard disk is only transferring data 5% of the time. (1) Issues instruction to IOP • Interrupt rate is the same as polling rate = 4MB/s ÷16=250K/s, D1 IOP fraction when 100% busy = 250k ×500÷ ÷500MHz =25% fraction when 5% busy = 25%× ×5% = 1.25% CPU IOP (3) OP Device Address (4) IOP interrupts CPU when done (2) IOP steals memory cycles. ECE4680 buses.15 April 5, 2003 ECE4680 buses.18 IOP looks in memory for commands OP Addr Cnt Other memory Device to/from memory transfers are controlled by the IOP directly. • So interrupt-driven is much better than polling. Reason? target device where cmnds are special requests what to do where to put data how much April 5, 2003 Where are we visited? Summary: Input Multiplier Input Multiplicand 32 Load Mp 32=>34 signE x <<1 34 32=>34 signEx 1 32 34 0 34x2 MUX Multi x2/x1 Arithmetic 34 34 34-bi t ALU Su b/Add Control Log ic 32 2 LoadHI LO register (16x2 bits) 32 Result[HI] Prev 2 Booth Encoder • Backplane buses ShiftAll HI register (16x2 bits) LO[1] 2 "L O[ 32 0]" 34 Extra 2 bi ts • Processor-memory buses • I/O buses Multiplicand Register Single/multicycle Datapaths LoadLO Performance evaluation Cl earHI °Three types of buses: ENC[2] ENC[1] ENC[0] 2 LO[1:0] 32 Result[LO] °Bus arbitration schemes: • Daisy chain arbitration: it cannot assure fairness 1000 “Moore’s Law” IFetch Dcd Exec Mem WB • Polling: it can waste a lot of processor time Performance 10 DRA M 9%/yr. DRA M(2X/ 10 yrs) 1 198 01 98 198 1 2198 198 3 1 98 4 1598 1 98 6 7198 1 98 8 199 9 1 99 0 199 1 1299 3199 1 99 4 1 99 5 1699 199 7 1 99 8 9200 0 ECE4680 Winter 2003 • Centralized parallel arbitration: requires a central arbiter °I/O device notifying the operating system: µProc CPU 60%/yr. (2X/ 1.5 yr) Proce ssor-Memory Performance Gap: (grow s 50% / year) 100 IFetch Dcd Exec Mem WB Time IFetch Dcd Exec Mem WB • I/O interrupt: similar to exception except it is asynchronous IFetch Dcd Exec Mem WB °Delegating I/O responsibility from the CPU Pipelining • Direct memory access (DMA) • I/O processor (IOP) I/O Memory Systems ECE4680 buses.19 April 5, 2003 ECE4680 buses.22 Beyond ECE4680 ECE4680: Objectives and Assessment °In-depth understanding of the inner-workings of modern computers, their evolution, and trade-offs present at the hardware/software boundary. Application • Insight into fast/slow operations that are easy/hard to implementation hardware Compiler April 5, 2003 The Big Picture Processor Input Control Memory ECE4680 buses.21 Instruction Set Architecture Digital Design Circuit Design • Functional Spec Control & Datapath Physical implementation Datapath Operating System Instr. Set Proc. I/O system °Experience with the design process in the context of a large complex (hardware) design. ECE4680 buses.20 April 5, 2003 Output April 5, 2003 ECE4680 buses.23 April 5, 2003 ℵ