computer and I/O architecture    3/5/03

Computer Architecture - Overview
  processor architecture
    – privileged execution modes
    – asynchronous exceptions (traps)
  I/O architecture
    – busses, controllers, devices, smart controllers
    – I/O: direct, polled, mapped, DMA, interrupt driven
    – sequential and random access devices
    – disks and factors affecting disk I/O performance
  You need to understand how these really work

User Mode Execution
  able to use all of the "normal" instructions
    – load and store general registers from/to memory
    – arithmetic, logical, test, compare, data copying
    – branches and subroutine calls
  able to address some subset of memory
    – the accessible subset is controlled by the Memory Management Unit (MMU)
  not able to perform privileged operations
    – I/O operations, update the MMU
    – interrupt enables, enter supervisor mode

Supervisor Mode Execution
  can execute privileged instructions
    – able to perform I/O operations
    – interrupt enable/disable/return, load PS
    – instructions to change processor mode
  can access privileged address spaces
    – access data structures inside the OS
    – access other processes' address spaces
    – change and create address spaces

Processor Status Register (PS)
  contains condition codes
    – set by arithmetic/logical operations (zero, positive, negative, overflow)
    – tested by conditional branch instructions
  controls execution mode (user/supervisor)
  describes which interrupts are enabled
  may describe what address space to use
  may control other processor features/options
    – word length, endian-ness, instruction set, ...
  may have alternate registers, alternate stack
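The condition codes in the Processor Status Register are set as a side effect of arithmetic. As a rough illustration, the C sketch below computes the zero, negative, and overflow codes that a 32-bit signed add would leave behind. The names `struct cond_codes` and `add_with_cc` are hypothetical. Real processors define their own flag layouts, and most of them also keep a carry flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical condition codes, after the slide's (0, +, -, ovflo) list. */
struct cond_codes {
    bool zero;      /* result == 0 */
    bool negative;  /* sign bit of the result is set */
    bool overflow;  /* true signed sum does not fit in 32 bits */
};

/* Add two 32-bit signed values with wraparound, as the ALU would, and
 * report the condition codes that the add would leave in the PS. */
int32_t add_with_cc(int32_t a, int32_t b, struct cond_codes *cc)
{
    /* Unsigned arithmetic makes the wraparound well defined in C. */
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t ur = ua + ub;

    cc->zero = (ur == 0);
    cc->negative = (ur >> 31) != 0;
    /* Overflow: operands share a sign and the result's sign differs. */
    cc->overflow = ((~(ua ^ ub) & (ua ^ ur)) >> 31) != 0;

    /* Converting back to int32_t assumes two's complement, as on all
     * mainstream compilers (the conversion is implementation-defined). */
    return (int32_t)ur;
}
```

A conditional branch such as "branch if overflow" would then simply test one of these bits.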
Choice of Execution Modes
  computer boots up in supervisor mode
    – used by bootstrap and OS to initialize the system
  applications run in user mode
    – OS changes to user mode before running user code
  user programs cannot do I/O, and have a restricted address space
    – they have no way to get into supervisor mode, because the instructions that change the PS are privileged
  reentering supervisor mode is strictly controlled
    – only happens in response to traps and interrupts

Asynchronous Exceptions and Handlers
  most errors can be handled "in-line"
    – arithmetic overflows are reflected in condition codes
    – program can test for, and handle, such conditions
  some errors must interrupt program execution
    – e.g. CPU was unable to execute this instruction
    – there must be a way to inform the OS if this happens
  most computers accomplish this with "traps"
    – a well specified list of all possible exceptions
    – a means for the OS to associate a handler with each

Trap Handling
  [figure: an application program runs in user mode (... instr; instr; instr; bad instr; instr; instr; instr ...). The bad instruction traps into supervisor mode through a PS/PC pair in the TRAP vector table, which starts the 1st level trap handler (saves registers and selects the 2nd level handler). The 2nd level handler actually deals with the problem, and control then returns to user mode.]

Trap Handling (Transition into Supervisor Mode)
  hardware trap handling
    – use trap cause to index into trap vector table for PC/PS
    – load new processor status word, switch to supervisor mode
    – push PC/PS of program that caused trap onto stack
    – load new program counter (with address of 1st level handler)
  software trap handling
    – 1st level handler pushes all other registers onto stack
    – 1st level handler gathers info, selects 2nd level handler
    – 2nd level handler deals with the exception condition

Control of Supervisor mode transitions
  all user->supervisor changes are via traps/interrupts
    – it is difficult to know when these will happen
  there is a designated handler for each trap/interrupt
    – its address is stored in a trap/interrupt vector table
    – the operating system sets up all of the handler vectors
  ordinary programs can't access these vectors
    – vectors are not in the process' address spaces
  by carefully controlling all of the trap/interrupt "gateways"
    – the OS controls all supervisor mode transitions

Dealing with the cause of a trap
  some exceptions are handled by the OS
    – e.g. page faults, alignment, floating point emulation
    – OS simulates expected behavior and returns
  some exceptions may be fatal to the running task
    – e.g. zero divide, illegal instruction, invalid address
    – OS reflects the failure back to the running process
  some exceptions may be fatal to the system
    – e.g. power failure, cache parity, stack violation
    – OS cleanly shuts down the affected hardware

Trap Handling (Returning to User Mode)
  return is the opposite of interrupt/trap entry
    – 2nd level system call handler returns to 1st level handler
    – 1st level handler restores all registers from stack
    – use privileged return instruction to restore PC/PS
    – resume user-mode execution after trapped instruction
  saved registers can be changed before return
    – used to set entry point for newly loaded programs
    – used to deliver signals to user-mode processes
    – used to set return codes from system calls

Stacking and unstacking a trap
  [figure: user mode computation runs on the user mode stack. A trap pushes the following onto the supervisor mode stack, in the direction of stack growth: user-mode PC and PS, saved user-mode registers, parameters to the 2nd level handler, return PC, and the stack frame for the 2nd level handler.]

Traps while in Supervisor Mode
  nearly identical to traps while in user mode
    – trap saves interrupted PC/PS on supervisor mode stack
    – trap goes to same vector & 1st level handler
    – same register saving, restoring, and return
  there are very few differences
    – saved PS at time of interrupt shows supervisor mode
    – 2nd level handler knows trap was from supervisor mode (and may consider it to be more or less severe than the same trap from user mode)

I/O architectures: busses
  [figure: the CPU, memory, and several device controllers share a main bus that carries control, data, address, and interrupt lines; each controller connects to its device.]

Memory type busses
  came from back-plane memory-to-CPU interconnects
    – a few "bus masters", and many "slave devices"
    – arbitrated multi-cycle bus transactions
    – request, grant, address, respond, transfer, ack
    – operations: read, write, read/modify/write, interrupt
  originally most busses were of this sort
    – ISA, EISA, PCMCIA, PCI, cPCI, video busses, ...
    – distinguished by form-factor, speed, data width, ...
    – newer busses support bridging, hot-swap, self-identifying

Network type busses
  evolved as peripheral device interconnects
    – SCSI, USB, 1394 (firewire), Infiniband, ...
    – cables and connectors rather than back-planes
    – designed for easy and dynamic extensibility
    – originally slower than back-plane, but no longer
  much more similar to a general purpose network
    – packet switched, topology, routing, node identity
    – may be master/slave (USB) or peer-to-peer (1394)
    – may be implemented by controller or by host

I/O architectures: devices & controllers
  I/O devices
    – peripheral devices that interface between the computer and other media (disks, tapes, networks, serial ports, keyboards, displays, pointing devices, etc.)
  device controllers connect a device to a bus
    – communicate control operations to device
    – relay status information back to the bus
    – manage DMA transfers for the device
    – generate interrupts for the device
  a controller is usually specific to a device and a bus

mechanisms: device controller registers
  device controllers export registers to the bus
    – registers in controller can be addressed from bus
    – writing into registers controls device or sends data
    – reading from registers obtains data/status
  register access method varies with CPU type
    – may require special instructions (e.g. x86 IN/OUT)
    – privileged instructions restricted to supervisor mode
    – may be mapped onto bus like memory
    – accessed with normal (load/store) instructions
    – I/O address space not accessible to most processes

A simple device: 16550 UART
  offset  register                    named bits (high to low)
  0       data register               (data byte)
  1       interrupt enable register   MDM STS XMT RCV
  2       interrupt register          MDM STS XMT RCV
  3       line control register       speed BRK PARITY STOP WORDLEN
  4       modem control register      RTS DTR
  5       line status register        EMT XMT BRK FER PER OVR RCV
  6       modem status register       DCD RI DSR CTS

16550 UART registers
  A 16550 presents seven 8-bit registers to the bus:
    0: data – read a received byte, or write a byte to transmit (or the LSB of the speed divisor when speed set is enabled)
    1: interrupt enables – for transmit done, data received, carrier detect/ring (or the MSB of the speed divisor when speed set is enabled)
    2: interrupt register – currently pending interrupt conditions
    3: line control register – character length, parity, and speed
    4: modem control register – control signals sent by the computer
    5: line status register – transmit/receive completion and error conditions
    6: modem status register – received modem control signals
  All communication between the bus and the device (send data, receive data, status, and control) is performed by reading from, and writing to, these registers.

mechanisms: direct polled I/O
  all transfers happen under direct control of CPU
    – CPU transfers data to/from device controller registers
    – transfers are typically one byte or word at a time
    – may be accomplished with normal or I/O instructions
  CPU polls device until it is ready for data transfer
    – received data is available to be read
    – previously initiated write operations have been completed
  advantages
    – very easy to implement (both hardware and software)

Scenario: direct I/O with polling

    /* inb()/outb(), UART_LSR, UART_DATA, TR_DONE and RX_READY are assumed
     * to be defined elsewhere; outb(port, value) argument order is as on
     * the slide (glibc's <sys/io.h> uses outb(value, port)). */
    void uart_write_char(char c)
    {
        while ((inb(UART_LSR) & TR_DONE) == 0)
            ;   /* busy-wait until the transmitter is ready */
        outb(UART_DATA, c);
    }

    char uart_read_char(void)
    {
        while ((inb(UART_LSR) & RX_READY) == 0)
            ;   /* busy-wait until a byte has been received */
        return inb(UART_DATA);
    }

performance of direct I/O
  CPU intensive data transfers
    – each byte or word transferred requires multiple instructions
  CPU is wasted while awaiting completion of transfers
    – busy-wait polling ties up CPU until I/O is completed
  devices are idle while we are running other tasks
    – I/O can only happen when an I/O task is running
  how can these problems be dealt with?
    – let controller transfer data without attention from CPU
    – let application block pending I/O completion
    – let controller interrupt CPU when I/O is finally done

Direct Memory Access – I/O w/o the CPU
  bus facilitates data flow in all directions between
    – CPU, memory, and device controllers
  CPU can be the bus-master
    – initiating data transfers with memory or device controllers
  device controllers can also master the bus
    – CPU instructs controller what transfer is desired (what data to move to/from what part of memory)
    – device controller performs transfer w/o CPU assistance
    – device controller generates interrupt at end of transfer

completion interrupts – waking up CPU
  device controllers, busses, and interrupts
    – busses have ability to send interrupts to the CPU
    – devices signal controller when they are done/ready
    – when device is done, controller asserts interrupt on bus
  CPUs and interrupts
    – interrupts look very much like traps
    – traps come from CPU, interrupts are caused externally
    – unlike traps, interrupts can be selectively enabled/disabled
    – a device can be told it can or cannot generate interrupts
    – special instructions can enable/disable interrupts to CPU

Interrupt Handling
  [figure: as with a trap, an interrupt suspends the application program running in user mode (... instr; instr; instr; instr; instr; instr ...) and enters supervisor mode through a PS/PC pair in the interrupt vector table. The 1st level interrupt handler passes control to a 2nd level handler (the device driver interrupt routine), after which control returns to user mode.]
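The vectoring described on the trap and interrupt slides can be simulated in ordinary C: a table indexed by trap cause, plus a 1st level handler that picks a 2nd level handler from it. This is only a sketch. On real hardware the vector table holds PS/PC pairs and the 1st level handler is usually assembly code. Here a single table of function pointers stands in for both steps. The trap causes, `struct trap_frame`, and all handler names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical trap causes; real hardware defines its own numbering. */
enum trap_cause { TRAP_ZERO_DIVIDE, TRAP_ILLEGAL_INSTR, TRAP_PAGE_FAULT, NUM_TRAPS };

/* Stand-in for what ends up on the supervisor mode stack. */
struct trap_frame {
    unsigned long pc, ps;   /* saved by the hardware on trap entry */
    unsigned long regs[4];  /* saved by the 1st level handler */
    enum trap_cause cause;
};

/* A 2nd level handler returns 0 to resume the process, 1 if the trap is fatal to it. */
typedef int (*trap_handler)(struct trap_frame *);

static int handle_page_fault(struct trap_frame *f)
{
    (void)f;        /* a real OS would bring the missing page in here */
    return 0;       /* cause fixed: resume the process */
}

static int handle_fatal(struct trap_frame *f)
{
    /* Saved registers can be changed before return; here regs[0] records
     * the failure that will be reflected back to the process. */
    f->regs[0] = (unsigned long)-1;
    return 1;
}

/* Vector table: set up by the OS, not in any process' address space. */
static const trap_handler trap_vector[NUM_TRAPS] = {
    [TRAP_ZERO_DIVIDE]   = handle_fatal,
    [TRAP_ILLEGAL_INSTR] = handle_fatal,
    [TRAP_PAGE_FAULT]    = handle_page_fault,
};

/* 1st level handler: use the trap cause to select the 2nd level handler. */
int first_level_trap(struct trap_frame *f)
{
    if ((unsigned)f->cause >= (unsigned)NUM_TRAPS || trap_vector[f->cause] == NULL)
        return 1;   /* unknown trap: treat as fatal */
    return trap_vector[f->cause](f);
}
```

The same dispatch shape works for interrupts. The difference is that the cause comes from a device rather than from the instruction being executed.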
interrupts vs. traps
  traps are caused by an instantaneous condition
    – they are triggered when something happens
    – there is (usually) no persistent state that must be cleared
  interrupts are caused by a device being in some state
    – they are triggered when the device enters a particular state
    – they will continue to be asserted until the device state changes
  once delivered, an interrupt must be disabled
    – CPU must ignore continuing requests for that interrupt
    – the cause must be cleared, and the interrupt acknowledged
    – the device is changed from DONE to BUSY again

DMA read w/completion interrupts

    /* Kernel-style pseudocode: devlock, devcompletion, dp (the controller's
     * register block), lock/unlock, await/wakeup and intr_enable are assumed
     * primitives, not a real API. */

    /* requester: start a DMA read and sleep until it completes */
    lock(devlock);                   /* lock device */
    dp->loc = req_loc;               /* program the DMA request */
    dp->adr = req_adr;
    dp->cnt = req_cnt;
    dp->op = READ;
    dp->ctrl = IENABLE | GO;         /* re-enable device interrupts and start */
    await(devcompletion);            /* sleep until the handler wakes us */
    req_xfr = req_cnt - dp->cnt;     /* request has completed: update data read count
                                        (assumes the controller counts cnt down) */
    unlock(devlock);                 /* release device */

    /* device interrupt handler */
    int dev_intr_handler(void)
    {
        int save = intr_enable(DISABLE);  /* mask CPU interrupts, remember old state */
        dp->ctrl = IDISABLE;              /* turn off device ability to interrupt */
        wakeup(devcompletion);            /* wake up the requester */
        intr_enable(save);                /* restore previous interrupt state */
        return ACKNOWLEDGE_INTERRUPT;     /* tell intr dispatcher we're done */
    }

device I/O with completion interrupts
  requesting process checks to see if device is busy
    – if idle, start the I/O operation, and await its completion
    – if busy, wait for the device to become idle
  I/O interrupt handler
    – gathers completion information from the device
    – posts completion, awakening the requester
  when current device owner finishes using the device
    – wake up the next requester
  we'll talk about waiting and waking up in two weeks

mechanisms: memory mapped I/O
  DMA may not be the easiest way to do I/O
  consider a video game display adaptor
    – continuous updates to isolated areas of the screen
  implement as a bit-mapped display adaptor
    – a display controller with 1MB of display memory sits on the CPU memory bus
    – each byte of display memory corresponds to one pixel
    – application uses ordinary stores to update display
  low overhead per update, no interrupts to service
  relatively easy to program

trade-off: memory mapped vs. DMA
  DMA performs large transfers efficiently
    – better utilization of both the devices and the CPU
    – device doesn't have to wait for CPU to do transfers
    – but there is considerable per-transfer overhead: setting up the operation, processing the completion interrupt
  memory-mapped I/O has no start/finish overhead
    – but every byte is transferred by a CPU instruction
  DMA is better for occasional large transfers
  memory-mapped is better for frequent small transfers
  memory-mapped devices are more difficult to share

Smart Device Controller
  [figure: the device driver talks to the device controller in three ways. Basic status and basic control go through control registers on the bus, accessed with I/O or normal instructions. Buffer pointers and data go through shared buffers in memory, accessed through DMA. I/O completion interrupts flow from the controller back to the driver.]
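The bit-mapped display on the memory-mapped I/O slide can be sketched as follows. On real hardware, `fb` would point at the controller's display memory as mapped into the address space. Here a static array stands in for it. The 1024 x 1024 geometry (one byte per pixel, so 1MB) is an assumption chosen to match the slide's 1MB figure. `put_pixel` and `fill_rect` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define FB_WIDTH  1024u   /* assumed geometry: 1024 x 1024 pixels, 1 byte each = 1MB */
#define FB_HEIGHT 1024u

/* Stand-in for the display memory. On real hardware this would be the
 * controller's memory mapped into the address space; volatile keeps the
 * compiler from dropping or merging stores to it. */
static uint8_t display_memory[FB_WIDTH * FB_HEIGHT];
static volatile uint8_t *const fb = display_memory;

/* Updating a pixel is one ordinary store: no setup, no completion interrupt. */
static inline void put_pixel(unsigned x, unsigned y, uint8_t color)
{
    fb[(size_t)y * FB_WIDTH + x] = color;
}

/* Updating an isolated area of the screen is just a loop of stores. */
void fill_rect(unsigned x0, unsigned y0, unsigned w, unsigned h, uint8_t color)
{
    for (unsigned y = y0; y < y0 + h && y < FB_HEIGHT; y++)
        for (unsigned x = x0; x < x0 + w && x < FB_WIDTH; x++)
            put_pixel(x, y, color);
}
```

This illustrates the trade-off described above. Each update costs a few instructions and no setup, but a full-screen redraw costs one store per pixel. DMA would move the same data with a single request.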
I/O Mechanisms: smart controllers
  smarter controllers can improve on basic DMA
  they can queue multiple input/output requests
    – when one finishes, automatically start the next one
    – reduce completion/start-up delays
    – eliminate the need for the CPU to service interrupts
  they can relieve CPU of other I/O responsibilities
    – request scheduling to improve performance
    – they can do automatic error handling & retries
  they can better hide the details of underlying devices

Random vs. Sequential Access
  sequential access devices
    – byte/block N must be read before byte/block N+1
    – may be read/write once, or may be rewindable
    – examples: magnetic tape, printer, keyboard
  random access devices
    – possible to seek directly to any desired byte/block
    – seeks may or may not be instantaneous
    – examples: memory, magnetic disk, CD, graphics adaptor
  they are used very differently

random access devices: disks
  random access devices are much more interesting
    – usage, performance, and scheduling techniques
  key time sharing services depend on disk I/O
    – program loading, file I/O, paging
    – disk performance drives timesharing performance
  disk I/O operations are subject to overhead
    – higher overhead means fewer operations/second
    – careful scheduling can reduce overhead
    – clever scheduling can improve throughput and reduce delay

Disk drive geometry
  spindle – a mounted assembly of circular platters
  head assembly – read/write head per surface, all moving in unison
  track – ring of data readable by one head in one position
  cylinder – corresponding tracks on all platters
  sector – logical records written within tracks
  disk address = <cylinder / head / sector>

Disk Drive - Logical / Disk Drive - Physical
  [figures: 5 platters (10 surfaces) turning on a spindle driven by a motor, read by heads 0-9 on a common head positioning assembly, with a track, a cylinder, and sectors labeled.]
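The disk address <cylinder / head / sector> can be related to a flat block number as sketched below. The sketch assumes that every track holds the same number of sectors and numbers sectors from 0. Real drives use zoned recording, and classic CHS addressing numbers sectors from 1, so treat this as illustrative. The example geometry (10 heads, 400 sectors/track) is taken from the drive performance table. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Drive geometry: assumes a fixed number of sectors on every track. */
struct geometry {
    uint32_t heads;              /* one head per recording surface */
    uint32_t sectors_per_track;
};

struct chs {
    uint32_t cylinder, head, sector;   /* all 0-based in this sketch */
};

/* Blocks are numbered sector-by-sector along a track, then head-by-head
 * within a cylinder, then cylinder-by-cylinder, so consecutive blocks need
 * no head motion until a whole cylinder has been used. */
struct chs block_to_chs(uint64_t block, const struct geometry *g)
{
    uint64_t per_cyl = (uint64_t)g->heads * g->sectors_per_track;
    struct chs a;
    a.cylinder = (uint32_t)(block / per_cyl);
    a.head     = (uint32_t)((block % per_cyl) / g->sectors_per_track);
    a.sector   = (uint32_t)(block % g->sectors_per_track);
    return a;
}

/* Inverse mapping: cylinder/head/sector back to a flat block number. */
uint64_t chs_to_block(struct chs a, const struct geometry *g)
{
    return ((uint64_t)a.cylinder * g->heads + a.head) * g->sectors_per_track + a.sector;
}
```

With 10 heads and 400 sectors/track, one cylinder holds 4,000 blocks. Block 401 therefore maps to cylinder 0, head 1, sector 1, and block 4000 is the first block of cylinder 1.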
Disk Drive Performance
  heads           10        platters        5
  cylinders       17,000    tracks/inch     18,000
  sectors/track   400       bytes/sector    512
  RPM             7200      speed           200Mb/sec
  seek time       2-15ms (average 9ms)
  latency         0-8ms (average 4ms)

  time to read one 8,000 byte block
                 seek    rotate   transfer   total
  best case      0ms     0ms      400µs      400µs
  worst case     15ms    8ms      400µs      23.4ms (58X the best case)
  average        9ms     4ms      400µs      13.4ms (33X the best case)

Optimizing disk performance
  don't start I/O until the disk is on-cylinder, near the desired sector
    – I/O ties up the controller, locking out other operations
    – other drives seek while one drive is doing I/O
  minimize head motion
    – do all possible reads in current cylinder before moving
    – make minimum number of trips in small increments
  encourage efficient data requests
    – have lots of requests to choose from
    – encourage cylinder locality
    – encourage largest possible block sizes

Head Travel under various algorithms
  head starts at cylinder 76; requests, in arrival order: 124, 17, 269, 201, 29, 137, 12
  First Come First Served
    – order: 76 -> 124 -> 17 -> 269 -> 201 -> 29 -> 137 -> 12
    – travel: 48 + 107 + 252 + 68 + 172 + 108 + 125 = 880
  Shortest Seek First
    – order: 76 -> 29 -> 17 -> 12 -> 124 -> 137 -> 201 -> 269
    – travel: 47 + 12 + 5 + 112 + 13 + 64 + 68 = 321
  Scan/look (elevator algorithm)
    – order: 76 -> 124 -> 137 -> 201 -> 269 -> 29 -> 17 -> 12
    – travel: 48 + 13 + 64 + 68 + 240 + 12 + 5 = 450

For the next lecture
  read sections 6-6.3 (topics for the next lecture)
    – see Greek to English dictionary regarding figure 6-3
  there will be a quiz on the reading

key points
  user view of processes
    – process address spaces
    – object modules, load modules, linkage editing
    – procedure calls, stack frames, system calls, signals
  supervisor mode execution, privileged instructions
  trap and interrupt handling
    – save/restore, vectoring, 1st and 2nd level handlers
  busses, devices, controllers, interconnections
  I/O mechanisms, what they are, how they work
    – polled I/O, direct I/O, memory mapped I/O, DMA
    – interrupt driven I/O, smart controllers
  random access devices
    – disk geometry, disk performance, disk scheduling

Channel Controllers – I/O co-processors
  channels sit between CPU and I/O devices
    – think of them as extremely smart busses
    – they include highly specialized CPUs
  they execute channel I/O programs
    – instructions to read, write and control devices
    – instructions to generate progress interrupts
    – command chaining
    – data chaining
  once started, I/O programs execute w/o CPU attention

Typical Channel Architecture
  [figure: the main CPU sits on the main bus with channel controllers 0x1?? and 0x2??. Channel controller 0x1?? drives device controllers 0x11? through 0x1F?, and device controller 0x11? drives devices 0x110 through 0x11F.]
  all channels, controllers and devices have "geographic" addresses

Typical Channel Program (both programs located in main memory)

    Main CPU:
            ...
            SIO   0x101, iopgm
            ...
    intr:   TIO   0x101
            ...

    Channel Controller:
    iopgm:  SEEK  cyl=1020, hd=5, rec=10
            READ  buf=xxx, cnt=4096
            READX buf=yyy, cnt=4096, intr
            TIC   next
            ...
    next:   SEEK  cyl=1050, hd=0, rec=2
            WRITE buf=zzz, cnt=8192, intr
            END   intr

  (note: a channel can concurrently execute one program per controller)
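The totals on the head-travel slide can be reproduced with a short C sketch. The request list and the starting cylinder (76) are read off the slide's example. The names `fcfs`, `ssf` and `look` are illustrative. LOOK is assumed to sweep upward first, as in the example, and the fixed 64-request limit is a simplification.

```c
#include <stdlib.h>

#define MAX_REQ 64   /* sketch limit on pending requests */

/* Total head travel (in cylinders) when requests are served in the given order. */
static long travel(int start, const int *order, int n)
{
    long total = 0;
    int pos = start;
    for (int i = 0; i < n; i++) {
        total += labs((long)order[i] - pos);
        pos = order[i];
    }
    return total;
}

/* First Come First Served: serve requests in arrival order. */
long fcfs(int start, const int *req, int n)
{
    return travel(start, req, n);
}

/* Shortest Seek First: always move to the closest pending request. */
long ssf(int start, const int *req, int n)
{
    int done[MAX_REQ] = {0};
    long total = 0;
    int pos = start;
    if (n > MAX_REQ)
        return -1;
    for (int k = 0; k < n; k++) {
        int best = -1;
        for (int i = 0; i < n; i++)
            if (!done[i] && (best < 0 || abs(req[i] - pos) < abs(req[best] - pos)))
                best = i;
        done[best] = 1;
        total += abs(req[best] - pos);
        pos = req[best];
    }
    return total;
}

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* LOOK (elevator): sweep upward serving requests at or above the head,
 * then reverse and serve the rest on the way down. */
long look(int start, const int *req, int n)
{
    int sorted[MAX_REQ], order[MAX_REQ], m = 0;
    if (n > MAX_REQ)
        return -1;
    for (int i = 0; i < n; i++)
        sorted[i] = req[i];
    qsort(sorted, (size_t)n, sizeof sorted[0], cmp_int);
    for (int i = 0; i < n; i++)          /* upward sweep */
        if (sorted[i] >= start)
            order[m++] = sorted[i];
    for (int i = n - 1; i >= 0; i--)     /* reverse, downward sweep */
        if (sorted[i] < start)
            order[m++] = sorted[i];
    return travel(start, order, m);
}
```

For the slide's requests `{124, 17, 269, 201, 29, 137, 12}` starting at 76, these return 880, 321 and 450. SSF gives the lowest total here, but it can starve requests far from the head if new nearby requests keep arriving. The elevator sweep bounds how long any request waits.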