Input/Output (I/O) Devices

[Figure: a typical computer system: a processor with cache and interrupt lines, connected by a memory–I/O bus to main memory and to I/O controllers for disks, graphics output, and the network]
• I/O is the eyes, ears, nose, mouth, hands, legs, etc. of the computer system. Imagine a computer without I/O.
The Impact of I/O on Performance
• Elapsed time = CPU time + I/O time. Suppose CPU time improves by 50% per year and I/O time doesn't improve.
# of years   CPU time   I/O time   Elapsed time   % I/O time
     0          90          10          100           10%
     1          60          10           70           14%
     2          40          10           50           20%
     3          27          10           37           27%
     4          18          10           28           36%
     5          12          10           22           45%
• The improvement in CPU performance over 5 years is 90/12 = 7.5; however, the improvement in elapsed time is only 100/22 = 4.5, and I/O time has increased from 10% to 45% of the elapsed time. This clearly illustrates Amdahl's Law.
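• As a sanity check, the table above can be recomputed directly: start from a CPU time of 90 and an I/O time of 10, and divide the CPU time by 1.5 each year (a 50% speedup). A minimal C sketch (the table rounds to whole numbers):

```c
#include <stdio.h>

int main(void) {
    double cpu = 90.0, io = 10.0;          /* initial CPU and I/O time */
    for (int year = 0; year <= 5; year++) {
        double elapsed = cpu + io;
        printf("year %d: CPU %5.1f  I/O %4.1f  elapsed %5.1f  %%I/O %4.1f%%\n",
               year, cpu, io, elapsed, 100.0 * io / elapsed);
        cpu /= 1.5;                        /* 50% per-year CPU improvement */
    }
    return 0;
}
```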
• If we care about system throughput, what interests us is I/O bandwidth: the amount of data transferred in a period of time. (A MB in I/O bandwidth is 1,000,000 bytes, not 2^20 bytes.)
Characteristics of I/O Devices

Device            Behavior        Partner    Data rate (KB/sec)
Keyboard          input           human      0.01
Mouse             input           human      0.02
Voice input       input           human      0.02
Scanner           input           human      400.00
Voice output      output          human      0.60
Ink-jet printer   output          human      50.00
Laser printer     output          human      200.00
Screen            output          human      60,000.00
Modem             input/output    machine    3.00-128.00
LAN               input/output    machine    500.00-6000.00
Floppy disk       storage         machine    100.00
Magnetic tape     storage         machine    2000.00
Magnetic disk     storage         machine    2000.00-10,000.00
The Mouse
• The interface between a trackball mouse and the system is a set of counters that are incremented or decremented depending on the mouse's movement. The system must also detect when one of the mouse buttons is clicked, double-clicked, held down, or released. Software reads the signals generated by the mouse and determines how much to move, how fast, and what each mouse button action means.
[Figure: mouse movements from an initial position, reported as counter changes of +20 or -20 in X and Y]
• The system monitors the status of the mouse by polling the signals it generates. Polling is constantly reading the signals every n cycles. This is the simplest form of interface between the processor and I/O.
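• A hedged sketch of what such a polling interface might look like in software. The register-reading functions below are stand-ins invented for illustration; a real driver would read the mouse's counters and button state from device registers or I/O ports:

```c
#include <stdbool.h>
#include <stdio.h>

/* Stubbed device-register reads; a real driver would read memory-mapped
 * registers or I/O ports here. These stubs just return fixed values. */
static int  read_mouse_dx(void)     { return +20; }  /* X counter delta */
static int  read_mouse_dy(void)     { return -20; }  /* Y counter delta */
static bool read_mouse_button(void) { return false; }

static int cursor_x, cursor_y;

/* Called every n cycles: poll the mouse signals and update state. */
static void poll_mouse(void) {
    cursor_x += read_mouse_dx();      /* counters are incremented or   */
    cursor_y += read_mouse_dy();      /* decremented by mouse movement */

    static bool was_down = false;
    bool down = read_mouse_button();
    if (down && !was_down)  printf("button pressed\n");
    if (!down && was_down)  printf("button released\n");
    was_down = down;
}

int main(void) {
    for (int i = 0; i < 3; i++)       /* poll a few times */
        poll_mouse();
    printf("cursor at (%d, %d)\n", cursor_x, cursor_y);
    return 0;
}
```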
Magnetic Disks
• The magnetic disk or hard disk is a nonvolatile storage device. The disk is composed of rotating platters coated with a magnetic surface. Moveable read/write heads access the data on the platters.
• A magnetic disk is made of several (1-15) platters which are rotated at 3600-7200 RPM (Revolutions Per Minute). The diameter of each platter is between 2 and 20 centimeters. Both sides of each platter are used. Each platter is divided into circles (1000-5000) called tracks, and each track is divided into sectors (64-200). A sector is the minimum unit of data read or written. It is usually 512 bytes.
• A floppy disk is a magnetic disk with 1 platter only.
[Figure: a disk with several platters; each platter surface is divided into tracks, and each track into sectors]
Accessing Data on the Disk
• To access data the read/write heads must be moved to the right location. All heads move together, so every head is over the same track on every surface. All the tracks under the heads at a given time are called a cylinder. Thus the number of cylinders is the number of tracks on 1 platter.
• Accessing data is a three-stage process. The first step is to move the head over the right track. This is called a seek, and the time to do this is called the seek time. Average seek times are between 5 and 15 ms. Due to locality, the real average seek time can be about 25% of the advertised time.
• Once the head is over the correct track we must wait for the right sector to rotate under the head. This is called the rotational latency or rotational delay. On average we must wait half a revolution. Because the disk rotates at 3600 to 7200 RPM, the average rotational latency is 0.5/3600 = 0.000139 minutes = 0.0083 seconds = 8.3 ms (4.2 ms for 7200 RPM disks).
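• The rotational-latency arithmetic, spelled out: the average wait is half a revolution at the given RPM.

```c
#include <stdio.h>

int main(void) {
    double rpm[] = { 3600.0, 7200.0 };
    for (int i = 0; i < 2; i++) {
        /* average rotational latency = half a revolution, converted to ms */
        double latency_ms = 0.5 / rpm[i] * 60.0 * 1000.0;
        printf("%.0f RPM: average rotational latency = %.1f ms\n",
               rpm[i], latency_ms);
    }
    return 0;
}
```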
Disk Read Time
• Smaller-diameter disks are better because they can spin faster with less power; they are, of course, more expensive.
• The last stage of a disk access is the transfer time: the time to transfer a block of bits, typically a sector. Transfer rates are between 3 and 20 MB/sec. Most disks today have built-in caches that store sectors as they are passed over. Transfer rates from the cache are up to 50 MB/sec.
• The control of the disk and the transfer between the disk and memory are handled by a disk controller. The overhead of using the controller is called the controller time.
• What is the average time to read or write a 512-byte sector on a disk rotating at 7200 RPM, if the average seek time is 10ms, the transfer rate is 15MB/s, and the controller time is 1ms?
10ms + 4.2ms + (0.5KB/15MB)sec + 1ms = 10 + 4.2 + 0.03 + 1 = 15.23ms. If the seek time is only 2.5ms we get 7.73ms.
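• The whole access-time calculation in one place, using the numbers from the example (7200 RPM, 10ms average seek, 15MB/sec transfer rate, 1ms controller time, 512-byte sector). It reproduces the ~15.2ms figure above; the small difference from 15.23ms is rounding:

```c
#include <stdio.h>

int main(void) {
    double seek_ms       = 10.0;                  /* average seek time        */
    double rotation_ms   = 0.5 / 7200.0 * 60e3;   /* half a rev at 7200 RPM   */
    double transfer_ms   = 512.0 / 15e6 * 1e3;    /* 512 bytes at 15 MB/sec   */
    double controller_ms = 1.0;

    printf("seek %.1f + rotation %.1f + transfer %.2f + controller %.1f = %.2f ms\n",
           seek_ms, rotation_ms, transfer_ms, controller_ms,
           seek_ms + rotation_ms + transfer_ms + controller_ms);
    return 0;
}
```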
Buses: Connecting I/O to Processor and Memory
• In a computer system the memory and processor have to communicate, as do the I/O devices. This is done with a bus. A bus is a shared link which uses one set of wires to connect multiple devices.
• The 2 major advantages of the bus are flexibility and low cost. New devices can be added easily and only one set of wires is needed to connect multiple devices.
• The major disadvantage of the bus is that it is a bottleneck: all devices compete for the bus and its bandwidth is relatively limited.
• A bus transaction includes 2 parts: sending the address and
receiving or sending data. A bus transaction is defined by what it
does to memory. A read transaction transfers data from memory
(to CPU or I/O device). A write transaction writes data to memory.
Input/Output Operations
• The previous terms can be confusing. To avoid this we will use the terms input and output: an input operation is inputting data from an I/O device to memory, where the CPU can read it; an output operation is outputting data to a device from memory, where the CPU wrote it.
• In the figures below, the shaded part shows which part of the device is used (right for read, left for write).
• An output operation involves 3 steps:
– Signal memory and I/O that a read transaction is to take place, and send the memory address and I/O location.
– Read the data from memory.
– Write it to disk.
[Figure: the three steps (a, b, c) of an output operation over the control and data lines connecting the processor, memory, and disks]
• An input operation involves 2 steps:
– Signal memory and I/O that a write transaction is to take place, and send the memory address and I/O location.
– Write the data from the I/O device into memory.
[Figure: the two steps (a, b) of an input operation over the same control and data lines]

Types of Buses
• Buses are classified into 3 types: processor-memory buses, I/O buses, and backplane buses.
• Processor-memory buses are short and high-speed, to maximize memory-processor bandwidth.
• I/O buses are long and can have many types of devices connected to them. They don't interface directly with memory but use a bus adapter to interface to the processor-memory bus.
• Backplane buses are designed to allow the processor, memory, and I/O to use the same bus.
Standard Buses
• Processor-memory buses are proprietary (specific to a particular computer) while I/O and backplane buses are standard buses that are used by many computer makers.
• In many PCs the backplane bus is the PCI bus and the I/O bus is a SCSI bus. Both are standard buses.
[Figure: three bus organizations: (a) a single bus shared by the processor, memory, and I/O devices; (b) a processor-memory bus with bus adapters connecting separate I/O buses; (c) a processor-memory bus with a bus adapter to a backplane bus, which connects to the I/O buses through further bus adapters]
Synchronous and Asynchronous Buses
• A bus is synchronous if it includes a clock in the control lines and has a fixed protocol for communication that is relative to the clock. For example, a processor-memory bus performing a read from memory transmits the address on the bus and expects to have the data on the bus 5 clock cycles later.
• In synchronous buses the protocol is predefined so the bus can be very fast. There are 2 major disadvantages: every device on the bus must run at the same clock rate, and the bus must be short, otherwise distributing the clock across it becomes a problem.
• An asynchronous bus isn't clocked, so it can connect many devices of varying speeds. To coordinate transactions on the bus a handshake protocol is used. Let's assume a device is requesting a word of data from memory. There are 3 control lines:
– ReadReq: indicates a read request from memory.
– DataRdy: indicates that the data word is now on the bus.
– Ack: used to acknowledge the ReadReq or DataRdy signals.
The Handshake Protocol
[Figure: timing diagram of the handshake protocol, showing the ReadReq, Data, Ack, and DataRdy lines and the seven numbered steps]
• The steps in the protocol begin after the I/O device has asserted the ReadReq signal and put the address on the bus:
1. Memory sees the ReadReq, reads the address, and sets Ack.
2. I/O sees the Ack line is set and releases the ReadReq and data lines.
3. Memory sees that ReadReq is low and drops Ack to acknowledge that.
4. Memory places the data on the data lines and raises DataRdy.
5. I/O sees DataRdy, reads the data from the bus, and raises Ack.
6. Memory sees the Ack signal, drops DataRdy, and releases the data lines.
7. I/O sees DataRdy go low and drops Ack, which indicates that the transmission is over.
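• A minimal software sketch of the seven steps, with the three control lines and the multiplexed address/data lines modeled as shared variables and each side written as a small state machine. This is purely illustrative: a real bus does this in hardware, and the address and data values below are made up:

```c
#include <stdbool.h>
#include <stdio.h>

/* Control lines and the multiplexed address/data lines, as shared state. */
static bool read_req, data_rdy, ack;
static int  bus;

int main(void) {
    int io = 1, mem = 1;             /* next step each side is waiting for  */
    read_req = true;                 /* I/O asserts ReadReq, address on bus */
    bus = 0x1000;

    while (io < 8 || mem < 8) {
        /* ---- memory side ---- */
        if (mem == 1 && read_req) {
            printf("1: memory sees ReadReq, reads address 0x%x, asserts Ack\n", bus);
            ack = true;  mem = 3;
        } else if (mem == 3 && !read_req) {
            printf("3: memory sees ReadReq low, drops Ack\n");
            ack = false; mem = 4;
        } else if (mem == 4) {
            printf("4: memory puts the data on the bus, raises DataRdy\n");
            bus = 42;    data_rdy = true;  mem = 6;
        } else if (mem == 6 && ack) {
            printf("6: memory sees Ack, drops DataRdy, releases data lines\n");
            data_rdy = false;  mem = 8;
        }
        /* ---- I/O side ---- */
        if (io == 1 && ack) {
            printf("2: I/O sees Ack, releases ReadReq and data lines\n");
            read_req = false;  io = 5;
        } else if (io == 5 && data_rdy) {
            printf("5: I/O sees DataRdy, reads data %d, raises Ack\n", bus);
            ack = true;  io = 7;
        } else if (io == 7 && !data_rdy) {
            printf("7: I/O sees DataRdy low, drops Ack: transfer complete\n");
            ack = false;  io = 8;
        }
    }
    return 0;
}
```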
Bus Performance
• We want to compare the maximum bandwidth of a synchronous and an asynchronous bus. The synchronous bus has a clock cycle time of 50ns and each bus transmission takes 1 cycle. The asynchronous bus requires 40ns per handshake. Each bus is 32 data bits wide. What is the bandwidth when performing one-word reads from a 200ns memory?
• Synchronous bus:
1. Send the address to memory: 50ns
2. Read the memory: 200ns
3. Send the data to the device: 50ns
Thus reading a 4-byte word takes 300ns, and 4 bytes/300ns = 4MB/0.3sec = 13.3MB/sec.
• Asynchronous bus: we can overlap several steps with the memory access:
Step 1: 40ns
Steps 2,3,4: max(3×40, 200) = 200ns
Steps 5,6,7: 3×40ns = 120ns
Thus the transfer time is 360ns and the bandwidth is 4 bytes/360ns = 11.1MB/sec.
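• Both bandwidth figures, recomputed under the example's assumptions (50ns clock, 40ns per handshake step, 200ns memory, 4-byte words):

```c
#include <stdio.h>

static double max2(double a, double b) { return a > b ? a : b; }

int main(void) {
    double word = 4.0;                               /* bytes per transfer */

    /* Synchronous: address cycle + memory access + data cycle */
    double sync_ns = 50.0 + 200.0 + 50.0;

    /* Asynchronous: step 1, steps 2-4 overlapped with memory, steps 5-7 */
    double async_ns = 40.0 + max2(3 * 40.0, 200.0) + 3 * 40.0;

    printf("synchronous : %.0f ns/word -> %.1f MB/sec\n", sync_ns,  word / sync_ns  * 1e3);
    printf("asynchronous: %.0f ns/word -> %.1f MB/sec\n", async_ns, word / async_ns * 1e3);
    return 0;
}
```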
Increasing the Bus Bandwidth
• Data bus width: transfers of multiple words require fewer cycles.
• Separate vs. multiplexed address and data lines: the previous examples used the same wires for address and data. Using separate wires makes writes faster, as the address and data can be sent in the same cycle.
• Block transfers: allowing the bus to transfer multiple words without sending an address or releasing the bus reduces the time needed to transfer large blocks.
• Suppose we have a synchronous bus with a data width of 64 bits and a clock cycle of 50ns, and memory with an access time of 200ns for the first word and 20ns for each additional word. What is the read bandwidth?
1. Send address: 1 clock cycle
2. Read memory: 220ns/50ns = 5 clock cycles (rounded up)
3. Send data from memory: 1 clock cycle
Thus reading 8 bytes takes 7 cycles = 350ns, and 8 bytes/350ns = 8MB/0.35sec = 22.85MB/sec.
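• The same arithmetic in code, rounding the memory access time up to whole bus cycles as in the example:

```c
#include <stdio.h>
#include <math.h>

int main(void) {
    double clock_ns   = 50.0;
    double mem_ns     = 200.0 + 20.0;             /* first word + one extra word   */
    double mem_cycles = ceil(mem_ns / clock_ns);  /* memory time in whole cycles   */
    double total_ns   = (1 + mem_cycles + 1) * clock_ns;  /* address + read + data */

    printf("total: %.0f ns for 8 bytes -> %.2f MB/sec\n",
           total_ns, 8.0 / total_ns * 1e3);
    return 0;
}
```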
Obtaining Access to the Bus
• How does a device gain access to the bus when it wants to use it? Without any control, all devices would drive the bus lines, causing chaos. The solution is to have one or more bus masters. The master initiates transactions and controls who can access the bus.
• The memory is usually a slave, as it just responds to read/write requests.
• The simplest scheme is to have only one master, the processor. A device generates a bus request which is sent to the processor. The processor then decides whether to grant access, and it controls the bus. The problem is that this ties up the CPU.
[Figure: steps (a, b, c) of a single-master bus: a device asserts a bus request line to the processor, which then carries out the transfer between memory and the device over the bus]
Bus Arbitration
• A better scheme is to have multiple masters. Deciding which master gets control of the bus is called bus arbitration. A device that wants to use the bus signals a bus request and waits until it is granted the bus.
• Arbitration schemes have to balance two factors: bus priority (the highest-priority device gets the bus) and fairness (all devices that want the bus will eventually get it).
• There are several schemes, but the simplest one is called daisy chain arbitration. The arbiter's grant signal is passed along a chain of devices from the highest priority to the lowest priority; a device that wants the bus intercepts the grant signal and doesn't pass it on, so higher-priority devices are always served first. There are several variants of the scheme that ensure some kind of fairness.
[Figure: daisy chain arbitration: the bus arbiter's grant line threads through Device 1 (highest priority) down to Device n (lowest priority); the request and release lines are shared]
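• A sketch of the grant-passing logic (illustrative only, with the request pattern made up): the arbiter asserts the grant toward the highest-priority device, and each device either keeps the grant, if it has a pending request, or passes it down the chain.

```c
#include <stdbool.h>
#include <stdio.h>

#define NDEV 4

/* request[i] is true if device i (0 = highest priority) wants the bus. */
static bool request[NDEV] = { false, true, false, true };

/* Return the device that ends up holding the grant, or -1 if none.
 * The grant propagates down the chain until a requesting device keeps it. */
static int daisy_chain_arbitrate(void) {
    bool grant = true;                 /* arbiter asserts grant to device 0 */
    for (int i = 0; i < NDEV; i++) {
        if (grant && request[i])
            return i;                  /* device i intercepts the grant     */
        /* otherwise the grant is passed to the next device in the chain */
    }
    return -1;
}

int main(void) {
    printf("bus granted to device %d\n", daisy_chain_arbitrate());
    return 0;
}
```

• Note that the last device in the chain can starve if a higher-priority device keeps requesting the bus, which is exactly the fairness problem the variants of the scheme address.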
I/O Bus Characteristics
Option          High performance                            Low cost
Bus width       separate address and data lines             multiplex address and data lines
Data width      wider is faster (32 bits)                   narrower is cheaper (8 bits)
Transfer size   multiple words require less bus overhead    single-word transfer is simpler
Bus masters     multiple masters (requires arbitration)     single master (no arbitration)
Clocking        synchronous                                 asynchronous
Interfacing I/O to Memory and Processor
• In order for the processor to give commands to I/O devices it must
be able to address the device and supply commands. Two methods
are used to address I/O devices:
• Memory-mapped I/O: parts of the address space are assigned to I/O devices. When the processor writes to these addresses the memory ignores them, as they are mapped to I/O; the device controller that is mapped to the address reads the data and issues a command to the I/O device. The device writes to memory-mapped I/O addresses to respond to the processor.
• Special I/O instructions: special instructions, not available to user programs but only to the OS, are used to give commands to I/O devices.
• The simplest way for the processor to know if there is an I/O event
is by checking in a loop. This is called polling. The device puts
information in a status register or in memory-mapped I/O and the
processor reads it.
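• A bare-metal-style sketch of how memory-mapped I/O looks to software: issuing a command is just a store to an address that the memory system ignores and the device controller watches, and checking status is just a load. The device, its register addresses, and its status bit below are invented for illustration, so this compiles but only makes sense on hardware that actually maps something there:

```c
#include <stdint.h>

/* Invented addresses: a real device's registers are fixed by the hardware
 * and documented by the manufacturer. */
#define DEV_CMD    ((volatile uint32_t *)0xFFFF0000u)  /* command register */
#define DEV_STATUS ((volatile uint32_t *)0xFFFF0004u)  /* status register  */

#define STATUS_BUSY 0x1u

/* Issue a command: an ordinary store, but the address is mapped to the
 * device controller rather than to main memory. */
void device_write_command(uint32_t cmd) {
    while (*DEV_STATUS & STATUS_BUSY)   /* poll until the device is ready */
        ;                               /* (busy-wait)                    */
    *DEV_CMD = cmd;                     /* the controller sees this store */
}
```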
Overhead of Polling
• The disadvantage of polling is that it can waste the processor's time. Let's see the overhead of polling for 3 devices, where a polling operation costs 400 cycles and the processor has a 500MHz clock (500,000,000 cycles per second).
• A mouse must be polled 30 times a second, so we need 30 × 400 = 12,000 cycles per second for polling. The overhead of mouse polling is 12,000/500,000,000 = 0.002%.
• A floppy disk transfers data to the processor in 16-bit units and has a data rate of 50KB/sec: (50KB/sec)/(2 bytes/poll) = 25K polls/sec. Thus we need 25,000 × 400 cycles per second for polling: (25,000 × 400)/500,000,000 = 2%. This is already significant.
• A hard disk transfers data in 4-word units and can transfer 4MB/sec: (4MB/sec)/(16 bytes/poll) = 250K polls/sec. Thus we need 250,000 × 400 = 100,000,000 cycles per second to poll the disk, which is a 20% overhead. This is much too high.
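• The three overhead figures, recomputed (500MHz clock, 400 cycles per poll):

```c
#include <stdio.h>

static void poll_overhead(const char *dev, double polls_per_sec) {
    double clock  = 500e6;                      /* 500 MHz processor       */
    double cycles = polls_per_sec * 400.0;      /* 400 cycles per poll     */
    printf("%-10s %10.0f polls/sec -> %6.3f%% of the processor\n",
           dev, polls_per_sec, 100.0 * cycles / clock);
}

int main(void) {
    poll_overhead("mouse",  30.0);              /* 30 polls per second     */
    poll_overhead("floppy", 50e3 / 2.0);        /* 50 KB/sec, 2 bytes/poll */
    poll_overhead("disk",   4e6 / 16.0);        /* 4 MB/sec, 16 bytes/poll */
    return 0;
}
```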
Interrupt-Driven I/O
• The overhead of a polling interface led to the invention of interrupts, which notify the processor when an I/O device needs attention. Interrupts are handled the same way as the exceptions we have seen before: when the interrupt is detected, control is transferred to the OS, which only then polls the device. The Cause register is used to tell the processor which device sent the interrupt.
• Interrupts relieve the processor of having to wait for the devices, but the processor still has to transfer the data from memory to the device.
• If a disk transfer of 4 words takes 500 cycles, what's the processor overhead?
Cycles per second for the disk = 250K transfers (same as polling) × 500 = 125,000,000 cycles, thus the overhead is 25%. But if the disk is in use only 5% of the time the overhead is 1.25%.
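• The same interrupt-overhead arithmetic in code (500MHz clock, 250K transfers/sec, 500 cycles per transfer):

```c
#include <stdio.h>

int main(void) {
    double clock     = 500e6;            /* 500 MHz processor                  */
    double transfers = 4e6 / 16.0;       /* 4 MB/sec in 4-word (16 byte) units */
    double cycles    = transfers * 500;  /* 500 cycles per interrupt+transfer  */
    double overhead  = 100.0 * cycles / clock;

    printf("disk busy 100%% of the time: %.2f%% overhead\n", overhead);
    printf("disk busy   5%% of the time: %.2f%% overhead\n", overhead * 0.05);
    return 0;
}
```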
Direct Memory Access
• But what if the disk is being used all the time? A device called a DMA (Direct Memory Access) controller is used. The DMA transfers data directly between memory and I/O without involving the processor.
• There are 3 stages in a DMA transfer:
1. The processor sets up the DMA by supplying it with the
memory address and I/O device involved in the transfer.
2. The DMA performs the transfer.
3. The DMA notifies the processor (with an interrupt) that the
transfer has completed.
• What's the overhead of using DMA if the setup takes 1000 cycles and the completion interrupt takes 500 cycles? The average transfer from disk is 8KB, and the disk (transferring 4MB/sec) is utilized 100% of the time.
• Each DMA transfer takes 8KB/(4MB/sec) = 0.002 sec. So if the disk is constantly transferring, there are 500 transfers per second, which means 500 × 1500 = 750,000 processor cycles per second. The overhead is 750,000/500,000,000 = 0.15%.
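• And the DMA overhead, using the example's numbers (8KB taken as 8000 bytes, matching the 500-transfers-per-second figure):

```c
#include <stdio.h>

int main(void) {
    double clock     = 500e6;               /* 500 MHz processor             */
    double transfers = 4e6 / 8e3;           /* 4 MB/sec in 8 KB transfers    */
    double cycles    = transfers * (1000 + 500);  /* setup + completion interrupt */

    printf("%.0f transfers/sec, %.0f cycles/sec -> %.2f%% overhead\n",
           transfers, cycles, 100.0 * cycles / clock);
    return 0;
}
```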
A Typical Desktop I/O System
• Organization of the I/O on the Apple Macintosh 7200 series:
[Figure: the processor and main memory connect through a PCI interface/memory controller to the PCI bus; I/O controllers on the PCI bus handle stereo input/output, the serial ports, the Apple Desktop Bus, graphics output, Ethernet, and a SCSI bus to which the CD-ROM, disk, and tape drives are attached]