Computer Organization and Architecture: Themes and Variations, 1st Edition
Clements
CHAPTER 12
Computer Organization and Architecture
© 2014 Cengage Learning Engineering. All Rights Reserved.

Input/Output

Input/Output is concerned with the mechanisms by which information is moved around a computer and between a computer and its peripherals.

Figure 12.1 describes a generic system with a CPU, I/O controllers and peripherals, and a system bus that links the CPU to memory and peripherals. The word peripheral appears twice in Figure 12.1; it describes both an external device, such as a printer or a mouse connected to a computer, and the controller that provides an appropriate interface between the external peripheral and the CPU.

The processor and memory lie at the heart of the system. The peripheral interfaces, connecting the processor and its memory to peripherals, are shown in two boxes; one includes internal peripherals, such as disk drives, and the other includes external peripherals, such as modems, printers, and scanners.

Memory-mapped Peripherals

There’s no fundamental difference between an I/O transaction and a memory access. Outputting a word to a peripheral is the same as storing a word in memory, and getting a word from a peripheral is exactly the same as reading a word from memory. Treating I/O transactions as memory accesses is called memory-mapped I/O.
This doesn’t mean that we can forget about I/O because it’s just like accessing memory, since the properties of random access memory are radically different from the properties of typical I/O systems.

When implementing I/O structures we have to take into account the characteristics of the I/O devices themselves; for example, when writing a file to a disk drive you might have to send a new byte of data every few microseconds. Figure 12.3 shows what a typical memory-mapped I/O port (peripheral interface chip) looks like to the processor.

To the host CPU this peripheral appears as the sequence of consecutive memory locations described by Figure 12.4. The left-hand side of the peripheral interface, shaded gray in Figure 12.3, looks exactly like a memory element as far as the CPU is concerned. The other half of the peripheral interface chip, shown in blue, is the peripheral side that performs the specific operations required by the interface.

The memory-mapped port of Figure 12.4 has four consecutive registers at addresses i, i + 1, i + 2, and i + 3. We have assumed that the peripheral is an 8-bit device and that its consecutive locations are each separated by one byte. In a system with a 32-bit data bus, the addresses of the registers would be i, i + 4, i + 8, and i + 12. The first location at address i contains a command register that defines the operating mode and characteristics of the peripheral. Most memory-mapped I/O ports can be configured to operate in several modes, according to the specific application.
The location at address i + 1 contains the port’s status, which is set up by the associated peripheral. This status information can be read by the processor to determine whether the port is ready to take part in a data transaction or whether an error condition exists; for example, a printer connected to a memory-mapped I/O port might set an error bit to indicate that it is out of paper. In this example we’ve created generic status bits such as ERRout, ERRin, RDYout, and RDYin. The locations at addresses i + 2 and i + 3 are used to send data to the peripheral or to receive data from the peripheral.

Peripheral Register Addressing Mechanisms

The command and data-to-peripheral registers are write-only, and the status and data-from-peripheral registers are read-only. A single address line can distinguish between the two pairs of registers (i.e., command/status and data in/data out). The processor’s read and write signals distinguish between the read-only and write-only registers. Table 12.1 demonstrates this register-addressing scheme. The peripheral provides four internal registers, but the processor sees only two unique locations, N and N + 4. The CPU’s R/W* output is used to select one register from each pair. When R/W* = 0, the write-only registers are selected and when R/W* = 1, the read-only registers are selected. Figure 12.5 emphasizes the way in which peripheral register space can be divided into read-only and write-only regions.
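The read-only/write-only split described above can be sketched in Python. This is a minimal software model, not a description of any real interface chip: the class name, the attribute names, and the choice of offsets 0 and 4 (standing in for N and N + 4) are assumptions made for illustration. The direction of the access plays the role of R/W*.

```python
class MemoryMappedPort:
    """Toy model: four internal registers visible at two CPU offsets.

    Assumed layout (illustrative only): offset 0 selects the
    command/status pair, offset 4 selects the data pair. A write
    (R/W* = 0) reaches the write-only register of the pair and a
    read (R/W* = 1) reaches the read-only register.
    """

    def __init__(self):
        self.command = 0               # write-only, set by the CPU
        self.status = 0                # read-only, set by the peripheral side
        self.data_to_peripheral = 0    # write-only
        self.data_from_peripheral = 0  # read-only

    def write(self, offset, value):
        # R/W* = 0: only the write-only registers can be selected.
        if offset == 0:
            self.command = value & 0xFF
        elif offset == 4:
            self.data_to_peripheral = value & 0xFF
        else:
            raise ValueError("unmapped offset")

    def read(self, offset):
        # R/W* = 1: only the read-only registers can be selected.
        if offset == 0:
            return self.status
        if offset == 4:
            return self.data_from_peripheral
        raise ValueError("unmapped offset")
```

In this model, writing to offset 0 and then reading offset 0 does not return the value written, because the two accesses reach different internal registers; that is the main practical consequence of sharing an address between a read-only and a write-only register.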
Table 12.1

Register address   Function   CPU address   R/W*
i                  status     N             1
i+1                data out   N+4           1
i+2                control    N             0
i+3                data in    N+4           0

Figure 12.6 illustrates a register file addressed by a counter. After the peripheral interface has been reset, the internal pointer is loaded with zero. Each successive access to the interface increments the pointer and selects the next register. Peripherals with auto-incrementing pointers are useful when the registers will always be accessed in sequence.

Peripheral Access and Bus Width

Many peripherals have 8-bit wide buses and are interfaced to computers with 16 or 32 bits. Life is easy when 8-bit peripherals are connected to 8-bit data buses with 8-bit processors, or when 16-bit peripherals are connected to 16-bit buses with 16-bit processors. Things get more complicated when 8-bit peripherals are interfaced to 16- or 32-bit buses.

Two problems can arise when you interface an 8-bit peripheral to a 16-bit bus: endianism and the mapping of 8-bit registers onto a processor’s 16-bit address space. Consider the arrangement in Figure 12.7 where an 8-bit peripheral is interfaced to a 16-bit bus. The peripheral is connected to half the bus’s data lines. If the processor supports 8-bit bus transactions, all is well and the registers can be accessed at their byte addresses (at byte offsets 0, 1, 2, and 3).
If the processor supports only 16-bit bus operations, when a 16-bit value is written to memory all 16 bits are put on the data bus. When the processor performs a byte access, it still carries out a word access but informs the processor interface or memory that only 8 bits are to be transferred. A separate control or address signal is required to specify whether the byte being accessed is the upper or lower byte at the current address.

In this case, the peripheral is hard-wired to one half of the data bus and can respond only to either odd or even byte addresses. In a big-endian environment, the peripheral would be wired to data lines [0:7] and accessed at odd addresses, whereas in a little-endian environment the peripheral would be wired to data lines [0:7] and accessed at even addresses. The peripheral’s four addresses would appear to the computer at byte offsets of 0, 2, 4, and 6.

Some processors have dedicated instructions to facilitate data transfer to byte-wide peripherals; for example, the 32-bit 68K has a MOVEP (move peripheral) instruction that copies a 16- or 32-bit value to or from an 8-bit memory-mapped peripheral. Figure 12.8 shows a peripheral with four internal registers and the CPU’s address map, where the peripheral’s data space is mapped onto successive odd addresses in this big-endian processor’s memory space.

Figure 12.9 shows a peripheral with four 8-bit registers.
The registers appear to the programmer as locations $08 0001, $08 0003, $08 0005, and $08 0007. Locations $08 0000, $08 0002, $08 0004, and $08 0006 cannot be accessed. MOVEP moves a 16/32-bit value between a register and a byte-wide peripheral. The contents of the register are moved to consecutive even (or odd) byte addresses; for example, MOVEP.L D2,(A0) copies the four bytes in register D2 to addresses [A0] + 0, [A0] + 2, [A0] + 4, and [A0] + 6, where A0 is a pointer register.

Figure 12.10 demonstrates how MOVEP.L D0,(A0) copies the four bytes in D0 to successive odd addresses in memory, starting at location $08 0001. The suffix .L in 68K code indicates a 32-bit operation and .B indicates a byte operation. The most-significant byte in the data register is transferred to the lowest address.

Without the MOVEP instruction, it would take the following code to move four bytes to a memory-mapped peripheral.

    MOVE.L #Peri,A0     ;A0 points to the memory-mapped peripheral
    MOVE.B D0,(6,A0)    ;Move least-significant byte of D0 to the peripheral
    ROR.L  #8,D0        ;Rotate D0 to get the next 8 bits
    MOVE.B D0,(4,A0)    ;Move the next byte, bits 8 to 15, to the peripheral
    ROR.L  #8,D0        ;and so on…
    MOVE.B D0,(2,A0)
    ROR.L  #8,D0
    MOVE.B D0,(0,A0)
    ROR.L  #8,D0        ;After four rotations D0 is back to its old value

Preserving Order in I/O Operations

RISC architectures provide only memory load and store operations and don’t implement instructions that facilitate I/O operations. However, there are circumstances where RISC organization and memory-mapped I/O clash.
Some memory-mapped peripherals have configuration and self-resetting status registers or auto-incrementing pointers. It’s important to access such peripherals in the appropriate programmer-defined sequence. Because superscalar RISC processors take an opportunistic approach to memory access, data can be stored in memory out-of-order. Such out-of-order memory accessing doesn’t cause problems with data storage and retrieval, but it can disrupt memory-mapped I/O.

The PowerPC implements an EIEIO (enforce in-order execution of I/O) instruction that has no parameters but ensures that all memory accesses previously initiated are completed. Consider this example where two loads are followed by an addition.

    lwz r5, 1000(r0)    ;load r5 from memory[1000]
    lwz r6, 1040(r0)    ;load r6 from memory[1040]
    add r7, r5, r6      ;r7 = r5 + r6

When these instructions are executed, the processor may swap the order in which r5 and r6 are loaded from memory. As long as the first two loads are executed before the add instruction, the outcome is not dependent on the order of the loads. Now suppose that addresses 1000 and 1040 are memory-mapped peripheral locations. If the peripheral is designed so that a read access to address 1000 updates a register at 1040, the sequence of the two load instructions becomes all-important and reversing their order may lead to an incorrect result.

Consider the following example where we have to update a peripheral. Because the register is accessed via a pointer, we write the register address to the peripheral’s pointer register before writing data to the register being pointed at. In this example, we want to load peripheral register number 35 with the value 99.
The PowerPC code is:

    addi r5, r0, 35     ;r5 = 35
    addi r6, r0, 99     ;r6 = 99
    stw  r5, 1234(r0)   ;store 35 at memory location 1234 (the pointer)
    stw  r6, 5678(r0)   ;store 99 at memory location 5678

The two writes must be executed in the correct order. To ensure this, the PowerPC has three synchronization instructions: eieio, sync, and isync. The isync instruction forces instructions or memory transactions to complete before continuing; that is, instructions prior to isync are executed and fetched instructions are discarded. Then, a new fetch begins. EIEIO forces all posted writes to complete prior to any subsequent writes. The SYNC instruction forces all previous reads and writes to complete on the bus before executing any instructions after it.

We can ensure that the previous code runs in the correct order by inserting an EIEIO between the writes.

    addi r5, r0, 35     ;r5 = 35
    addi r6, r0, 99     ;r6 = 99
    stw  r5, 1234(r0)   ;M[1234] = 35; we're changing register 35
    eieio               ;Make sure r5 is written before proceeding
    stw  r6, 5678(r0)   ;M[5678] = 99; new register value is 99

Data Transfer

Three concepts are vital to an understanding of data transfer: open-loop transfers, closed-loop transfers, and data buffering. In an open-loop transfer, information is sent on its way and its correct reception is assumed. In a closed-loop transfer, the receiver actively acknowledges that the data has arrived. Data buffering is concerned with handling disparities between the rate at which data is transmitted and the rate at which it is consumed by the receiver.
Open-loop Data Transfers

The simplest method of transmitting data is to put the data on a bus and assert a signal, data strobe, to indicate that it is available. Figure 12.11 illustrates an open-loop transmission between a peripheral interface component and an external peripheral (e.g., a printer). The processor moves data to the peripheral interface with its address and data buses and the peripheral interface puts the data on the bus.

The peripheral interface asserts a data available strobe, DAV*, to indicate to the peripheral that the data at its input terminal is valid. The peripheral reads the data and the peripheral interface negates its DAV* strobe to complete the transfer.

Figure 12.12 provides a timing diagram for this information exchange, which is called open loop because there is no feedback to acknowledge that the data has indeed been received. If the peripheral is off line, busy, or just very slow, the data may not be read during the time for which it is available (i.e., while DAV* is asserted). Open-loop data transfers are also called synchronous transfers because the device receiving the data must be synchronized with the device sending the data.

Closed-loop Data Transfers

In a closed-loop transfer the device receiving data returns an acknowledgment to the sender to close the loop. ACK* (acknowledge) from the peripheral indicates the receipt of data.
The peripheral interface makes the data available and asserts DAV* at B to indicate that the data is valid, just as in an open-loop data transfer. The peripheral receiving the data sees DAV* asserted and reads the data. In turn the peripheral asserts ACK* to inform the interface that the data has been accepted. The interface de-asserts DAV* to complete the exchange. This sequence is known as handshaking. Handshaking supports slow peripherals, because the transfer waits until the peripheral indicates its readiness by asserting ACK*.

The timing diagram in Figure 12.14 is called a handshake because the assertion of ACK* is a response to the assertion of DAV*. The advantage of a closed-loop data transfer is that the originator of the data knows that it has been accepted, so data cannot be lost through not being read by the remote peripheral.

The handshaking closed-loop protocol can be taken a step further. The assertion of DAV* is met by the assertion of ACK* from the peripheral. At this point it is assumed that the data has been received and the data exchange ends. Figure 12.15 shows a fully interlocked handshake in which the sequence of events is more tightly defined and each event triggers the next event in sequence.

At B in Figure 12.15 DAV* is asserted to indicate valid data and at C ACK* is asserted to indicate its receipt. The sequence continues with the negation of DAV* at point D.
DAV* can be negated because the assertion of ACK* indicates that DAV* has been recognized. Negating DAV* indicates that the acknowledgement has been detected. The peripheral negates ACK* at E, and the data is removed at F after DAV* has been negated. Point F may come before point E because the removal of the data is a response to the negation of DAV* rather than to the negation of ACK*.

Buffering Data

When data is transmitted over a bus, you either have to use it while it is valid, or capture it in a memory device. Figure 12.16 illustrates three input circuits. Figure 12.16(a) uses the instantaneous values on data inputs I0 to I3; that is, the current data values are used and it is necessary for the transmitter to maintain the data values while they are being used.

Figure 12.16(b) illustrates single-buffered input using D flip-flops. When the data is to be read, the flip-flops are latched and the input captured. Single-buffered input captures data and holds it until the next time the latches are clocked.

Figure 12.16(c) provides a solution to the problem where new data arrives before the previous value has been read. Incoming data is latched exactly as before. Data in the input latches is copied to a second set of latches, where it is buffered for a second time. The input side of the buffer can be capturing data while the output side is waiting for the old data to be read.
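The double-buffered arrangement described above can be modeled in a few lines of Python. This is only a behavioral sketch under simple assumptions: the class and method names are hypothetical, and the two clock events are modeled as explicit method calls rather than hardware signals.

```python
class DoubleBufferedInput:
    """Behavioral model of two ranks of latches in series."""

    def __init__(self):
        self.input_latch = None   # first rank: captures the incoming data
        self.output_latch = None  # second rank: holds data for the reader

    def clock_input(self, value):
        # Capture the instantaneous value on the input lines.
        self.input_latch = value

    def clock_output(self):
        # Copy the captured value into the second rank of latches.
        self.output_latch = self.input_latch

    def read(self):
        # The reader always sees the second rank.
        return self.output_latch


buf = DoubleBufferedInput()
buf.clock_input("A")
buf.clock_output()
buf.clock_input("B")   # new data arrives before "A" has been read
first = buf.read()     # the reader still gets "A"
buf.clock_output()
second = buf.read()    # now the reader gets "B"
```

With a single rank of latches, the arrival of "B" would have overwritten "A" before it was read; the second rank is what gives the reader the extra time.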
Figure 12.17 gives the timing diagram of a double-buffered input system. The input arrives at fixed time intervals. Input samples are clocked into the input latches at regular intervals by clock CIi, where i is the clock pulse number.

The FIFO

A general solution to data buffering is provided by the first-in-first-out, FIFO, memory. Data is written into a FIFO queue one value at a time and read out in the same order. Once the data has been read it cannot be accessed again. A FIFO can be empty, partially filled, or full; FIFOs usually have output flags to indicate states such as fully empty or partially full.

The simplest FIFO structure is a register with an input port that receives the data and an output port. The data source provides the FIFO input and a strobe. Similarly, the reader provides a strobe when it wants data. Figure 12.18 describes a FIFO in which FULL indicates that no more data can be accepted and EMPTY indicates that no more data can be read. When data arrives at the input terminals, it ripples down the shift register until it arrives at the next free location.

Figure 12.19 demonstrates a 10-stage FIFO as data is added and removed.

The FIFO is usually built around a random access memory element that is arranged as a circular buffer. A read pointer and a write pointer keep track of the data in RAM.
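A circular-buffer FIFO with a read pointer and a write pointer, as described above, can be sketched as follows. This is a minimal software illustration rather than a model of any particular FIFO chip; the class name, the explicit item counter used to distinguish full from empty, and the exception types are all assumptions made for the example.

```python
class RingFifo:
    """FIFO built on a fixed-size array used as a circular buffer."""

    def __init__(self, capacity):
        self._ram = [None] * capacity
        self._read = 0    # index of the oldest unread item
        self._write = 0   # index of the next free location
        self._count = 0   # number of items currently stored

    @property
    def empty(self):
        return self._count == 0

    @property
    def full(self):
        return self._count == len(self._ram)

    def put(self, value):
        if self.full:
            raise OverflowError("FIFO full: no more data can be accepted")
        self._ram[self._write] = value
        self._write = (self._write + 1) % len(self._ram)  # wrap around
        self._count += 1

    def get(self):
        if self.empty:
            raise IndexError("FIFO empty: no more data can be read")
        value = self._ram[self._read]
        self._read = (self._read + 1) % len(self._ram)    # wrap around
        self._count -= 1
        return value
```

Because only the pointers move, an item written to an empty FIFO is available to the reader immediately, whatever the capacity; nothing has to ripple through intermediate stages.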
Figure 12.21 illustrates the structure of a dual-port RAM FIFO. The advantage of RAM-based FIFOs over register-based FIFOs is that the fall-through time of a RAM-based FIFO is constant and independent of its length.

Figure 12.22 demonstrates the use of a typical FIFO in a system with a 32-bit computer using little-endian I/O and an 8-bit port using big-endian I/O. This FIFO is user-configurable and can be set up to perform bus matching; that is, its input and output buses may have different widths. Its port A interface is 32 bits wide and its port B interface is 8 bits wide. You can program it to perform the byte swapping required when data is copied from a little-endian to a big-endian system.

Figure 12.23 gives the timing diagram for the case when two 32-bit words are read into the FIFO and eight 8-bit bytes are read from it.

I/O Strategy

A computer implements I/O transactions in one of three ways.
• It can perform an individual I/O transaction at the point the operation is needed, by programmed I/O.
• It can execute another task until a peripheral signals its readiness to take part in an I/O transaction, by interrupt-driven I/O.
• It can ask special-purpose hardware to perform the I/O transaction, by direct memory access, DMA.
Computer systems may employ a mixture of these strategies.
Programmed I/O

A typical memory-mapped peripheral has a flag bit that is set by the peripheral when it is ready to take part in a data transfer. In programmed I/O the computer interrogates the peripheral’s status register and proceeds when the peripheral is ready. We can express this operation in pseudocode as:

    REPEAT
      Read peripheral status
    UNTIL ready
    Transfer data to/from peripheral

The operation “REPEAT Read peripheral status UNTIL ready” constitutes a polling loop, because the peripheral’s status is continually tested until it is ready to take part in the I/O transaction. In the following example, status bit RDY is set if the peripheral has data. If we take the I/O model of Figure 12.4 and translate the pseudocode into generic assembly language form to perform an input operation, we get

         ADR r1,i0          ;Register r1 points to the peripheral
         MOV r2,#Command    ;Define peripheral operating mode
         STR [r1],r2        ;Set up peripheral. Load the command
    Rpt1 LDR r3,[r1,#2]     ;Read input status word into r3
         AND r3,r3,#1       ;Mask status to RDYin bit
         BEQ Rpt1           ;Repeat until device ready
         LDR r3,[r1,#4]     ;Read the data into r3.

Interrupt-driven I/O

A more efficient I/O strategy uses an interrupt handling mechanism to deal with I/O transactions when they occur. The processor carries out another task until a peripheral requests attention. When the peripheral is ready, it interrupts the processor, which carries out the transaction and then returns to its pre-interrupt state.
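For comparison with the interrupt-driven approach, the polling loop of programmed I/O can be sketched in Python against a simulated peripheral. Everything here is hypothetical and chosen for illustration: the `SlowPeripheral` class, the position of the ready bit (`RDY_IN`), and the idea that the device becomes ready after a fixed number of status reads. The returned poll count makes visible the processor time spent waiting.

```python
RDY_IN = 0x01  # assumed position of the ready flag in the status register


class SlowPeripheral:
    """Simulated device that becomes ready after a number of status reads."""

    def __init__(self, data, polls_until_ready):
        self._data = data
        self._remaining = polls_until_ready

    def read_status(self):
        # Report "not ready" until the countdown has expired.
        if self._remaining > 0:
            self._remaining -= 1
            return 0
        return RDY_IN

    def read_data(self):
        return self._data


def polled_read(peripheral):
    """REPEAT read status UNTIL ready, then transfer the data."""
    polls = 0
    while True:
        polls += 1
        if peripheral.read_status() & RDY_IN:  # mask status to the ready bit
            break
    return peripheral.read_data(), polls


value, polls = polled_read(SlowPeripheral(0x41, polls_until_ready=3))
```

In this run the status register is read four times before the data is transferred; with an interrupt, those three unsuccessful reads would be time available for another task.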
The two peripheral interface components are each capable of requesting the processor’s attention. All peripherals have an active-low interrupt request output, IRQ*, that runs from peripheral to peripheral, and is connected to the processor’s IRQ* input.

Active-low means that a low voltage indicates the interrupt request state. The reason that the electrically low state is used as the active state is entirely because of the behavior of transistors; that is, it is an engineering consideration that dates back to the era of the open-collector circuit that could only pull a line down to zero.

Whenever a peripheral wants to take part in an I/O transaction, it asserts its IRQ* output and drives the IRQ* input to the CPU active low. The CPU detects that IRQ* has been asserted and responds to the interrupt request if it has not been masked.

Most processors have an interrupt mask register that allows you to turn off interrupts if the CPU is performing an important operation. Interrupts may be masked when the processor is performing a critical task; for example, a system using real-time monitoring of fast events would not defer to a keyboard input interrupt (even a fast typist is glacially slow compared to a computer’s internal operation). Similarly, recovery from a system failure such as a loss of power will be given priority.
The way in which a processor responds to an interrupt is device-dependent. The two peripherals in Figure 12.24 are wired to the common IRQ* line and the CPU can’t determine which device interrupted. The CPU identifies the interrupting device by polling each peripheral’s status register until the interrupter has been located. Interrupt polling provides interrupt prioritization because important devices whose interrupt requests must be answered rapidly are polled first. In Figure 12.24 each memory-mapped peripheral has an interrupt vector register, IVR, that tells the processor how to find the appropriate interrupt handler. Typically, the IVR supplies a pointer to a table of interrupt vectors.

Interrupt Processing

When an interrupt occurs, the computer first decides whether to service it or whether to ignore it. When the computer responds to the interrupt, it carries out the following sequence of actions.
• It completes the current instruction.
• The contents of the program counter are saved to allow the program to continue from the point at which it was interrupted.
• The state of the processor must also be saved. A processor’s state is defined by the flag bits of the condition code, plus other status information.
• A jump is then made to the location of the interrupt handling routine, which is executed like any other program.
• After this routine has been executed, a return from interrupt is made, the program counter restored, and the system status word returned to its pre-interrupt value.

Figure 12.25 shows how a typical CISC responds to an interrupt request.
Stack PSR indicates that the processor status register is pushed on the stack. The interrupt is transparent to the interrupted program and the processor is returned to the state it was in immediately before the interrupt took place.

Nonmaskable Interrupts

An interrupt request may be denied or deferred. Some microprocessors have a nonmaskable interrupt request, NMI, that can’t be deferred. A nonmaskable interrupt is reserved for events such as a loss of power. The NMI handler routine forces the processor to deal with the interrupt and to perform an orderly shutdown of the system, before the power drops below a critical level and the computer fails completely.

Prioritized Interrupts

Microprocessors often support prioritized interrupts (i.e., the chip has more than one interrupt request input). Each interrupt has a predefined priority, and a new interrupt with a priority lower than or equal to the current one cannot interrupt the processor until the current interrupt has been dealt with. Equally, an interrupt with a higher priority can interrupt the current interrupt.

Nested Interrupts

Interrupts and other processor exceptions have all the characteristics of a subroutine: the return address is stacked at the beginning of the call and then restored once the subroutine has been executed to completion. The interrupt is a subroutine call with an automatic target address supplied in hardware or software and a mechanism that preserves the state of the condition code as well as the program counter.
Just as subroutines can be nested, so can interrupts.

Figure 12.26 demonstrates nested interrupts. A level 1 interrupt occurs a second time. A level 2 interrupt takes place before the level 1 interrupt handler has completed its task. The level 1 interrupt handler is interrupted and the level 2 interrupt processed. Once the level 2 interrupt has been dealt with, a return is made to the level 1 interrupt handler.

Example of a sequence of nested interrupts

Vectored Interrupts

When a processor with a single interrupt request line detects a request for service, it doesn’t know which device made the request and can’t begin to execute the appropriate interrupt handler until it has identified the source of the interrupt. A vectored interrupt solves the problem of identifying the source by forcing the requesting device to identify itself to the processor. Without vectored interrupts, the processor must examine each of the peripherals’ interrupt status bits. When the processor detects an interrupt request it broadcasts an interrupt acknowledge to all potential interrupters. Each possible interrupter detects the acknowledge from the CPU and the interrupting device returns a vector that is used by the CPU to invoke the appropriate interrupt handler.

Figure 12.28 demonstrates the 68K prioritized, vectored interrupts. There are 7 levels of interrupt request.
Level i is serviced in preference to level j, if i > j. The scheme permits nested interrupts. An interrupt at level i can be interrupted by a new interrupt at level j if j > i.

Direct Memory Access

The most sophisticated means of dealing with I/O uses direct memory access, DMA, in which data is transferred between a peripheral and memory without the active intervention of a processor. In effect, a dedicated processor performs the I/O transaction by taking control of the system buses and using them to move data directly between a peripheral and the memory. DMA offers a very efficient means of data transfer because the DMA logic is dedicated to I/O processing and a large quantity of data can be transferred in a burst; for example, 128 bytes of input.

Figure 12.29 describes a system that uses DMA to transfer data to disks. A DMA controller, DMAC, controls access to the data bus. The DMA controller must first be loaded with the destination of the data in memory and the number of bytes to be transferred; that is, you have to program the DMA controller before it can be triggered.

Three bus switches control access to the data bus by the CPU, memory, and DMA controller. A bus switch is turned on or off to enable or disable the information path between the bus and the device interfaced to the bus switch. Normally, the CPU bus switch is closed and the DMAC and peripheral bus switches are open. The CPU transfers data between memory and itself by putting an address on the address bus and reading or writing data.

Figure 12.30a illustrates the situation in which the CPU is controlling the buses and Figure 12.30b demonstrates how the DMA controller takes control of the data bus to perform the data transfer itself.

The Bus

Bus is a contraction of the Latin omnibus that means for all. A bus behaves like a highway that is used by multiple devices. In a computer, all the devices that wish to communicate with each other use a bus. Figure 12.31 illustrates the organization of a computer with three buses.

The system bus is made up of the address, data, and control paths from the CPU. Memory and memory-mapped I/O devices are connected to this bus. Such a bus has to be able to operate at the speed of the fastest device connected to it. The system bus demonstrates that a one-size-fits-all approach does not apply to computer design because it would be hopelessly cost-ineffective to interface low-cost, low-speed peripherals to a high-speed bus.

In systems with more than one CPU (or at least more than one device that can initiate data transfer actions like a CPU) the bus has to decide which of the devices that want to access the bus should be granted access to it. This mechanism is called arbitration and is a key feature of modern system buses. A device that can take control of the system bus is called a bus master, and a device that can only respond to a transaction initiated by a remote bus master is called a bus slave.
In Figure 12.31, the CPU is a bus master and the memory system a bus slave. One of the I/O ports has been labeled bus master because it can control the bus (e.g., for DMA data transfers), whereas the other peripheral is labeled bus slave because it can respond only to read or write accesses. The connection between the disk drive and its controller is also labeled bus because it represents a specialized and highly dedicated example of the bus.

Bus Structures and Topologies

A simple bus structure is illustrated by the CPU plus memory plus local bus in Figure 12.32. Only one device at a time can put data on the data bus. Data is transferred between CPU and memory or peripherals. The CPU is the permanent bus master and only the CPU can put data on the bus or invite memory/peripherals to supply data via the bus.

Figure 12.33 illustrates a bus structure that employs two buses linked by an expansion interface. Each of these separate bus systems may have entirely different levels of functionality; one might be optimized for high-speed processor-to-memory transactions, and the other to support a large range of plug-in peripherals.

Bus Speed

Suppose device A transmits data to device B. Let’s go through the sequence of events that take place when device A initiates the data transfer at t = 0.
Initially, A drives data onto the data bus at time td, the delay between device A initiating the transfer and the data appearing on the bus. Data propagates along the bus at about 70% of the speed of light (light travels about 1 ft/ns, so roughly 0.7 ft/ns); the time the data takes to travel from A to B is the propagation delay, tp.

When the data reaches B, it must be captured. Latches are specified by their setup and hold times. The data setup time, ts, is the time for which the data must be available at the input to system B for it to be recognized. The data hold time, th, is the time for which the data must remain stable at system B’s input after it has been captured.

The time taken for a data transfer, tT, is, therefore, tT = td + tp + ts + th. Inserting typical values for these parameters yields 4 + 1.5 + 2 + 0 = 7.5 ns, corresponding to a data transfer rate of 1/(7.5 ns) = 10^9/7.5 Hz = 133.3 MHz. A 32-bit-wide bus can therefore transfer data at a maximum rate of 133.3 × 4 = 533.3 MB/s. In practice, a data transfer requires time to initiate it, called the latency, tL. Taking latency into account gives a maximum data rate of 1/(tT + tL). Higher data rates can be achieved with pipelining, by transmitting the next data element before system B has completed reading the previous element.

Figure 12.35 demonstrates the application of pipelining to the previous example. Data must be stable at the input to system B for at least ts + th seconds; then a new element may replace the previous element. Pipelining allows an ultimate data rate of 1/(ts + th).
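The timing arithmetic above can be checked with a short sketch. The function names and the latency value t_L = 2.5 ns are illustrative assumptions (the text gives no figure for latency); the other values are the typical figures quoted in the text.

```python
# Sketch of the bus-transfer timing arithmetic; all times are in nanoseconds.
# Function names are hypothetical and exist only for this illustration.

def transfer_time_ns(td, tp, ts, th):
    """Time for one unpipelined transfer: tT = td + tp + ts + th."""
    return td + tp + ts + th

def rate_mhz(period_ns):
    """Convert a period in ns into a rate in MHz (1000 / period)."""
    return 1000.0 / period_ns

# Typical values quoted in the text.
td, tp, ts, th = 4.0, 1.5, 2.0, 0.0
bus_width_bytes = 4          # 32-bit data bus
t_L = 2.5                    # ASSUMED latency, not taken from the text

t_T = transfer_time_ns(td, tp, ts, th)
unpipelined_mhz = rate_mhz(t_T)            # 1 / tT
with_latency_mhz = rate_mhz(t_T + t_L)     # 1 / (tT + tL)
pipelined_mhz = rate_mhz(ts + th)          # 1 / (ts + th)

print(f"tT = {t_T} ns")
print(f"unpipelined: {unpipelined_mhz:.1f} MHz, "
      f"{unpipelined_mhz * bus_width_bytes:.1f} MB/s")
print(f"with assumed latency: {with_latency_mhz:.1f} MHz")
print(f"pipelined limit: {pipelined_mhz:.1f} MHz")
```

With these figures the pipelined limit is set entirely by the setup and hold times at the receiver, which is why it is so much higher than the unpipelined rate.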
The Address Bus

Some systems have an explicit address bus that operates in parallel with the data bus. When the processor writes data to memory, an address is transmitted to the memory system at the same time the data is transmitted. Some systems combine address and data buses together into a single multiplexed bus that carries both addresses and data (albeit alternately).

Figure 12.36 describes the multiplexed address/data bus, which requires fewer signal paths, so its connectors and sockets require fewer pins. Multiplexing addresses and data onto the same lines requires a multiplexer at one end of the transmission path and a demultiplexer at the other end. Multiplexed buses can be slower than non-multiplexed buses and are often used when cost is more important than speed.

The efficiency of both non-multiplexed and multiplexed address buses can be improved by operating in a burst mode in which a sequence of data elements is transmitted to consecutive memory addresses. Burst-mode operation is used to support cache memory systems. Figure 12.37 illustrates the concept of burst mode addressing where an address is transmitted for location i and data for locations i, i+1, i+2, and i+3 are transmitted without a further address.

The Control Bus

The control bus regulates the flow of information on the bus. Figure 12.38 describes a simple 2-line synchronous control bus that uses a data-direction signal and a data validation signal.
The data direction signal is R/W* and is high to indicate a CPU read operation and low to indicate a write operation.

Some systems have separate read and write strobes rather than an R/W* signal. Individual READ* and WRITE* signals indicate three states: an active read state, an active write state, and a bus free state (READ* and WRITE* both negated). An R/W* signal introduces ambiguity because R/W* = 0 always means that the bus is executing a write operation, whereas R/W* = 1 means either that the bus is executing a read operation or that the bus is free. The active-low data valid signal, DAV*, is asserted by the bus master to indicate that a data transfer is taking place.

Let’s look at an example of an asynchronous data transfer, a processor memory read cycle. Figure 12.39 provides the simplified read cycle timing diagram of a 68020 processor. The processor is controlled by a clock, CLK, and the minimum bus cycle takes six clock states labeled S0 to S5.

Arbitrating for the Bus

In a system with several bus masters connected to a bus, a mechanism is needed to deal with simultaneous bus requests. The process by which requests are recognized and priority given to one of them is called arbitration. There are two approaches to dealing with multiple requests for a bus: localized arbitration and distributed arbitration. In localized arbitration, an arbitration circuit receives requests from the contending bus masters and then decides which of them is to be given control of the bus.
In a system with distributed arbitration, each of the masters takes part in the arbitration process and the system lacks a specific arbiter; each master monitors the other masters and decides whether to continue competing for the bus or whether to give up and wait until later.

Localized Arbitration and the VMEbus

The VMEbus supports several types of functional modules. We are interested in the bus master that controls the bus, the bus requester that requests the bus, and the arbiter that grants the bus to a would-be master. A bus requester is employed by a bus master when it wants to access the VMEbus. A VMEbus is usually housed in a box with a number of slots into which modules can be plugged (rather like the slots used by the PCI bus).

The VMEbus’s arbitration bus is described in Figure 12.40. A bus requester uses BR0* to BR3* (bus request 0 to bus request 3) to indicate that the bus master wants the bus. Four bus grant lines are used by the arbiter to grant control of the bus to the requester. Bus clear (BCLR*) and bus busy (BBSY*) control the arbitration process.

The VMEbus arbiter is located in a special position on a VMEbus, slot 1. All bus request lines run the length of the VMEbus and any would-be master can place a request on one of these lines. The level of the request is user-determined; that is, the user decides which of the four bus request lines are to be connected to a module’s request output.
The arbiter reads the bus request inputs from all the slots along the bus, decides which request is to be serviced, and then informs other modules of its decision via its bus grant outputs.

The VMEbus supports four levels of arbitration. We will soon see that each of these four levels can be further subdivided. The bus request lines run the length of the VMEbus and terminate at the arbiter in slot 1.

When one or more bus requesters wish to access the VMEbus, they assert the bus request lines to which they have been assigned; for example, the card in slot 3 might assert bus request line BR1* and the card in slot 5 might assert bus request line BR3*. The arbiter in slot 1 decides which of them is to succeed.

If a request on, say, BR2* is successful, the arbiter sends a bus grant message on its level 2 bus grant output, BG2OUT*. We will write BGxIN*, BGxOUT* and BRx* where x is 0 to 3 to avoid referring to specific levels.

The BGx* lines do not run along the entire length of the bus. Instead the VMEbus employs a chain of lines called bus grant out and bus grant in. The BGxIN* and BGxOUT* lines run from slot to slot rather than from end to end. A BGxOUT* line from a left-hand module is passed out on its right as a BGxIN* line.
Therefore, the BGxOUT* of one module is connected to the BGxIN* of its right-hand neighbor. The arrangement is called daisy-chaining. A continuous bus line transmits a signal in both directions to all devices connected to it. The daisy-chained line is unidirectional, transmitting a signal from one specific end to the other. Each module connected to (i.e., receiving from and transmitting to) a daisy-chained line may either pass a signal on down the line or inject a signal of its own onto the line.

In Figure 12.42 the requester in slot j requests the bus at level 1 when no other device is requesting the bus. When BR1* is asserted, the arbiter detects it and asserts BG1OUT*, which passes down the bus until it reaches slot j. The arbiter in slot 1 sends a bus grant input to the card in slot 2. The card in slot 2 takes this bus grant input and passes it on as a bus grant output to the card in slot 3, and so on.

Each card receives a bus grant input from its left-hand neighbor and may or may not pass it on as a bus grant output to its right-hand neighbor. A card might choose to terminate the daisy chain signal-passing sequence and not transmit a bus grant signal to its right-hand neighbor. If a slot is empty, bus jumpers (i.e., links) must be provided to route the appropriate BGxIN* signals to the corresponding BGxOUT* terminals.

A requester module makes a bid for control of the system data transfer bus by asserting one of the bus request lines, BR0* to BR3*.
Only one line is asserted and the actual line is chosen by assigning a given priority to the requester. This priority may be assigned by on-board user-selectable jumpers or dynamically by software.

The arbiter in slot 1 asserts a BGxOUT* line, and a bus grant propagates down the daisy-chain. Each BGxOUT* arrives at the BGxIN* of the next module. If that module doesn’t want the bus, it passes the grant on via its BGxOUT*. If the module requested the bus, it takes control of the bus. Daisy chaining provides automatic prioritization because bus requesters nearer the arbiter win the arbitration; this is called geographic prioritization.

Figure 12.43 provides a protocol flowchart for VMEbus arbitration. Initially, a bus master in slot M at a priority less than i is in control of the bus. This current bus master asserts the bus busy signal, BBSY*, that runs the length of the bus. As long as any master is asserting BBSY* no other master may attempt to gain control of the VMEbus. An active bus master in a VMEbus cannot be forced off the bus.

Suppose a bus requester in slot N requests the bus at a higher priority level than the current master. The arbiter detects the new higher-level request and asserts its bus clear output, which informs the current master that another higher-priority device wishes to access the bus.
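The daisy-chained bus grant described above can be modeled in a few lines. This is a minimal behavioral sketch, not VMEbus hardware: the function name `propagate_grant` and its parameters are hypothetical, only one request level is modeled, and the chain is assumed to start at slot 2 with the arbiter in slot 1.

```python
# Behavioral sketch of a daisy-chained bus grant at ONE request level.
# Names and parameters are hypothetical and used only for illustration.

def propagate_grant(requesting_slots, empty_slots, num_slots):
    """Return the slot that captures the grant, or None if no slot does.

    The grant leaves the arbiter (slot 1) and ripples from slot to slot.
    - An empty slot is jumpered: BGxIN* is wired straight to BGxOUT*.
    - A card that did not request the bus passes the grant on.
    - The first requesting card keeps the grant and stops the chain.
    """
    for slot in range(2, num_slots + 1):
        if slot in empty_slots:
            continue            # jumper passes the grant through
        if slot in requesting_slots:
            return slot         # this card takes control of the bus
        # otherwise the card drives BGxOUT* and the grant moves on
    return None

# Slots 3 and 5 both request at the same level; slot 4 is empty.
winner = propagate_grant({3, 5}, {4}, num_slots=8)
print(winner)   # the requester nearest the arbiter captures the grant
```

The sketch makes geographic prioritization visible: the requester in slot 5 never sees the grant because slot 3 terminates the chain first.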
The current master does not have to relinquish the bus within a prescribed time limit. Typically, it will release the bus at the first convenient instant by negating BBSY*. The VMEbus provides both geographic prioritization determined by a slot’s location and an optional prioritization by bus request.

BCLR* is driven only by arbiters that permanently assign fixed priorities to the bus request lines. Other arbitration mechanisms, such as the round robin arbitration scheme to be described later, have no fixed priority and the arbiter does not make use of the bus clear line.

When the arbiter detects that the current master has released the bus, the arbiter asserts BGiOUT* to indicate to the requester at level i that it has gained control of the bus. The arbiter knows only the level of the request and not which slot it came from. The bus grant message ripples along the bus, entering each module as BGiIN* and leaving as BGiOUT*. When this message reaches the requester in slot N that made the request at level i, the message is not passed on. Instead, the requester asserts BBSY* to show that it now has control of the bus. What would have happened if a requester also at level i but located nearer to the arbiter than slot N had also requested the bus at approximately the same time? The answer is that the requester closer to the arbiter would have received the bus grant first and have taken control of the bus.

Releasing the Bus

The requester may implement one of two options for releasing the bus: option RWD, release when done, and option ROR, release on request.
Option RWD requires the requester to release the bus as soon as the on-board master stops indicating bus busy; that is, the master remains in control of the bus until its task has been completed, which can lead to undue bus hogging. The ROR option is more suitable in systems in which it is unreasonable to grant unlimited bus access to a master. The ROR requester monitors the four bus request lines. If it sees that another requester has requested service, it releases its BBSY* output and defers to the other request. The ROR option also reduces the number of arbitrations requested by a master, as the bus is frequently cleared voluntarily.

The Arbitration Process

Figure 12.44 demonstrates what happens when two requesters at different levels of priority request the bus. Both requesters A and B assert their bus request outputs simultaneously. Assuming that the arbiter detects BR1* and BR2* low, the arbiter asserts BG2IN* on slot 1, because BR2* has a higher priority. When the bus grant has propagated down the daisy-chain to requester B, requester B will respond to BG2IN* by asserting BBSY*. Requester B then releases BR2* and informs its own master that the VMEbus is now available.

VMEbus Arbitration Algorithms

The arbiter in slot 1 can use one of three strategies to prioritize bus requests.

1. Option RRS (round robin select). The RRS option assigns priority to the masters on a rotating basis. Each of the four levels of bus request has a turn at being the highest level.
2. Option PRI (prioritized). The PRI option assigns a level of priority to each of the bus request lines from BR3* (highest) to BR0* (lowest).
3. Option SGL (single level). The SGL option provides a minimal arbitration facility using bus request line BR3* only. The priority of individual modules is determined by daisy-chaining, so that the module next to the arbiter module in slot 1 of the VMEbus rack has the highest priority. As the position of a module moves further away from the arbiter, its priority reduces.

Distributed Arbitration

Not all buses use a centralized arbiter to decide which of the competing bus masters is to get control of the bus. A mechanism called distributed arbitration allows arbitration to take place simultaneously at all slots along the bus. We now describe a backplane bus that supports distributed arbitration, the NuBus, a general-purpose synchronous backplane bus with multiplexed address and data lines that is also known as ANSI/IEEE Std 1196-1987. It was conceived at MIT in the late 1970s and later supported by Western Digital and Texas Instruments (1983). Apple implemented a subset of NuBus in their Macintosh II.

The key to NuBus arbitration is each module’s unique slot number, which ranges from 0 to F in hexadecimal (0 to 15). When a card in a slot arbitrates for the bus, the card places its slot number on the bus and, as if by magic, any other requester with a lower slot number stops arbitrating for the bus. Equally, if a slot with a higher number wants the bus, the requesting slot stops requesting the bus; that is, if a card arbitrates for the bus and then finds that a card with a higher priority is also arbitrating for the bus, it backs off.

To appreciate how distributed arbitration works, you have to understand the open-collector gate. Historically, the open-collector gate precedes the tristate gate and is used to allow more than one device to drive the same bus. Figure 12.45 illustrates an inverter with an open-collector output. The gate’s output can be actively forced to a low voltage. When the input of the gate is 1, the internal transistor switch is closed and its output is forced low just like a normal inverter.

When the input is 0, the transistor switch is open and the output of the open-collector gate is left floating because it is internally disconnected from the high- or low-level power rails. That is, the open-collector gate has an active-low output state and a floating state and can pull a bus down into a low state, but it can’t pull the bus up into a high state.

Figure 12.46 illustrates the key circuit used in a distributed arbiter that has an input X and an output Y. The circuit is also connected to one of the arbitration control lines on the bus. In what follows, we are interested in the relationship between the circuit and the state of the bus. If you use Boolean algebra, you will see that output Y is 0 for any value of input X. This is not the whole story.

Suppose the X input is 0 and that the level on the bus is low because another device is driving it low. In this case, the output of the open-collector inverter will also be forced low by the bus.
Now, both inputs to the AND gate will be 0 and the Y output will be 1. That is, the Y output is 0 unless the input X is 0 and the bus is being driven low by another device. We have a mechanism that can actively drive the bus low or detect when another device is driving the bus low when we are attempting to drive it high. This mechanism forms the basis of distributed arbitration.

Figure 12.47 shows how the distributed arbiter operates by considering all possible input conditions together with the state of the bus line. Remember that the bus can be floating (not driven) or actively pulled down to a low level. When it is floating, a resistor weakly pulls the bus up to a high level.

Figures 12.47(a) and (b) assume that the bus is floating. The output of the circuit is always 0 and is independent of its input. In a real system, the bus will always be actively pulled down to an electrically low level or weakly pulled up to an electrically high level by a resistor.

In Figures 12.47(c) and (d) the bus is being actively driven to 0. In Figure 12.47(c) the bus is actively being driven low, but the output of the open-collector inverter is also low, so there is no conflict between the output of the open-collector inverter and the bus. The output of the circuit is 0.

In Figure 12.47(d) the input is 0 and the output of the open-collector gate is floating.
The bus is low and the output of the inverter is pulled down to an electrically low state. The output of the circuit is 1. The output tells the system that another device is driving the bus low in contradiction to the input.

In Figures 12.47(e) and (f) the bus is in a high state because no other device is driving it low. Figure 12.47(e) is the interesting case. Here the input is 1 and the output of the open-collector gate is electrically low. This drives the bus to a low state. In this case the circuit is driving the bus. The output of the circuit is 0. In Figure 12.47(f) the input is 0 and the output of the inverter is floating so there is no conflict with the state of the bus.

As you can see, there are two special cases. In one, the bus is active-low and the output of the inverter is high, which results in the inverter’s output being pulled down. In the other case, the bus is high and the output of the inverter is active-low, which results in the bus being forced low. Table 12.4 summarizes the action of this circuit. The input to this circuit represents the condition I want the bus or I don’t want the bus. If the bus is not being driven low, this circuit will drive the bus low itself if its input is 1. This circuit produces a 0 output unless its input is 0 and the bus is being actively driven low by some other device.

Table 12.4
Situation | Bus condition | Result
I want the bus | Bus free (high level) | Output is 1. Get the bus and drive it low
I want the bus | Bus busy (low level) | Output is 0. I do not get the bus
I do not want the bus | Don’t care | Output is 0
Figure 12.48 illustrates the details of part of a NuBus. A potential master that wants to use the bus places its arbitration level on the 4-bit arbitration bus, ID3* to ID0*. Since NuBus uses negative logic, the arbitration number is inverted so that the highest level of priority is 0000 and the least is 1111.

NuBus arbitration is simple. If a competing master sees a higher level on the bus than its own level, it ceases to compete for the bus. Each requester simultaneously drives the arbitration bus and observes the signal on the bus. If it detects the presence of a requester at a higher level, it backs off.

ID0* to ID3* define the slot location and priority level of the master, and lines ARB0* to ARB3* are the arbitration lines running the length of the bus. Arbitrate* permits the master to arbitrate for the bus, and the output GRANT is asserted if the master wins the arbitration.

Suppose three masters numbered 4, 5, and 2 put the codes 1011, 1010, and 1101, respectively, onto the arbitration bus. As the arbitration lines are open-collector, any output at a 0 level will pull the bus down to 0. Here, the bus will be forced to 1000. The master at level 2 putting 1101 on the bus will detect that ARB2 is being pulled down and leave the arbitrating process. The arbitration bus will now be 1010. The master with the code 1011 will detect that ARB0 is being pulled down and will leave the arbitration process. The value on the arbitration bus is now 1010 and the master with that value has gained control.
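The worked example with codes 1011, 1010, and 1101 can be replayed in a few lines of Python. This is only a sketch under stated assumptions: the function names are hypothetical, the open-collector lines are modeled as a bitwise AND of all codes still competing, and the most significant line is assumed to be examined first. None of this is taken from the NuBus specification.

```python
def wired_and(codes, width=4):
    """Open-collector lines: a line is low if any contender pulls it low."""
    level = (1 << width) - 1          # all lines float high when undriven
    for code in codes:
        level &= code
    return level


def arbitrate(codes, width=4):
    """Return (winning code, list of bus values seen while contenders drop out).

    Assumption of this sketch: lines are checked from the most significant
    bit down, and a contender leaves as soon as it sees a line low that it
    is not itself pulling low.
    """
    contenders = list(codes)
    trace = []
    while len(contenders) > 1:
        bus = wired_and(contenders, width)
        trace.append(bus)
        for bit in range(width - 1, -1, -1):
            mask = 1 << bit
            losers = [c for c in contenders if (c & mask) and not (bus & mask)]
            if losers:
                contenders = [c for c in contenders if c not in losers]
                break
        else:
            break                     # identical codes: nothing left to resolve
    return contenders[0], trace


winner, trace = arbitrate([0b1011, 0b1010, 0b1101])
print(format(winner, "04b"), [format(b, "04b") for b in trace])
```

With the three codes from the example, the bus first settles at 1000, then at 1010 once the master driving 1101 has withdrawn, and the master driving 1010 is left in control.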
PCI Bus

The Peripheral Component Interconnect Local Bus (or just PCI bus) represents a radical change to the PC’s systems architecture. Intel designed this bus for use in Pentium-based systems towards the end of 1993. The PCI bus is not only much faster than previous buses; it greatly extends the functionality of the PC architecture. Indeed, the PCI bus is central to the PC's expandability and flexibility.

The PCI bus allows users to plug cards into the computer system to increase functionality by adding modems, SCSI interfaces, video processors, sound cards, and so on. The PCI bus lets these cards communicate with the CPU via an interface known as a North Bridge. Bus interface circuits have come to be known collectively as a chipset. All PCs with PCI buses require such a chipset.

The PCI is called a local bus to contrast it with the address, data, and control signals from the CPU itself. Connecting systems directly to the CPU provides the fastest data transfer rates, and a bus connected directly to a CPU is called a front side bus. The PCI bus supports plug and play capabilities in which PCI plug-in cards are automatically configured at power up and resources such as interrupt requests are assigned to plug and play cards transparently to the user. The original PCI bus operated at 33 MHz and supported 32-bit and 64-bit data buses. PCI bus Version 2.1 supports a 66 MHz clock.

The PCI bus is connected to the PC system by means of a single-chip PCI Bridge and to other buses via a second bridge. This arrangement means that a PC with a PCI bus can still support the older ISA bus. As time passes, fewer and fewer new PCs will have ISA buses because users will demand PCI cards, which outperform ISA cards.
Figure 12.49 illustrates the relationship between the PCI bus, the bridge, processor, memory and peripherals. The processor is directly connected to a bridge circuit that allows the processor to access peripherals via the PCI bus. The PCI system consists of the PCI local bus itself, any cards plugged into the bus, and central resources that control the PCI bus. These central resources perform, for example, arbitration between the cards plugged into the bus.

Figure 12.50 shows a system diagram of a PC with a PCI local bus and an ISA bus. A second bridge, commonly called the South Bridge, links the PCI and ISA buses.

Figure 12.51(a) illustrates the relationship between the Pentium 4, its Intel chipset, and the PCI bus. Figure 12.51(b) illustrates the more modern Intel Core i7 Processor interface.

PCI Bus Arbitration

Figure 12.52 demonstrates PCI bus arbitration. The REQ and GNT signals are connected to an arbiter that forms part of the North Bridge. This arbiter reads the requests on REQ0 to REQ3 and returns a grant message on the GNT0 to GNT3 line corresponding to the arbitration winner. When a PCI agent arbitrates for the bus, the arbiter asserts the BPRI signal to inform the host processor that a PCI agent (i.e., a priority agent) requires the host bus.
Data Transactions on the PCI Bus

The PCI bus compensates for the address/data bus bottleneck in several ways. First, it can operate in a burst mode, in which a single address is transmitted and then the address/data bus is used to transmit a sequence of consecutive data values. Second, the PCI bus supports split transactions; that is, one device can use the bus and another device can access the PCI bus before the first transaction has been completed. Split transactions mean that the bus is used more efficiently. Finally, devices connected to the PCI bus can be buffered, which allows data to be transmitted before it is needed.

PCI bus literature has its own terminology (some of which is shared by SCSI systems). A device that acts as a bus master is called an initiator and a device that responds to a bus master is called a target.

Some of the key signals of the PCI bus are given below.

Signal | Function | Driven by
AD31 – AD0 | Multiplexed address and data | Initiator
C/BE3* – C/BE0* | Command/byte enable | Initiator
TRDY* | Target ready | Target
IRDY* | Initiator ready | Initiator
FRAME* | Frame | Initiator
DEVSEL* | Device select | Target

Figure 12.53 illustrates a PCI read cycle in which an initiator reads data from a target on the PCI bus.

Figure 12.54 illustrates a PCI read cycle in which the address phase is followed by three data phases.
The PCI Express Bus

The PCI Express bus was designed to replace the PCI bus. Its goals were to cost less than the existing PCI bus, use off-the-shelf technology (boards, connectors, and circuits), support mobile, desktop and server markets, and be compatible with existing PCI-based systems. PCI Express uses serial transmission to transfer data from point to point.

Figure 12.55 demonstrates the difference between the PCI bus and PCI Express protocols. The PCI Express protocol has echoes of the ISO standard for the Open Systems Interconnection (OSI) model, which attempts to divide any communications system into seven abstract layers, where each layer performs a certain function for the layer above it.

The lowest level of the PCI Express protocol is the physical layer, responsible for transferring the bits from point to point. PCI Express uses a serial bus where data is transmitted bit-by-bit along a single line or along a pair of lines using differential encoding. Two serial data paths are provided, one for each data direction; that is, a PCI Express card can both read and write data to the bus simultaneously and support full-duplex operation. The two signal paths are collectively called a lane and it is possible to implement multiple lanes. Performance scales linearly with the number of lanes and you can have a x1 bus, a x2 bus, a x4 bus, a x8 bus, and so on. A single lane supports a peak data rate of 250 MB/s in each direction. A 16-lane system using duplex transmission has an effective data rate of 8 GB/s.
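The 8 GB/s figure follows directly from the per-lane rate. The short Python sketch below makes the arithmetic explicit; the function and constant names are illustrative only, and 1 GB is taken as 1000 MB.

```python
LANE_RATE_MB_S = 250   # peak rate of one lane in one direction, as quoted above


def pcie_peak_mb_s(lanes, full_duplex=True):
    """Peak aggregate rate in MB/s, assuming perfectly linear lane scaling."""
    directions = 2 if full_duplex else 1
    return lanes * LANE_RATE_MB_S * directions


for lanes in (1, 2, 4, 8, 16):
    print(f"x{lanes}: {pcie_peak_mb_s(lanes)} MB/s")
```

For a x16 link the result is 16 × 250 × 2 = 8000 MB/s, which is the 8 GB/s quoted for duplex operation.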
Figure 12.56 illustrates the concept of lanes.

Conventionally, information at the electrical level in digital systems is specified with respect to the ground or chassis; that is, a signal at greater than 3.0 V is interpreted as high, and a signal at less than 0.3 V is interpreted as low. PCI Express uses two signal paths to transmit data and the difference between the two conductors contains the information; for example, the signals may be +V, -V or -V, +V. The advantage of differential transmission is that it is more immune to interference (noise and other signals induced by capacitive or inductive coupling). This form of signaling is called LVDS – low voltage differential signaling. If both conductors of a pair pick up interference it does not affect the information, which is determined by the difference between the two conductors.

The encoding of the bit stream across the serial link ensures that a clock signal is embedded in the data stream and the data stream can be used to recover a clock signal. This means that designers do not have to worry about the distribution of clock signals and delays between data and clocks caused by different path lengths in the signals (an important factor when signaling at 2.5 × 10^9 bits/s). The bit encoding is called 8b/10b because each 8-bit byte is transmitted as 10 bits in order to equalize the number of 1s and 0s transmitted and to ensure that a clock signal can be recovered from the data signal.
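The cost of 8b/10b encoding can be put into numbers. The following Python sketch assumes a raw line rate of 2.5 Gbit/s per lane per direction and shows how much of it is left for data once every byte is sent as 10 bits; the names are hypothetical and the calculation ignores packet headers and other protocol overhead.

```python
LINE_RATE_BITS_S = 2.5e9   # assumed raw signaling rate of one lane, one direction


def payload_rate_mb_s(line_rate_bits_s=LINE_RATE_BITS_S, data_bits=8, code_bits=10):
    """Usable data rate in MB/s after line coding (1 MB taken as 10**6 bytes)."""
    payload_bits_s = line_rate_bits_s * data_bits / code_bits
    return payload_bits_s / 8 / 1e6


print(payload_rate_mb_s())
```

Under these assumptions 20% of the line rate is spent on the code, and the remaining 2 Gbit/s corresponds to the 250 MB/s peak rate quoted for a single lane.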
8b/10b Encoding

8b/10b encoding is a means of transmitting serial data using 10 bits to carry 8 bits of information. The additional two bits per byte improve the performance of the transmission mechanism. The ten-bit code is constrained to contain five 1s and five 0s, or four 1s and six 0s, or six 1s and four 0s. This ensures that there are no long runs of only 1s or only 0s. A mechanism called running disparity is used to ensure that there is an equal number of 1s and 0s on average; this is necessary to ensure that there is no dc component in the signal.

PCIe Data Link Layer

Data transmitted across systems that support layered protocols looks a bit like a Russian doll with multiple layers of encapsulation. At one end of a link, the application takes a dollop of data and wraps it up with some form of ends or delimiters. Then, the application layer hands the package to another layer (e.g., the data link layer) and that layer in turn wraps up the data with its own terminators. The data link layer passes the data to the physical layer and that too adds beginning and end flags.

Figure 12.57 illustrates the concept of encapsulation using a system with three protocol levels or layers. At the sending end, each protocol layer adds a header and a tail to the information passed down from the layer above. At the receiving end, each layer strips its header and tail off before passing the message up to the next level.
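The Russian-doll idea can be illustrated with a toy three-layer stack in Python. This is a sketch only: the bracketed headers and tails are invented markers chosen to make the nesting visible, not PCIe framing symbols, and all names are hypothetical.

```python
# (header, tail) for each layer, listed from the highest layer to the lowest.
LAYERS = [
    (b"[T|", b"|t]"),   # stand-in for the transaction layer
    (b"[D|", b"|d]"),   # stand-in for the data link layer
    (b"[P|", b"|p]"),   # stand-in for the physical layer
]


def wrap(payload, header, tail):
    return header + payload + tail


def unwrap(frame, header, tail):
    if not (frame.startswith(header) and frame.endswith(tail)):
        raise ValueError("frame is not delimited as this layer expects")
    return frame[len(header):len(frame) - len(tail)]


def send(message):
    """Pass the message down the stack; each layer adds its own wrapper."""
    frame = message
    for header, tail in LAYERS:
        frame = wrap(frame, header, tail)
    return frame


def receive(frame):
    """Pass the frame up the stack; each layer removes only its own wrapper."""
    for header, tail in reversed(LAYERS):
        frame = unwrap(frame, header, tail)
    return frame


print(send(b"data"))
print(receive(send(b"data")))
```

The point of the design is that each layer touches only its own header and tail and treats everything inside as an opaque payload.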
Figure 12.58 illustrates the PCIe bus message structure where the elements of a message are shown in blue and the protocol layers in grey. The highest level is the transaction layer that consists of a header and the actual message itself. The header defines the nature of the data message and includes information such as the address of the data – we will look at the header in more detail later. The transaction layer’s tail is an error-detecting code, ECRC.

Figure 12.59 gives the general structure of a packet header that consists of 12 or 16 bytes. This structure means that all the hardware overhead associated with conventional buses (arbitration, interrupts, handshaking, etc.) becomes redundant, at the price of increased latency and reduced efficiency due to the data overhead.

The SCSI and SAS Interfaces

One of the earliest external buses designed to link a computer and peripherals is the SCSI bus. At one time it was the preferred bus in professional and high-end systems. Today, it is in decline in the face of very low-cost high-performance buses such as USB and FireWire.

The Small Computer System Interface, SCSI, is an 8-bit parallel bus dating back to 1979, when the disk manufacturer Shugart was looking for a universal interface for its family of hard disks. The SCSI bus is a parallel data bus that incorporates an information exchange protocol optimized for the bus’s intended use, the linking of disk drives and other storage systems to a host computer.
Figure 12.61 illustrates the concept of the SCSI bus, which was originally called the SASI bus (Shugart Associates Systems Interface). In 1981 Shugart and NCR worked with ANSI to standardize the SCSI bus, which became X3.131-1986 in 1986.

The original SCSI-1 bus operated at 5 MHz, permitting up to seven peripherals to be connected together. A family of SCSI buses with a common architecture and different levels of performance has been developed. The specification was revised in 1991, providing a fast SCSI-2 bus at 10 MHz and a wide bus with a 16-bit data path. Ultra SCSI or SCSI-3 was the next step with a clock rate of 20 MHz. All SCSI systems support asynchronous data transfers, but SCSI-2 also supports faster synchronous data transfers. For comparison, the USB 3.0 bus provides a theoretical limit of 4.8 Gbps or 600 MB/s.
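For a parallel bus of this kind, the peak throughput is simply the number of bytes moved per transfer multiplied by the transfer rate. The Python sketch below checks a few entries of the SCSI version table with that rule; the function name is hypothetical, and the rule assumes one transfer per cycle at the quoted data rate.

```python
def scsi_throughput_mb_s(width_bits, rate_mhz):
    """Peak throughput in MB/s: bytes per transfer times transfers per microsecond."""
    return (width_bits // 8) * rate_mhz


for name, width, rate in [
    ("SCSI-1", 8, 5),
    ("Fast Wide SCSI", 16, 10),
    ("Wide Ultra-2 SCSI", 16, 40),
    ("Ultra 640", 16, 320),
]:
    print(f"{name}: {scsi_throughput_mb_s(width, rate)} MB/s")
```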
Version | Width (bits) | Data rate (MHz) | Throughput (MB/s)
SCSI-1 | 8 | 5 | 5
Fast SCSI | 8 | 10 | 10
Fast Wide SCSI | 16 | 10 | 20
Ultra SCSI | 8 | 20 | 20
Wide Ultra SCSI | 16 | 20 | 40
Ultra-2 SCSI | 8 | 40 | 40
Wide Ultra-2 SCSI | 16 | 40 | 80
Ultra-3 SCSI | 16 | 80 | 160
Ultra 320 SCSI | 16 | 160 | 320
Ultra 640 | 16 | 320 | 640

Figure 12.62 provides a simplified state diagram for the SCSI bus and Table 12.9 describes the bus states.

State | Description
Bus free | No devices are controlling the bus or transferring data. This state is indicated by the negation of the SEL and BSY lines.
Arbitration | When a device wishes to take control of the bus and become an initiator, it enters the arbitration state by asserting BSY and putting its ID on the data bus. If no device with a higher priority claims the bus, it goes ahead and claims the bus by asserting SEL.
Selection | In the selection state the initiator selects a target device and issues commands asking it to carry out a specific operation. Selection is done by issuing the logical OR of the IDs of the target device and the initiator on the bus.
Reselection | Because a target is able to give up the bus during a long operation, a reselection state is needed during which the target reclaims the bus.
Command | The command phase is used by the target device to request a command from the initiator. The target sets the C/D signal low to indicate a command and sets I/O high to indicate an output operation.
Data | In the data phase, data is transmitted between the initiator and target.
Message | The interface is controlled by messages sent between the initiator and target.
Status | The target returns a code to the initiator, indicating an operation’s status.

Serial Attached SCSI (SAS)

Serial attached SCSI, SAS, retains the best of SCSI while moving closer to USB and PCIe. Serial attached SCSI throws away SCSI’s antiquated physical layer and replaces it with a low-cost, high-performance serial interface. The topology of SAS is point-to-point, unlike SCSI, which is a multipoint bus. The physical layer of SAS uses differential signaling, and cables up to 10 m are supported.

SAS defines two low-level layers, physical and PHY, that divide the traditional physical-layer functions (plus some traditional link-layer functions) between them. The physical layer is concerned only with connectors and voltage levels, whereas the PHY layer is concerned with data encoding, link initialization, speed negotiation, and reset sequences. SAS uses 8b/10b encoding.

Cables and connectors are physically compatible with the SATA interface now used by all modern hard disk drives. Consequently, the same low-cost connectors can be used in both conventional PCs and SAS-based systems. SAS supports the SATA Tunneling Protocol (STP) that enables conventional hard drives to be connected to SAS architectures.

Serial Interface Buses

Once upon a time, a serial bus swapped speed for simplicity. Parallel buses require multiple data paths and correspondingly complex plug-socket arrangements and cables, whereas a serial bus carries data a bit at a time using two signal paths, one for the data and a ground return. A serial data link using a fiber optic cable requires a single data path.
In the 1970s when RS232C serial connections were slow, you had to use a parallel data bus if you wanted speed.

Early PCs had a parallel port with eight data lines using a DB-25 connector that you could use to interface to printers using 25-way ribbon cable. Some printers still have such basic parallel interfaces, although they are becoming increasingly obsolete.

Modern serial data links are simple, fast, and effective. The great advantage of the serial data bus is its ease of use. The buses we describe next have had a tremendous impact on computing because they provide low-cost, high-performance solutions to linking computers with peripherals and even with other computers. We begin with an introduction to the serial bus that was to have a profound effect on computer communications, the Ethernet.

The Ethernet

We introduce serial buses by briefly describing the Ethernet, developed to support local area networks at 10 Mbits/s. The Ethernet dates back to 1978 and now has the IEEE standard number 802.3. Today, it is the standard for low-cost local area networks operating at 100 Mbits/s or 1 Gbits/s.

In an Ethernet, all devices are connected to a single cable and no special control lines are required. A device, or node, can transmit serial data onto the common bus that is connected to all other devices.

The Ethernet transmission cable is now available in four versions: a thick coaxial cable, a thin coaxial cable, a very low-cost twisted pair of unshielded conductors, and fiber optics.
The table below defines the nomenclature for these connections. The 10 refers to the speed of the link, the Base refers to baseband transmission (as opposed to modulated carrier systems), and the 5/2/T/F refer to the media type. The nomenclature 1000BaseF refers to Ethernet media operating at 1 Gbps using fiber optics.

Name | Media type | Max. segment length | Max. nodes/segment
10Base5 | Thick coaxial | 500 meters | 100
10Base2 | RG58 (thin coaxial) | 185 meters | 30
10BaseT | UTP (twisted pair) | 100 meters | 1024
10BaseF | Fiber optic | 2,000 meters | 1024

The data is transmitted in the form of packets or frames. An Ethernet packet consists of seven fields starting with an 8-byte (64-bit) preamble that synchronizes the clock at the receiver with the transmitted bit stream. The first seven bytes have the bit pattern 10101010. The last byte of the preamble is the start of frame delimiter with the special pattern 10101011 that indicates the start of a frame. The 48-bit destination and source address fields indicate where the packet is going and where it is from. The length field defines the size of the data field, which is between 46 and 1500 bytes long. Since the data field must contain at least 46 bytes, data fields shorter than 46 bytes are padded to bring the size up to 46 bytes. Finally, a 32-bit frame check sequence provides a powerful error-detecting code.

Ethernet is simply a message exchange mechanism (i.e., there are no control signals). One node on the Ethernet posts a frame to another node. How the data field in a packet is interpreted is not part of the Ethernet specification.
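The seven fields of the frame described above can be assembled in a short Python sketch. Several things here are assumptions made for illustration: the function name is hypothetical, bytes are written in the order the bit patterns appear in the text rather than in on-the-wire bit order, zero bytes are used for padding, and `zlib.crc32` serves as a stand-in for the 32-bit frame check sequence.

```python
import struct
import zlib

PREAMBLE = bytes([0b10101010]) * 7   # seven bytes of 10101010
SFD = bytes([0b10101011])            # start of frame delimiter
MIN_DATA = 46                        # shorter data fields are padded
MAX_DATA = 1500


def build_frame(dest, src, data):
    """Assemble preamble, delimiter, addresses, length, padded data, and check sequence."""
    if len(dest) != 6 or len(src) != 6:
        raise ValueError("addresses must be 48 bits (6 bytes)")
    if len(data) > MAX_DATA:
        raise ValueError("data field is limited to 1500 bytes")
    length = struct.pack("!H", len(data))        # number of real data bytes
    padded = data.ljust(MIN_DATA, b"\x00")       # pad up to the 46-byte minimum
    body = dest + src + length + padded
    fcs = struct.pack("<I", zlib.crc32(body))    # stand-in 32-bit check value
    return PREAMBLE + SFD + body + fcs


frame = build_frame(b"\x01" * 6, b"\x02" * 6, b"hello")
print(len(frame))
```

A 5-byte message therefore still occupies 8 + 6 + 6 + 2 + 46 + 4 = 72 bytes on the cable, which is why very short messages use an Ethernet inefficiently.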
The physical layer of the Ethernet uses a baseband cable with phase-encoded data transmitted at 100 Mb/s or 1 Gb/s. No two nodes can access the Ethernet simultaneously without their messages interfering destructively with each other. When two messages overlap, a collision occurs and both messages are lost. Any node wishing to communicate with another node goes ahead and transmits its message. If another node is transmitting at the same time, or joins in before the message is finished, the message is lost. If the sender does not receive an acknowledgment within a time-out period, it assumes that its message has been corrupted in transmission.

Without any control over when a node may transmit, there’s nothing to stop two or more nodes transmitting simultaneously. The simplest form of contention control would be to let the transmitters retransmit their messages immediately. Such a scheme cannot work, as the competing nodes would keep retransmitting the messages, which would keep getting scrambled. A better strategy on detecting a collision is to back off, that is, to wait a random time before trying to retransmit the frame.

FireWire 1394 Serial Bus

The FireWire bus (IEEE 1394 bus) is an example of a bus that began life as a company project and later became an industry standard. Apple developed FireWire in 1986 as a replacement for the parallel SCSI bus then in use, for the professional audio and video world. FireWire is an Apple trademark – the same bus is called iLink by Sony.

Several factors have led to the design of the FireWire bus. The first is cost. Over the past four decades microcomputers have been embedded in domestic products from TVs to washing machines.
This is particularly true in the audio-visual area. If such devices are to be interconnected, a link must be cheap; the consumer doesn't expect to pay a fortune for an add-on.

The second factor is size. Electronic devices are getting smaller and smaller. When computers were large, the space taken by a bulky connector on the back was of little importance. Small devices (e.g., a hand-held video camera, a computer games console, an MP3 player, or a cell phone) require correspondingly small connectors.

The third factor is speed. Year by year the rate at which computers are clocked and the speed at which data is transferred between digital devices have increased relentlessly. In 1975 a 600 baud modem was regarded as fast; by 1995 the 28.8 Kb/s modem was routinely used to interface computers to the Internet, and by 2000 the 512 Kb/s cable modem could be found in many households.

The fourth factor is reliability. Systems often fail due to faulty connectors, largely caused by repeated insertion and removal of plugs or the continuous flexing of cables. A reliable connector should have as few signal paths as possible.

In the early 1990s a serial bus, initially called the P1394 High Performance Serial Bus, was proposed (the “P” indicates a provisional standard). A serial bus has many advantages over a parallel bus like the SCSI bus. In particular, a serial bus has only two conductors (one if a fiber optic path is used), which reduces the cost of cabling and the cost and size of connectors.
The P1394 bus was designed to take advantage of the best available technology and can support several different physical layers; that is, the P1394 bus is not tied to one type of physical implementation. Some of the most important features of the serial bus are:

• Automatic assignment of node (i.e., the device connected to the bus) addresses; there is no need for address switches or other means of assigning addresses to nodes
• Variable-speed data transmission. The IEEE 1394b specification doubled the number of bits per packet to increase the bus rate to 800 Mb/s
• The cable medium allows up to 16 physical connections or cable hops, each of up to 4.5 meters
• A fair bus access mechanism that guarantees all nodes equal access
• Consistent with IEEE Std 1212–1991, IEEE Standard Control and Status Register (CSR) Architecture for Microcomputer Buses (ANSI)
• The 1394 Serial Bus limits the number of nodes on any bus to 63. However, up to 2^16 nodes are supported by means of multiple buses linked via bus bridges.

Figure 12.64 describes the 1394 Serial Bus’s layered protocol, which is essentially the same as that of the PCI Express bus. Each layer provides a specific service. Because each layer communicates with the layer above or below it in a tightly specified manner, it is possible to replace any layer by a system that performs the same function. That is, the 1394 Serial Bus is technology independent.
Serial Bus Topology

Data transmission systems are characterized by their topology, which describes the way in which the individual nodes are related. The Ethernet is a bus because all nodes are connected to it and information is sent from one node to all other nodes on the bus; there is no routing mechanism to determine how information propagates on the bus. Another system is the ring, in which each node is connected to its neighbors to form a closed loop and information flows from node to node.

The general structure of the 1394 Serial Bus is described by Figure 12.65.

IEEE 1394 Hardware

The 1394 bus uses a six-element cable. There are two twisted pairs of data conductors to provide two independent data channels as well as two power supply lines. A seventh conductor surrounds the inner six conductors to shield the entire cable electrically.

Figure 12.66 illustrates a simple tree structure with five nodes. Each node is either a branch, directly connected to more than one neighbor, or a leaf with only a single neighbor. Many applications of the serial bus daisy-chain the nodes together, a special case of a tree structure.

The Physical Layer

The serial bus transmits information in the form of packets in a half-duplex mode. A data strobe, STRB, controls the data flow.
Data is transmitted in an NRZ (non-return to zero) format and STRB changes state whenever two consecutive NRZ bits have the same value. This mechanism makes it easy to derive a clock from the data and strobe signals (by exclusive-ORing them). Figure 12.67 provides an example of the encoding for the sequence 10110001.

Arbitration

The Serial Bus implements several forms of arbitration. Here we describe fair arbitration that occurs on the cable bus. Arbitration is geographic because the node closest to the root on a cable will always win. The fair arbitration protocol is based on the concept of a fairness interval that consists of one or more periods of bus activity separated by short idle periods called subaction gaps, followed by a longer idle period known as an arbitration reset gap. At the end of each subaction gap, bus arbitration determines the next node to transmit an asynchronous packet.

When using fair arbitration, an active node can initiate sending an asynchronous packet exactly once in each fairness interval. An active node can arbitrate only if its arb_enable signal is set. The arb_enable signal is set to one by an arbitration reset gap and is cleared when the node wins the arbitration. This disables further arbitration requests for the remainder of the fairness interval. A fairness interval ends when arbitration by the final fair node is successful; this generates an arb_reset_gap since all nodes now have their arb_enable signals cleared and cannot drive the bus. The arb_reset_gap re-enables arbitration on all nodes and starts the next fairness interval.
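The fairness interval can be illustrated with a small Python sketch. This is a simplified model and not the IEEE 1394 state machine: the function name, the dictionary of hop counts from the root, and the assumption that every node always has a packet waiting are all hypothetical choices made for illustration.

```python
def run_fairness_interval(distance_from_root):
    """Return the order in which nodes win the bus in one fairness interval.

    distance_from_root maps a node name to its hop count from the root.
    This is a hypothetical model of 'the node closest to the root wins';
    every node is assumed to have an asynchronous packet waiting.
    """
    # The arbitration reset gap sets arb_enable on every node.
    arb_enable = {node: True for node in distance_from_root}
    winners = []
    while any(arb_enable.values()):
        # After each subaction gap, only nodes with arb_enable set may arbitrate.
        contenders = [node for node, enabled in arb_enable.items() if enabled]
        winner = min(contenders, key=lambda node: distance_from_root[node])
        winners.append(winner)
        # Winning clears arb_enable for the rest of the fairness interval.
        arb_enable[winner] = False
    # No node can arbitrate any more, so the bus idles long enough to
    # produce an arbitration reset gap and the next interval begins.
    return winners


nodes = {"camera": 2, "disk": 1, "printer": 3}
order = run_fairness_interval(nodes)
print(order)
```

In this model each node transmits exactly once per interval, and the node nearest the root goes first but cannot monopolize the bus because its arb_enable flag stays cleared until the next arbitration reset gap.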
USB

Two factors contributing most to the success of the personal digital revolution that encompasses desktop computers, laptops and notebooks, MP3 players, both still and video digital cameras, and cellular phones are flash memory and the universal serial bus, USB. Flash memory provides robust, high-density, compact, non-volatile storage at low prices, and USB technology allows you to connect almost any modern digital device to a computer, or even to connect two digital devices together without a host computer (e.g., a camera and a printer). Indeed, the USB is the single most successful digital interface ever, with over one billion USB devices sold by 2009.

USB History

USB was developed by a consortium of Compaq, DEC, IBM, Intel, Microsoft, NEC, and Nortel in 1994. The first USB specification, 1.0, was introduced in 1996 and supported a data rate of 12 Mb/s. USB 1.1 was released in 1998 to deal with problems related to hubs, and it was widely adopted. In 2000 USB 2.0 emerged to provide a maximum data rate of 480 Mb/s. USB had jumped into FireWire territory and it became the de facto standard for most PC interfaces to printers, external drives, keyboards, mice, and so on. It wasn’t until 2009 that USB version 3.0 saw the light of day with an operating speed of 300 MB/s (i.e., 2,400 Mb/s), displacing FireWire from its high-end niche. USB 3.0 is a giant leap forward over version 2.0 and requires a new cable format and technology.

The Universal Serial Bus was devised by a consortium of companies and later established as a standard interface.
A USB bus uses low-cost connectors and cabling to connect a computer to a range of peripherals, from the mouse, keyboard, printer, and scanner to memory devices such as external hard drives and flash memory devices (so-called pen drives). USB is an alternative to FireWire.

USB is host controlled. Unlike other buses such as PCI, it does not support a multimaster arrangement. There can be only one USB master (or host) per bus. Figure 12.70 describes the tiered star topology of a USB system.

At the top of the hierarchy sits the host, which communicates with the computer and controls the USB bus. The host is connected to a hub, which is a device that distributes the USB bus to lower levels in the hierarchy. A hub may be connected directly to a peripheral, or to several peripherals, or to another hub. Each hub may be connected to a lower-level hub.

USB uses a simple multicore cable with dedicated connectors. However, since the introduction of USB, peripherals have become smaller and smaller and USB cables have been forced to follow this trend, with the result that there are now four basic sizes. Figure 12.71 illustrates both USB plugs and sockets. The computer end of the link uses type A sockets and type A plugs, which are fairly substantial. Type B plugs and sockets are used at the other end of the link (i.e., at a hub or a peripheral such as a printer). Mini-B plugs and sockets were developed for digital cameras, cell phones, and portable disk drives.
Figure 12.73 illustrates the structure of a USB system.

USB cables use four conductors. Data is transmitted differentially over a twisted pair of wires labeled D+ and D- in Figure 12.74. Recall that differential mode transmission increases reliability by rejecting common mode interference. The twisted pair is enclosed in a metal shield to further reduce the dangers of picking up stray signals. The specified maximum length of the cable is 5 meters. Of course, you can use a hub at the end of a 5 m cable to increase the length of the USB path.

More Power

The USB’s ability to deliver power was intended to support hubs and peripherals like the mouse and keyboard. In practice, many manufacturers have taken advantage of this facility; for example, the USB’s built-in power supply has been used to charge cell phones and MP3 players. A new power mode called battery charging was added to the USB specification. A host can supply up to 1.5 A when communicating at 12 Mbps, or 0.9 A when communicating at 480 Mbps. Furthermore, by 2010, many of the world’s cell phone manufacturers had provided micro USB ports to charge their phones.

Physical Layer Data Transmission

At the electrical level the USB employs NRZI data encoding, where a logical 1 is represented by no change of level and a 0 is represented by a change of level.
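A minimal Python sketch of this NRZI rule is given here. The function names, the use of 0 and 1 for the two line levels, and the choice of 1 as the starting level are assumptions made for illustration; they are not taken from the USB specification.

```python
def nrzi_encode(bits, start_level=1):
    """Encode bits with the rule above: a 0 toggles the level, a 1 holds it."""
    level = start_level
    levels = []
    for bit in bits:
        if bit == 0:
            level ^= 1          # a 0 is sent as a change of level
        levels.append(level)    # a 1 leaves the level unchanged
    return levels


def nrzi_decode(levels, start_level=1):
    """Recover the bits by comparing each level with the previous one."""
    previous = start_level
    bits = []
    for level in levels:
        bits.append(1 if level == previous else 0)
        previous = level
    return bits


data = [0, 1, 1, 0, 0, 0, 1, 0]
line = nrzi_encode(data)
print(line)
print(nrzi_decode(line) == data)
```

The decoder needs only the previous level to recover each bit, so the absolute polarity of the line does not carry information; only the presence or absence of a transition does.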
Sending a string of 0s requires the greatest bandwidth, and transmitting a sequence of 1s results in a constant level with no signal transitions. Figure 12.75 illustrates an NRZI sequence (this encoding format is very old in comparison with the 8b/10b encoding used by the PCIe bus and is relatively little used because it is non-self-clocking and has a dc bias). The two signal levels on a USB bus are referred to as J and K.

Logical Layer

Logically, a communications channel exists between the host and device, and in USB-speak this channel is called a pipe. The USB host can support 32 active pipes at any instant (16 upstream and 16 downstream pipes). The pipe is terminated at a peripheral by an endpoint. Although the physical structure of a USB system is a tiered star, it is logically a star network. All nodes have to communicate with each other via the single central node.

On-the-Go USB

USB 2.0 doesn’t support peer-to-peer networking; it covers only host-to-peripheral communications, where the host is usually a computer. The On-The-Go USB mode is a peer-to-peer (point-to-point) protocol that allows two USB devices to communicate without a host computer and provides direct device-to-device communications. On-the-Go, OTG, operation was adopted in 2001. This extension to USB provided for communications between several OTG devices or between an OTG device and a conventional USB host. This revision also introduced a new range of USB plugs and sockets called Micro-A and Micro-B. The revised standard introduces the dual-role device (i.e., OTG) that can act as either a host or a peripheral.
Moreover, a dual-role device must be able to supply a limited current to the bus of at least 8 mA (remember that most OTG devices are battery-powered). OTG devices, when acting as a host, may support only a targeted peripheral list; that is, the OTG device may operate only in conjunction with certain specified peripherals. It is not intended to be a general-purpose host with full USB capabilities. A significant element of OTG technology is the Host Negotiation Protocol, HNP, which allows the transfer of control between two OTG devices.

USB 3.0

The most radical change to the universal serial bus took place in 2010 with the introduction of USB 3.0, which provides a tenfold increase in performance and uses less power. Even more remarkably, USB 3.0 is physically compatible with USB 2.0. USB 3.0 is interesting because it is not really a development or extension of USB 2.0, but a replacement bus that coexists with USB 2.0; that is, a USB 3.0 bus incorporates a USB 2.0 bus as well.

Figure 12.77 illustrates the structure of the USB 3.0 cable. The two data-carrying conductors of USB 2.0 are maintained as well as the two power conductors. Two new differential pairs of conductors (SSRX and SSTX) have been added to carry the new USB 3.0 data in a full-duplex bidirectional mode.

The additional functionality of USB 3.0 is called the SuperSpeed bus, which provides a maximum speed of 4.8 Gb/s. To put this into context, it takes USB 2.0 13.9 minutes to transfer an HD movie, whereas USB 3.0 can perform the transfer in only 70 s.
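As a rough check on these figures, the short Python calculation below compares the speedup implied by the two quoted transfer times with the ratio of the quoted signaling rates. The variable names are illustrative, and the size of the movie is not needed because it cancels out of the ratio.

```python
# Transfer times quoted above for the same HD movie.
usb2_seconds = 13.9 * 60        # 13.9 minutes over USB 2.0
usb3_seconds = 70               # 70 s over USB 3.0
observed_speedup = usb2_seconds / usb3_seconds

# Maximum signaling rates quoted in the text, in bits per second.
usb2_rate = 480e6
usb3_rate = 4.8e9
raw_ratio = usb3_rate / usb2_rate

print(round(observed_speedup, 1), raw_ratio)
```

The two ratios are of the same order, which is consistent with the tenfold performance increase claimed for USB 3.0.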
In short, USB 3.0 is an impressive feat of engineering that takes a great leap ahead in terms of functionality and performance, while maintaining backward compatibility with a vast existing market of USB 2.0 users.

Figure 12.78 illustrates the logical structure of the USB 3.0 bus in terms of protocol layers (like the ISO 7-layer model for open systems interconnection). The SuperSpeed bus that adds all the new functionality to USB 3.0 is similar to the PCI Express bus and uses 8b/10b encoding at the physical layer level. Data is scrambled on the SuperSpeed bus. This is not a security mechanism but a means of converting data into a sequence that appears random in order to improve the electrical properties of the data link.
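The idea of scrambling can be sketched in Python with a linear feedback shift register (LFSR) whose output is exclusive-ORed with the data. This is an illustrative additive scrambler only: the function names, the 16-bit register, the tap positions, and the seed are assumptions and are not claimed to match the USB 3.0 specification bit for bit.

```python
def keystream(seed=0xFFFF, taps=(16, 5, 4, 3)):
    """Yield pseudo-random bits from a 16-bit LFSR (illustrative taps and seed)."""
    state = seed
    while True:
        feedback = 0
        for tap in taps:
            feedback ^= (state >> (tap - 1)) & 1   # XOR the tapped bits
        state = ((state << 1) | feedback) & 0xFFFF  # shift in the new bit
        yield feedback


def scramble(bits, seed=0xFFFF):
    """XOR each data bit with the keystream; applying it twice restores the data."""
    return [bit ^ key for bit, key in zip(bits, keystream(seed))]


idle = [0] * 32                  # a long run of identical bits
scrambled = scramble(idle)
recovered = scramble(scrambled)  # the receiver runs the same LFSR

print(scrambled)
print(recovered == idle)
```

A constant input becomes a mixture of 0s and 1s on the link, and a receiver that runs the same register from the same seed recovers the original data. Nothing is hidden from anyone who knows the register, which is why scrambling is not a security mechanism.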