Computer Organization and Architecture: Themes and Variations, 1st Edition
Clements
CHAPTER 12
© 2014 Cengage Learning Engineering. All Rights Reserved.
Input/Output
Input/Output is concerned with the mechanisms by which information is moved
around a computer and between a computer and its peripherals.
Figure 12.1 describes a generic system with a CPU, I/O controllers and
peripherals, and a system bus that links the CPU to memory and
peripherals.
The word peripheral appears twice in Figure 12.1; it is used both to describe
an external device, such as a printer or a mouse, connected to a computer,
and to describe the controller that provides an appropriate interface
between the external peripheral and the CPU.
The processor and memory lie at the heart of the system. The peripheral
interfaces, connecting the processor and its memory to peripherals, are
shown in two boxes; one includes internal peripherals, such as disk drives,
and the other includes external peripherals, such as modems, printers,
and scanners.
Memory-mapped Peripherals
There’s no fundamental difference between an I/O transaction and a
memory access.
Outputting a word to a peripheral is the same as storing a word in
memory, and getting a word from a peripheral is exactly the same as
reading a word from memory.
Treating I/O transactions as memory accesses is called memory-mapped
I/O.
This doesn’t mean that we can forget about I/O because it’s just like
accessing memory, since the properties of random access memory are
radically different from the properties of typical I/O systems.
Memory-mapped Peripherals
When implementing I/O structures we have to take into account the
characteristics of the I/O devices themselves; for example, when writing a
file to a disk drive you might have to send a new byte of data every few
microseconds.
Figure 12.3 shows what a typical memory-mapped I/O port (peripheral
interface chip) looks like to the processor.
To the host CPU this peripheral appears as the sequence of consecutive
memory locations described by Figure 12.4.
The left-hand side of the peripheral interface shaded gray in Figure 12.3 looks
exactly like a memory element as far as the CPU is concerned.
The other half of the peripheral interface chip, shown in blue, is the peripheral
side that performs the specific operations required by the interface.
The memory-mapped port of Figure 12.4 has four consecutive registers at
addresses i, i + 1, i + 2, and i + 3.
We have assumed that the peripheral is an 8-bit device and that its
consecutive locations are each separated by one byte.
In a system with a 32-bit data bus, the addresses of the registers would be
i, i + 4, i + 8, and i + 12.
The first location at address i contains a command register that defines
the operating mode and characteristics of the peripheral.
Most memory-mapped I/O ports can be configured to operate in several
modes, according to the specific application.
The location at address i + 1 contains the port’s status, which is set up by
the associated peripheral. This status information can be read by the
processor to determine whether the port is ready to take part in a data
transaction or whether an error condition exists; for example, a printer
connected to a memory-mapped I/O port might set an error bit to indicate
that it is out of paper. In this example we’ve created generic status bits
such as ERRout, ERRin, RDYout, RDYin.
The locations at addresses i + 2 and i + 3 are used to send data to the
peripheral and to receive data from the peripheral.
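The register layout just described can be sketched as follows; the base address, the Python representation, and the `bus_width_bytes` parameter are illustrative, not part of the text:

```python
# Sketch of the memory-mapped port of Figure 12.4. On an 8-bit bus the four
# registers occupy consecutive byte addresses i, i+1, i+2, i+3; on a 32-bit
# bus they sit one bus word apart at i, i+4, i+8, i+12.

def register_offsets(base, bus_width_bytes=1):
    """Return the addresses of the four registers of the port at 'base'."""
    step = bus_width_bytes
    return {
        "command":  base + 0 * step,   # write-only: operating mode
        "status":   base + 1 * step,   # read-only: RDY/ERR flags
        "data_out": base + 2 * step,   # write-only: data to the peripheral
        "data_in":  base + 3 * step,   # read-only: data from the peripheral
    }

print(register_offsets(0x1000))       # 8-bit bus: i, i+1, i+2, i+3
print(register_offsets(0x1000, 4))    # 32-bit bus: i, i+4, i+8, i+12
```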
Peripheral Register Addressing Mechanisms
The command and data-to-peripheral registers are write-only, and the
status and data-from-peripheral registers are read-only.
A single address line can distinguish between the two pairs of registers
(i.e., command/status and data-in/data-out). The processor’s read and write
signals distinguish between the read-only and write-only registers.
Table 12.1 demonstrates this register-addressing scheme. The peripheral
provides four internal registers, but the processor sees only two unique
locations, N and N + 4.
The CPU’s R/W* output is used to select one of two pairs of registers.
When R/W* = 0, the write-only registers are selected and when R/W* = 1,
the read-only registers are selected.
Figure 12.5 emphasizes the way in which peripheral register space can be
divided into read-only and write-only regions.
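This decoding scheme can be expressed as a small lookup; the register names follow Figure 12.4 and the pairing of address line and R/W* follows the scheme described above:

```python
# Sketch of the Table 12.1 addressing scheme: one address line plus the
# CPU's R/W* signal select among four registers behind two CPU addresses.

def select_register(address_line, rw):
    """address_line: 0 selects address N, 1 selects N+4.
    rw: 1 = read (R/W* high), 0 = write (R/W* low)."""
    return {
        (0, 0): "command",    # write-only register at N
        (0, 1): "status",     # read-only register at N
        (1, 0): "data_out",   # write-only register at N+4
        (1, 1): "data_in",    # read-only register at N+4
    }[(address_line, rw)]

select_register(0, 1)   # 'status': reading CPU address N
select_register(1, 0)   # 'data_out': writing CPU address N+4
```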
Register address   Function    CPU address   R/W*
i                  command     N             0
i+1                status      N             1
i+2                data out    N+4           0
i+3                data in     N+4           1
Figure 12.6 illustrates a register file addressed by a counter. After the
peripheral interface has been reset, the internal pointer is loaded with zero.
Each successive access to the interface increments the pointer and selects
the next register. Peripherals with auto-incrementing pointers are useful
when the registers will always be accessed in sequence.
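The pointer behavior of Figure 12.6 can be modeled with a toy class; the register count and values are illustrative:

```python
# Toy model of the auto-incrementing register pointer of Figure 12.6:
# a reset clears the pointer, and each access selects the next register.

class AutoIncPort:
    def __init__(self, n_registers):
        self.regs = [0] * n_registers
        self.ptr = 0                 # after reset the pointer holds zero

    def reset(self):
        self.ptr = 0

    def write(self, value):
        self.regs[self.ptr] = value
        self.ptr = (self.ptr + 1) % len(self.regs)  # select next register

port = AutoIncPort(4)
for v in (0xA, 0xB, 0xC, 0xD):
    port.write(v)        # four writes fill registers 0..3 in sequence
```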
Peripheral Access and Bus Width
Many peripherals have 8-bit wide buses and are interfaced to computers
with 16 or 32 bits.
Life is easy when 8-bit peripherals are connected to 8-bit data buses with
8-bit processors, or when 16-bit peripherals are connected to 16-bit buses
with 16-bit processors.
Things get more complicated when 8-bit peripherals are interfaced to 16- or
32-bit buses.
Two problems can arise when you interface an 8-bit peripheral to a 16-bit
bus: endianism and the mapping of 8-bit registers onto a processor’s 16-bit
address space.
Consider the arrangement in Figure 12.7 where an 8-bit peripheral is
interfaced to a 16-bit bus. The peripheral is connected to half the bus’s data
lines.
If the processor supports 8-bit bus transactions, all is well and the
registers can be accessed at their byte addresses (at byte offsets 0, 1, 2,
and 3).
If the processor supports only 16-bit bus operations, when a 16-bit value is
written to memory all 16-bits are put on the data bus.
When the processor performs a byte access, it still carries out a word
access but informs the peripheral interface or memory that only 8 bits are
to be transferred.
A separate control or address signal is required to specify whether the byte
being accessed is the upper or lower byte at the current address.
In this case, the peripheral is hard-wired to one half of the data bus and
can respond only to either odd or even byte addresses.
In a big-endian environment, the peripheral would be wired to data lines
[0:7] and accessed at odd addresses, whereas in a little-endian environment
the peripheral would be wired to data lines [0:7] and accessed at even
addresses.
The peripheral’s four addresses would appear to the computer at byte
offsets of 0, 2, 4, and 6.
Some processors have dedicated instructions to facilitate data transfer to
byte-wide peripherals; for example, the 32-bit 68K has a MOVEP (move
peripheral) instruction that copies a 16- or 32-bit value to or from an
8-bit memory-mapped peripheral.
Figure 12.8 shows a peripheral with four internal registers and the CPU’s
address map, where the peripheral’s data space is mapped onto successive
odd addresses in this big-endian processor's memory space.
Figure 12.9 shows a peripheral with four 8-bit registers. The registers
appear to the programmer as locations $08 0001, $08 0003, $08 0005, and
$08 0007. Locations $08 0000, $08 0002, $08 0004, and $08 0006 cannot be
accessed.
MOVEP moves a 16/32-bit value between a register and a byte-wide
peripheral. The contents of the register are moved to consecutive even (or
odd) byte addresses; for example, MOVEP.L D2,(A0) copies the four bytes
in register D2 to addresses: [A0] + 0, [A0] + 2, [A0] + 4, [A0] + 6, where A0
is a pointer register.
Figure 12.10 demonstrates how a MOVEP.L D0,(A0) copies four bytes in
D0 to successive odd addresses in memory, starting at location $08 0001.
The suffix .L in 68K code indicates a 32-bit operation and .B indicates a
byte operation.
The most-significant byte in the data register is transferred to the lowest
address.
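The byte placement performed by MOVEP.L can be sketched as follows; the base address and data value are taken from the Figure 12.10 example, while the dictionary model of memory is illustrative:

```python
# Sketch of MOVEP.L D0,(A0): the four bytes of a 32-bit register are
# written to alternate byte addresses, most-significant byte first.

def movep_l(memory, a0, d0):
    for i in range(4):
        byte = (d0 >> (8 * (3 - i))) & 0xFF   # MSB goes to lowest address
        memory[a0 + 2 * i] = byte             # addresses A0+0, +2, +4, +6

mem = {}
movep_l(mem, 0x080001, 0x12345678)
# 0x12 lands at $08 0001, 0x34 at $08 0003, 0x56 at $08 0005, 0x78 at $08 0007
```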
Without the MOVEP instruction, it would take the following code to move
four bytes to a memory-mapped peripheral.

MOVE.L #Peri,A0    ;A0 points to the memory-mapped peripheral
MOVE.B D0,(6,A0)   ;Move least-significant byte of D0 to the peripheral
ROR.L  #8,D0       ;Rotate D0 to get the next 8 bits
MOVE.B D0,(4,A0)   ;Move the next byte, bits 8 to 15, to the peripheral
ROR.L  #8,D0       ;and so on…
MOVE.B D0,(2,A0)
ROR.L  #8,D0
MOVE.B D0,(0,A0)
ROR.L  #8,D0       ;After four rotations D0 is back to its old value
Preserving Order in I/O Operations
RISC architectures provide only memory load and store operations and don’t
implement instructions that facilitate I/O operations.
However, there are circumstances where RISC organization and memory-mapped I/O clash.
Some memory-mapped peripherals have configuration and self-resetting
status registers or autoincrementing pointers.
It’s important to access such peripherals in the appropriate programmer-defined sequence.
Because superscalar RISC processors take an opportunistic approach to
memory access, data can be stored in memory out-of-order.
Such out-of-order memory accessing doesn’t cause problems with data
storage and retrieval, but it can disrupt memory-mapped I/O.
The PowerPC implements an EIEIO (enforce in-order execution of I/O)
instruction that has no parameters but ensures that all memory accesses
previously initiated are completed. Consider this example where two loads
are followed by an addition.
lwz r5, 1000(r0)   ;load r5 from memory[1000]
lwz r6, 1040(r0)   ;load r6 from memory[1040]
add r7, r5, r6     ;r7 = r5 + r6
When these instructions are executed, the processor may swap the order in
which r5 and r6 are loaded from memory. As long as the first two loads are
executed before the add instruction, the outcome is not dependent on the
order of the loads.
Addresses 1000 and 1040 are memory-mapped locations. If the peripheral is
designed so that a read access to address 1000 updates a register at 1040,
the sequence of the two load instructions becomes all-important and
reversing their order may lead to an incorrect result.
Consider the following example where we have to update a peripheral.
Because the register is accessed via a pointer, we write the register address
to the peripheral’s pointer register before writing data to the register being
pointed at.
In this example, we want to load peripheral register number 35 with the
value 99. The PowerPC code is:
addi r5, r0, 35     ;r5 = 35
addi r6, r0, 99     ;r6 = 99
stw  r5, 1234(r0)   ;store 35 at memory location 1234 (the pointer)
stw  r6, 5678(r0)   ;store 99 at memory location 5678
The two writes must be executed in the correct order. To ensure this, the
PowerPC has three synchronization instructions, eieio, sync, and isync.
The isync forces instructions or memory transactions to complete before
continuing; that is, instructions prior to isync are executed and fetched
instructions are discarded.
Then, a new fetch begins. EIEIO forces all posted writes to complete prior to
any subsequent writes.
The SYNC instruction forces all previous reads and writes to complete on
the bus before executing any instructions after it.
We can ensure that the previous code runs in the correct order by inserting
an EIEIO between the writes.
addi r5,r0,35      ;r5 = 35
addi r6,r0,99      ;r6 = 99
stw  r5,1234(r0)   ;M[1234] = 35; we're changing register 35
eieio              ;Make sure r5 is written before proceeding
stw  r6,5678(r0)   ;M[5678] = 99; new register value is 99
Data Transfer
Three concepts are vital to an understanding of data transfer: open- and
closed-loop transfers, and data buffering.
In an open-loop transfer, information is sent on its way and its correct
reception is assumed.
In a closed-loop transfer, the receiver actively acknowledges that the
data has arrived.
Data buffering is concerned with handling disparities between the rate
at which data is transmitted and the rate at which it is consumed by the
receiver.
Open-loop Data Transfers
The simplest method of transmitting data is to put the data on a bus and
assert a signal, data strobe, to indicate that it is available.
Figure 12.11 illustrates an open-loop transmission between a peripheral
interface component and an external peripheral (e.g., a printer).
The processor moves data to the peripheral interface with its address and
data buses and the peripheral interface puts the data on the bus.
The peripheral interface asserts a data available strobe, DAV*, to indicate
to the peripheral that the data at its input terminal is valid.
The peripheral reads the data and the peripheral interface negates its
DAV* strobe to complete the transfer.
Figure 12.12 provides a timing diagram for this information exchange,
which is called open loop because there is no feedback to acknowledge
that the data has indeed been received.
If the peripheral is off line, busy, or just very slow, the data may not be
read during the time for which it is available (i.e., DAV* asserted). Open
loop data transfers are also called synchronous transfers because the
device receiving the data must be synchronized with the device sending
the data.
Closed-loop Data Transfers
In a closed loop transfer the device receiving data returns an acknowledgment
to the sender to close the loop.
ACK* (acknowledge) from the peripheral indicates the receipt of data.
The peripheral interface makes the data available and asserts DAV* at B to
indicate that the data is valid just as in an open-loop data transfer.
The peripheral receiving the data sees DAV* asserted and reads the data. In
turn the peripheral asserts ACK* to inform the interface that the data has
been accepted.
The interface de-asserts DAV* to complete the exchange. This sequence is
known as handshaking. Handshaking supports slow peripherals, because the
transfer waits until the peripheral indicates its readiness by asserting ACK*.
The timing diagram in Figure 12.14 is called a handshake because the
assertion of ACK* is a response to the assertion of DAV*.
The advantage of a closed loop data transfer is that the originator of the
data knows that it has been accepted and data cannot be lost because it
was not read by the remote peripheral.
The handshaking closed-loop protocol can be taken a step further. The
assertion of DAV* is met by the assertion of ACK* from the peripheral.
At this point it is assumed that the data has been received and the data
exchange ends.
Figure 12.15 shows a fully interlocked handshake in which the sequence
of events is more tightly defined and each event triggers the next event
in sequence.
At B in Figure 12.15 DAV* is asserted to indicate valid data and at C ACK*
is asserted to indicate its receipt. The sequence continues with the negation
of DAV* at point D.
DAV* can be negated because the assertion of ACK* indicates that DAV*
has been recognized.
Negating DAV* indicates that its acknowledgement has been detected. The
peripheral negates ACK* at E and removes the data at F after negating
DAV*. Point F may come before point E because the removal of the data is a
response to the negation of DAV* rather than to the negation of ACK*.
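The causal ordering of the fully interlocked handshake can be captured as a simple check; the event names are invented for illustration, and the rules follow the sequence just described (ACK* answers DAV*, DAV* is negated only after ACK*, and the data is removed only after DAV* is negated; the negation of ACK* is deliberately unconstrained relative to the data removal):

```python
# Toy check of the fully interlocked handshake of Figure 12.15.

def valid_interlock(trace):
    """Return True if the trace respects the interlocked ordering."""
    i = {event: trace.index(event) for event in trace}
    return (i["assert_DAV"] < i["assert_ACK"]
            < i["negate_DAV"] < i["remove_data"])

good = ["assert_DAV", "assert_ACK", "negate_DAV", "remove_data", "negate_ACK"]
bad  = ["assert_ACK", "assert_DAV", "negate_DAV", "remove_data", "negate_ACK"]
valid_interlock(good)   # a well-formed exchange
valid_interlock(bad)    # ACK* before DAV* violates the protocol
```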
Buffering Data
When data is transmitted over a bus, you either have to use it while it is
valid, or capture it in a memory device. Figure 12.16 illustrates three input
circuits.
Figure 12.16(a) uses the instantaneous values on data inputs I0 to I3; that is,
the current data values are used and it is necessary for the transmitter to
maintain the data values while they are being used.
Figure 12.16(b) illustrates single-buffered input using D flip-flops. When the
data is to be read, the flip-flops are latched and the input captured.
Single-buffered input captures data and holds it until the next time the
latches are clocked.
Figure 12.16(c) provides a solution to the problem where new data arrives
before the previous value has been read. Incoming data is latched exactly as
before.
Data in the input latches is copied to a second set of latches, where it is
buffered for a second time. The input side of the buffer can be capturing data
while the output side is waiting for the old data to be read.
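A minimal model of this double-buffered arrangement, assuming a simple two-latch structure as in Figure 12.16(c):

```python
# Sketch of double-buffered input: the input latch captures new data while
# the output latch holds the previous value until it is read.

class DoubleBuffer:
    def __init__(self):
        self.input_latch = None
        self.output_latch = None

    def capture(self, value):     # clock new data into the input latch
        self.input_latch = value

    def transfer(self):           # copy input latch to output latch
        self.output_latch = self.input_latch

    def read(self):               # the reader sees the buffered value
        return self.output_latch

db = DoubleBuffer()
db.capture(1)
db.transfer()
db.capture(2)      # new data arrives before the old value is read
old = db.read()    # the earlier sample (1) is not lost
```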
Figure 12.17 gives the timing diagram of a double-buffered input system.
The input arrives at fixed time intervals. Input samples are clocked into the
input latches at regular intervals by clock CIi, where i is the clock pulse
number.
The FIFO
A general solution to data buffering is provided by the first-in-first-out, FIFO,
memory.
Data is written into a FIFO queue one value at a time and read out in the
same order.
Once the data has been read it cannot be accessed again.
A FIFO can be empty, partially filled, or full; FIFOs usually have output
flags to indicate the full and empty states.
The simplest FIFO structure is a register with an input port that receives the
data and an output port. The data source provides the FIFO input and a
strobe. Similarly, the reader provides a strobe when it wants data.
Figure 12.18 describes a FIFO; FULL indicates that no more data can be
accepted and EMPTY indicates that no more data can be read.
When data arrives at the input terminals, it ripples down the shift register
until it arrives at the next free location.
Figure 12.19 demonstrates a 10-stage FIFO as data is added and removed.
The FIFO is usually built around a random access memory element arranged
as a circular buffer. A read pointer and a write pointer keep track of the
data in the RAM.
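The circular-buffer organization can be sketched directly; the capacity and stored values are illustrative:

```python
# Sketch of a RAM-based FIFO as a circular buffer with read and write
# pointers, as described in the text.

class RingFIFO:
    def __init__(self, size):
        self.ram = [None] * size
        self.rd = self.wr = self.count = 0

    def full(self):  return self.count == len(self.ram)
    def empty(self): return self.count == 0

    def write(self, value):
        if self.full():
            raise OverflowError("FIFO full")
        self.ram[self.wr] = value
        self.wr = (self.wr + 1) % len(self.ram)   # wrap around
        self.count += 1

    def read(self):
        if self.empty():
            raise IndexError("FIFO empty")
        value = self.ram[self.rd]
        self.rd = (self.rd + 1) % len(self.ram)   # wrap around
        self.count -= 1
        return value

f = RingFIFO(4)
for v in "abc":
    f.write(v)
assert [f.read(), f.read()] == ["a", "b"]   # first in, first out
```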
Figure 12.21 illustrates the structure of a dual-port RAM FIFO.
The advantage of RAM-based FIFOs over register-based FIFOs is that the
fall-through time of a RAM-based FIFO is constant and independent of its
length.
Figure 12.22 demonstrates the use of a typical FIFO in a system with a
32-bit computer using little-endian I/O and an 8-bit port using big-endian
I/O. This FIFO is user-configurable and can be set up to perform bus
matching; that is, its input and output buses may have different widths.
Its port A interface is 32 bits wide and its port B interface is 8 bits wide.
You can program it to perform the byte swapping required when data is
copied from a little endian to a big endian system.
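As an illustration of the bus matching involved, a 32-bit word can be split into four bytes for a byte-wide big-endian port, most-significant byte first; the function name and example value are assumptions, not taken from the FIFO's data sheet:

```python
# Sketch of 32-bit-to-8-bit bus matching: the word is emitted as four
# bytes, most-significant byte first, as a big-endian port would consume.

def bus_match_32_to_8(word):
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]

bus_match_32_to_8(0x11223344)   # four byte-wide transfers
```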
Figure 12.23 gives the timing diagram for the case when two 32-bit words
are written into the FIFO and eight 8-bit bytes are read from it.
I/O Strategy
A computer implements I/O transactions in one of three ways.
• It can perform an individual I/O transaction at the point the operation is
needed (programmed I/O).
• It can execute another task until a peripheral signals its readiness to
take part in an I/O transaction (interrupt-driven I/O).
• It can ask special-purpose hardware to perform the I/O transaction
(direct memory access, DMA).
Computer systems may employ a mixture of these strategies.
Programmed I/O
A typical memory-mapped peripheral has a flag bit that is set by the
peripheral when it is ready to take part in a data transfer.
In programmed I/O the computer interrogates the peripheral’s status
register and proceeds when the peripheral is ready.
We can express this operation in pseudocode as:
REPEAT
Read peripheral status
UNTIL ready
Transfer data to/from peripheral
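The REPEAT … UNTIL pseudocode above can be sketched as a busy-wait loop; the RDY bit position and the fake peripheral are assumptions for illustration:

```python
# Sketch of programmed I/O: spin on the status register until the RDY bit
# is set, then perform the data transfer.

RDY_BIT = 0x01

def poll_and_read(read_status, read_data):
    """Busy-wait on the peripheral's status, then read its data register."""
    while not (read_status() & RDY_BIT):
        pass                 # the CPU does no useful work while polling
    return read_data()

# A fake peripheral that becomes ready on the third status read:
state = {"polls": 0}
def fake_status():
    state["polls"] += 1
    return RDY_BIT if state["polls"] >= 3 else 0

value = poll_and_read(fake_status, lambda: 0x42)   # 0x42 after three polls
```

The polling loop mirrors the cost noted later in the text: the processor is fully occupied until the peripheral signals readiness.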
Programmed I/O
The operation “REPEAT Read peripheral status UNTIL ready” constitutes a
polling loop, because the peripheral’s status is continually tested until it is
ready to take part in the I/O transaction. In the following example, status bit
RDY is set if the peripheral has data. If we take the I/O model of Figure 12.4
and translate the pseudocode into generic assembly language form to perform
an input operation, we get
     ADR  r1,i0          ;Register r1 points to the peripheral
     MOV  r2,#Command    ;Define peripheral operating mode
     STR  [r1],r2        ;Set up peripheral. Load the command
Rpt1 LDR  r3,[r1,#2]     ;Read input status word into r3
     AND  r3,r3,#1       ;Mask status to RDYIN bit
     BEQ  Rpt1           ;Repeat until device ready
     LDR  r3,[r1,#4]     ;Read the data into r3.
Interrupt-driven I/O
A more efficient I/O strategy uses an interrupt handling mechanism to deal
with I/O transactions when they occur.
The processor carries out another task until a peripheral requests attention.
When the peripheral is ready, it interrupts the processor; the processor
carries out the transaction and then returns to its pre-interrupt state.
The two peripheral interface components are each capable of requesting the
processor’s attention.
Each peripheral has an active-low interrupt request output, IRQ*, connected
to a common line that runs from peripheral to peripheral and into the
processor’s IRQ* input.
Active-low means that a low voltage indicates the interrupt request state.
The reason that the electrically low state is used as the active state is
entirely because of the behavior of transistors; that is, it is an engineering
consideration that dates back to the era of the open-collector circuit that
could only pull a line down to zero.
Whenever a peripheral wants to take part in an I/O transaction, it asserts
its IRQ* output and drives the IRQ* input to the CPU active low.
The CPU detects that IRQ* has been asserted and responds to the
interrupt request if it has not been masked.
Most processors have an interrupt mask register that allows you to turn off
interrupts if the CPU is performing an important operation.
Interrupts may be masked when the processor is performing a critical task;
for example, a system using real-time monitoring of fast events would not
defer to a keyboard input interrupt (even a fast typist is glacially slow
compared to a computer’s internal operation).
Similarly, recovery from a system failure such as a loss of power will be
given priority.
The way in which a processor responds to an interrupt is device-dependent.
The two peripherals in Figure 12.24 are wired to the common IRQ* line
and the CPU can’t determine which device interrupted.
The CPU identifies the interrupting device by polling each peripheral’s
status register until the interrupter has been located.
Interrupt polling provides interrupt prioritization because important
devices whose interrupt requests must be answered rapidly are polled first.
In Figure 12.24 each memory-mapped peripheral has an interrupt vector
register, IVR, that tells the processor how to find the appropriate interrupt
handler.
Typically, the IVR supplies a pointer to a table of interrupt vectors.
Interrupt Processing
When an interrupt occurs, the computer first decides whether to service it
or whether to ignore it. When the computer responds to the interrupt, it
carries out the following sequence of actions.
• It completes the current instruction.
• The contents of the program counter are saved to allow the program to
continue from the point at which it was interrupted.
• The state of the processor must also be saved. A processor’s state is
defined by the flag bits of the condition code, plus other status
information.
• A jump is then made to the location of the interrupt handling routine,
which is executed like any other program.
• After this routine has been executed, a return from interrupt is made,
the program counter restored, and the system status word returned to
its pre-interrupt value.
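The save-and-restore sequence above can be sketched as a toy simulation; the dictionary CPU model and the addresses are invented for illustration:

```python
# Sketch of the interrupt sequence: the PC and processor status register
# (PSR) are pushed on the stack, the handler runs, and both are restored.

def service_interrupt(cpu, handler_address):
    cpu["stack"].append(cpu["pc"])     # save return address
    cpu["stack"].append(cpu["psr"])    # save processor status ("Stack PSR")
    cpu["pc"] = handler_address        # jump to the interrupt handler
    # ... the interrupt handler executes here ...
    cpu["psr"] = cpu["stack"].pop()    # return from interrupt: restore PSR
    cpu["pc"] = cpu["stack"].pop()     # restore PC; execution resumes

cpu = {"pc": 0x1000, "psr": 0b0101, "stack": []}
service_interrupt(cpu, 0x8000)
# the CPU is back in its pre-interrupt state: the interrupt is transparent
```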
Figure 12.25 shows how a typical CISC responds to an interrupt request.
Stack PSR indicates that the processor status register is pushed on the stack.
The interrupt is transparent to the interrupted program and the processor is
returned to the state it was in immediately before the interrupt took place.
Nonmaskable Interrupts
An interrupt request may be denied or deferred.
Some microprocessors have a nonmaskable interrupt request, NMI, that
can’t be deferred.
A nonmaskable interrupt is reserved for events such as a loss of power.
The NMI handler routine forces the processor to deal with the interrupt and
to perform an orderly shutdown of the system, before the power drops below
a critical level and the computer fails completely.
Prioritized Interrupts
Microprocessors often support prioritized interrupts (i.e., the chip has more
than one interrupt request input).
Each interrupt has a predefined priority and a new interrupt with a priority
lower than or equal to the current one cannot interrupt the processor until
the current interrupt has been dealt with.
Equally, an interrupt with a higher priority can interrupt the current
interrupt.
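The priority rule can be stated in one line: a new request preempts the processor only if its level is strictly higher than the level currently being serviced. A minimal sketch (the function name is illustrative):

```python
# A new interrupt preempts only when its priority is strictly higher
# than that of the interrupt currently being serviced.

def preempts(new_level, current_level):
    return new_level > current_level

print(preempts(2, 1), preempts(1, 1), preempts(1, 2))  # True False False
```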
Nested Interrupts
Interrupts and other processor exceptions have all the characteristics of a
subroutine call: the return address is stacked at the beginning of the call and
then restored once the subroutine has been executed to completion.
The interrupt is a subroutine call with an automatic target address supplied
in hardware or software and a mechanism that preserves the state of the
condition code as well as the program counter.
Just as subroutines can be nested, so can interrupts.
Figure 12.26 demonstrates nested interrupts.
A level 1 interrupt occurs and its handler begins executing. A level 2 interrupt
takes place before the level 1 interrupt handler has completed its task. The
level 1 interrupt handler is interrupted and the level 2 interrupt processed.
Once the level 2 interrupt has been dealt with, a return is made to the level 1
interrupt handler.
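The nesting sequence of Figure 12.26 can be sketched with a recursive handler; the trace strings and level numbers are illustrative only.

```python
# Sketch of nested interrupts: a level 2 request arrives while the
# level 1 handler is running, so the level 1 handler is suspended.

trace = []

def handle(level, pending):
    trace.append(f"enter L{level}")
    # While this handler runs, a strictly higher-priority request preempts it.
    while pending and pending[0] > level:
        handle(pending.pop(0), pending)
    trace.append(f"exit L{level}")

handle(1, [2])   # a level 2 interrupt arrives during the level 1 handler
print(trace)     # ['enter L1', 'enter L2', 'exit L2', 'exit L1']
```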
Example of a sequence of nested interrupts
Vectored Interrupts
When a processor with a single interrupt request line detects a request for
service, it doesn’t know which device made the request and can’t begin to
execute the appropriate interrupt handler until it has identified the source of
the interrupt.
A vectored interrupt solves the problem of identifying the source by forcing the
requesting device to identify itself to the processor.
Without vectored interrupts, the processor must examine each of the
peripherals’ interrupt status bits.
When the processor detects an interrupt request it broadcasts an interrupt
acknowledge to all potential interrupters.
Each possible interrupter detects the acknowledge from the CPU and the
interrupting device returns a vector that is used by the CPU to invoke the
appropriate interrupt handler.
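The vector-to-handler lookup can be sketched as a table index. The device names and vector numbers below are hypothetical, chosen only to illustrate the mechanism.

```python
# Sketch of vectored interrupt dispatch: on interrupt acknowledge, the
# requesting device supplies a vector that indexes a table of handlers.

vector_table = {64: "disk_handler", 65: "uart_handler", 66: "timer_handler"}

def acknowledge(requesting_device):
    vector = requesting_device["vector"]   # device returns its vector
    return vector_table[vector]            # CPU selects the handler

uart = {"vector": 65}
print(acknowledge(uart))   # uart_handler
```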
Figure 12.28 demonstrates the 68K prioritized, vectored interrupts. There are
7 levels of interrupt request. Level i is serviced in preference to level j, if i > j.
The scheme permits nested interrupts. An interrupt at level i can be
interrupted by a new interrupt at level j if j > i.
Direct Memory Access
The most sophisticated means of dealing with I/O uses direct memory access,
DMA, in which data is transferred between a peripheral and memory without
the active intervention of a processor.
In effect, a dedicated processor performs the I/O transaction by taking control
of the system buses and using them to move data directly between a
peripheral and the memory.
DMA offers a very efficient means of data transfer because the DMA logic is
dedicated to I/O processing and a large quantity of data can be transferred in
a burst; for example, 128 bytes of input.
Figure 12.29 describes a system that uses DMA to transfer data to disks. A
DMA controller, DMAC, controls access to the data bus. The DMA controller
must first be loaded with the destination of the data in memory and the
number of bytes to be transferred; that is, you have to program the DMA
controller before it can be triggered.
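The programming step can be sketched as follows. The register names (`dest`, `count`) and class interface are hypothetical; real DMA controllers expose these as memory-mapped registers.

```python
# Sketch of programming a DMA controller: load the destination address
# and byte count, then trigger the transfer. The DMAC then moves the
# burst into memory without CPU intervention.

class DMAController:
    def __init__(self, memory):
        self.memory = memory
        self.dest = 0
        self.count = 0

    def program(self, dest, count):
        self.dest, self.count = dest, count   # set up before triggering

    def start(self, peripheral_data):
        for i in range(self.count):
            self.memory[self.dest + i] = peripheral_data[i]

mem = [0] * 256
dmac = DMAController(mem)
dmac.program(dest=0x40, count=4)
dmac.start([10, 20, 30, 40])
print(mem[0x40:0x44])   # [10, 20, 30, 40]
```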
Three bus switches control access to the data bus by the CPU, memory, and
DMA controller. A bus switch is turned on or off to enable or disable the
information path between the bus and the device interfaced to the bus switch.
Normally, the CPU bus switch is closed and the DMAC and peripheral bus
switches are open. The CPU transfers data between memory and itself by
putting an address on the address bus and reading or writing data.
Figure 12.30a illustrates the situation in which the CPU is controlling the
buses and Figure 12.30b demonstrates how the DMA controller takes control
of the data bus to perform the data transfer itself.
The Bus
Bus is a contraction of the Latin omnibus, meaning for all.
A bus behaves like a highway that is used by multiple devices. In a computer, all
the devices that wish to communicate with each other use a bus.
Figure 12.31 illustrates the organization of a computer with three buses.
The system bus is made up of the address, data, and control paths from
the CPU. Memory and memory-mapped I/O devices are connected to this
bus. Such a bus has to be able to operate at the speed of the fastest device
connected to it.
The system bus demonstrates that a one-size-fits-all approach does not
apply to computer design, because it would be hopelessly cost-ineffective to
connect low-cost, low-speed peripherals to a high-speed bus.
In systems with more than one CPU (or at least more than one device that
can initiate data transfer actions like a CPU) the bus has to decide which of
the devices that want to access the bus should be granted access to it.
This mechanism is called arbitration and is a key feature of modern system
buses.
A device that can take control of the system bus is called a bus master, and a
device that can only respond to a transaction initiated by a remote bus
master is called a bus slave.
In Figure 12.31, the CPU is a bus master and the memory system a bus slave.
One of the I/O ports has been labeled bus master because it can control the
bus (e.g., for DMA data transfers), whereas the other peripheral is labeled bus
slave because it can respond only to read or write accesses.
The connection between the disk drive and its controller is also labeled bus
because it represents a specialized and highly dedicated example of the bus.
Bus Structures and Topologies
A simple bus structure is illustrated by the CPU plus memory plus local bus
in Figure 12.32. Only one device at a time can put data on the data bus.
Data is transferred between CPU and memory or peripherals. The CPU is
the permanent bus master and only the CPU can put data on the bus or
invite memory/peripherals to supply data via the bus.
Figure 12.33 illustrates a bus structure that employs two buses linked by an
expansion interface.
Each of these separate bus systems may have entirely different levels of
functionality; one might be optimized for high-speed processor-to-memory
transactions, and the other to support a large range of plug-in peripherals.
Bus Speed
Suppose device A transmits data to device B. Let’s go through the sequence of
events that take place when device A initiates the data transfer at t = 0.
Initially, A drives data onto the data bus at time td, the delay between device
A initiating the transfer and the data appearing on the bus. Data propagates
along the bus at about 70% of the speed of light or about 1 ft/ns.
When the data reaches B, it must be captured.
Latches are specified by their setup and hold times. The data setup time, ts, is
the time for which the data must be available at the input to system B for it to
be recognized. The data hold time, th, is the time for which the data must
remain stable at system B's input after it has been captured.
The time taken for a data transfer, tT, is, therefore, tT = td + tp + ts + th,
where tp is the propagation time along the bus.
Inserting typical values for these parameters yields 4 + 1.5 + 2 + 0 = 7.5 ns,
corresponding to a data transfer rate of 1/7.5 ns = 10⁹/7.5 = 133.3 MHz.
A 32-bit-wide bus can transfer data at a maximum rate of 533.2 MB/s.
In practice, a data transfer requires time to initiate it, called the latency, tL.
Taking latency into account gives a maximum data rate of 1/(tT + tL).
Higher data rates can be achieved with pipelining, by transmitting the next
data element before system B has completed reading the previous element.
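The worked numbers above can be reproduced directly (values in nanoseconds are the text's typical figures; note the text's 533.2 MB/s comes from rounding 133.3 MHz before multiplying by 4 bytes):

```python
# tT = td + tp + ts + th with the text's typical values (ns), then the
# rate for a 32-bit (4-byte-wide) bus.

td, tp, ts, th = 4.0, 1.5, 2.0, 0.0   # ns
tT = td + tp + ts + th                # total transfer time, ns
rate_mhz = 1000.0 / tT                # 1/7.5 ns expressed in MHz
mb_per_s = rate_mhz * 4               # 4 bytes per transfer
print(f"{tT} ns -> {rate_mhz:.1f} MHz -> {mb_per_s:.1f} MB/s")
```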
Figure 12.35 demonstrates the application of pipelining to the previous
example.
Data must be stable at the input to system B for at least ts + th seconds; then
a new element may replace the previous element.
Pipelining allows an ultimate data rate of 1/(ts + th).
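Using the same typical values as before, the pipelined limit works out as:

```python
# With pipelining, data need only be stable at the receiver for ts + th,
# so the ultimate rate is 1/(ts + th). Values in ns from the earlier example.

ts, th = 2.0, 0.0
pipelined_mhz = 1000.0 / (ts + th)
print(pipelined_mhz)   # 500.0 MHz for ts = 2 ns, th = 0 ns
```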
The Address Bus
Some systems have an explicit address bus that operates in parallel with the
data bus. When the processor writes data to memory, an address is
transmitted to the memory system at the same time the data is transmitted.
Some systems combine address and data buses together into a single
multiplexed bus that carries both addresses and data (albeit alternately).
Figure 12.36 describes the multiplexed address/data bus, which requires
fewer signal paths, so the connectors and sockets require fewer pins.
Multiplexing addresses and data onto the same lines requires a multiplexer
at one end of the transmission path and a demultiplexer at the other end.
Multiplexed buses can be slower than non-multiplexed buses and are often
used when cost is more important than speed.
The efficiency of both non-multiplexed and multiplexed address buses can be
improved by operating in a burst mode in which a sequence of data elements
is transmitted to consecutive memory addresses.
Burst-mode operation is used to support cache memory systems.
Figure 12.37 illustrates the concept of burst mode addressing where an
address is transmitted for location i and data for locations i, i+1, i+2, and i+3
are transmitted without a further address.
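A sketch of the bus traffic for such a burst (the tuple representation is purely illustrative):

```python
# Burst-mode addressing: one address for location i, then data for
# i, i+1, i+2, i+3 with no further addresses on the bus.

def burst_read(memory, i, length=4):
    bus_traffic = [("address", i)]                   # single address phase
    for k in range(length):
        bus_traffic.append(("data", memory[i + k]))  # consecutive data
    return bus_traffic

mem = {100: "a", 101: "b", 102: "c", 103: "d"}
print(burst_read(mem, 100))
# [('address', 100), ('data', 'a'), ('data', 'b'), ('data', 'c'), ('data', 'd')]
```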
The Control Bus
The control bus regulates the flow of information on the bus. Figure 12.38
describes a simple 2-line synchronous control bus that uses a data-direction
signal and a data validation signal. The data direction signal is R/W* and is
high to indicate a CPU read operation and low to indicate a write operation.
Some systems have separate read and write strobes rather than an R/W*
signal. Individual READ* and WRITE* signals indicate three states: an active
read state, an active write state, and a bus-free state (READ* and WRITE*
both negated). An R/W* signal introduces ambiguity: when R/W* = 0 the bus
is always executing a write operation, whereas R/W* = 1 indicates either a
read operation or a free bus.
The active-low data valid signal, DAV*, is asserted by the bus master to
indicate that a data transfer is taking place.
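The three bus states named for separate strobes can be decoded mechanically. In this sketch, 0 means an active-low strobe is asserted and 1 means it is negated; the function name is illustrative.

```python
# Decoding separate active-low READ* and WRITE* strobes into the three
# bus states: read, write, and bus free (both strobes negated).

def bus_state(read_n, write_n):
    if read_n == 1 and write_n == 1:
        return "free"        # both strobes negated
    if read_n == 0:
        return "read"
    return "write"

print(bus_state(1, 1), bus_state(0, 1), bus_state(1, 0))  # free read write
```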
Let’s look at an example of an asynchronous data transfer: a processor-memory
read cycle.
Figure 12.39 provides the simplified read cycle timing diagram of a 68020
processor. The processor is controlled by a clock, CLK, and the minimum bus
cycle takes six clock states labeled S0 to S5.
Arbitrating for the Bus
In a system with several bus masters connected to a bus, a mechanism is
needed to deal with simultaneous bus requests. The process by which
requests are recognized and priority given to one of them is called arbitration.
There are two approaches to dealing with multiple requests for a bus—
localized arbitration and distributed arbitration.
In localized arbitration, an arbitration circuit receives requests from the
contending bus masters and then decides which of them is to be given control
of the bus.
In a system with distributed arbitration, each of the masters takes part in the
arbitration process and the system lacks a specific arbiter—each master
monitors the other masters and decides whether to continue competing for the
bus or whether to give up and wait until later.
Localized Arbitration and the VMEbus
The VMEbus supports several types of functional modules.
We are interested in the bus master that controls the bus, the bus requester
that requests the bus, and the arbiter that grants the bus to a would-be
master.
A bus requester is employed by a bus master when it wants to access the
VMEbus.
A VMEbus is usually housed in a box with a number of slots into which
modules can be plugged (rather like the slots used by the PCI bus).
The VMEbus’s arbitration bus is described in Figure 12.40. A bus requester
uses BR0* to BR3* (bus request 0 to bus request 3) to indicate that the bus
master wants the bus. Four bus grant lines are used by the arbiter to grant
control of the bus to the requester. Bus clear (BCLR*) and bus busy (BBSY*)
control the arbitration process.
The VMEbus arbiter is located in a special position on a VMEbus, slot 1. All
bus request lines run the length of the VMEbus and any would-be master
can place a request on one of these lines. The level of the request is
user-determined; that is, the user decides which of the four bus request lines
is to be connected to a module’s request output.
The arbiter reads the bus request inputs from all the slots along the bus,
decides which request is to be serviced, and then informs other modules of
its decision via its bus grant outputs.
The VMEbus supports four levels of arbitration. We will soon see that each
of these four levels can be further subdivided. The bus request lines run the
length of the VMEbus and terminate at the arbiter in slot 1.
When one or more bus requesters wish to access the VMEbus, they assert the
bus request lines to which they have been assigned; for example, the card in
slot 3 might assert bus request line BR1* and the card in slot 5 might
assert bus request line BR3*. The arbiter in slot 1 decides which of them is
to succeed.
If a request on, say, BR2* is successful, the arbiter sends a bus grant message
on its level 2 bus grant output, BG2OUT*. We will write BGxIN*, BGxOUT*
and BRx* where x is 0 to 3 to avoid referring to specific levels.
The BGx* lines do not run along the entire length of the bus. Instead, the
VMEbus employs a chain of lines called bus grant out and bus grant in.
The BGxIN* and BGxOUT* lines run from slot to slot rather than from
end to end. A BGxOUT* line from a left-hand module is passed out on its
right as a BGxIN* line. Therefore, the BGxOUT* of one module is connected
to the BGxIN* of its right-hand neighbor. This arrangement is called
daisy-chaining.
A continuous bus line transmits a signal in both directions to all devices
connected to it. The daisy-chained line is unidirectional, transmitting a signal
from one specific end to the other.
Each module connected to (i.e., receiving from and transmitting to) a
daisy-chained line may either pass a signal on down the line or inject a signal
of its own onto the line.
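The pass-on-or-claim behavior of a daisy-chained grant can be sketched as a walk along the chain. The list-of-dictionaries representation is illustrative only.

```python
# Daisy-chained bus grant: each module either claims the grant arriving
# on BGxIN* or passes it on via BGxOUT* to its right-hand neighbor.

def propagate_grant(modules):
    # modules is ordered with the slot nearest the arbiter first
    for m in modules:
        if m["requested"]:
            return m["slot"]     # grant claimed; not passed further
    return None                  # grant ran off the end of the chain

chain = [{"slot": 2, "requested": False},
         {"slot": 3, "requested": True},    # nearest requester wins
         {"slot": 4, "requested": True}]
print(propagate_grant(chain))   # 3
```

Note how the module nearest the arbiter wins when two modules request at the same level, which is exactly the geographic prioritization discussed later.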
In Figure 12.42 the requester in slot j requests the bus at level 1 when no other
device is requesting the bus. When BR1* is asserted, the arbiter detects it and
asserts BG1OUT*, which passes down the bus until it reaches slot j. The
arbiter in slot 1 sends a bus grant input to the card in slot 2. The card in slot 2
takes this bus grant input and passes it on as a bus grant output to the card in
slot 3, and so on.
Each card receives a bus grant input from its left hand neighbor and may or
may not pass it on as a bus grant output to its right hand neighbor.
A card might choose to terminate the daisy chain signal-passing sequence
and not transmit a bus grant signal to its right hand neighbor. If a slot is
empty, bus jumpers (i.e., links) must be provided to route the appropriate
BGxIN* signals to the corresponding BGxOUT* terminals.
A requester module makes a bid for control of the system data transfer bus
by asserting one of the bus request lines, BR0* to BR3*.
Only one line is asserted and the actual line is chosen by assigning a given
priority to the requester. This priority may be assigned by on-board
user-selectable jumpers or dynamically by software.
The arbiter in slot 1 asserts a BGxOUT* line, and a bus grant propagates
down the daisy-chain. Each BGxOUT* arrives at the BGxIN* of the next
module. If that module doesn’t want the bus, it passes the grant on via its
BGxOUT*. If the module requested the bus, it takes control of the bus.
Daisy-chaining provides automatic prioritization, because bus requesters
nearer the arbiter win the arbitration; this is called geographic prioritization.
Figure 12.43 provides a protocol flowchart for VMEbus arbitration. Initially, a
bus master in slot M at a priority less than i is in control of the bus. This
current bus master asserts the bus busy signal, BBSY*, that runs the length of
the bus. As long as any master is asserting BBSY* no other master may
attempt to gain control of the VMEbus. An active bus master in a VMEbus
cannot be forced off the bus.
Suppose a bus requester in slot N requests the bus at a higher priority level
than the current master. The arbiter detects the new, higher level and asserts
its bus clear output, which informs the current master that another, higher
priority device wishes to access the bus.
The current master does not have to relinquish the bus within a prescribed
time limit. Typically, it will release the bus at the first convenient instant by
negating BBSY*. The VMEbus provides both geographic prioritization
determined by a slot’s location and an optional prioritization by bus request.
BCLR* is driven only by arbiters that permanently assign fixed priorities to
the bus request lines. Other arbitration mechanisms, such as the round robin
arbitration scheme to be described later, have no fixed priority and the arbiter
does not make use of the bus clear line.
When the arbiter detects that the current master has released the bus, the
arbiter asserts BGiOUT* to indicate to the requester at level i that it has
gained control of the bus. The arbiter knows only the level of the request and
not which slot it came from.
The bus grant message ripples along the bus, entering each module as BGiIN*
and leaving as BGiOUT*. When this message reaches the requester in slot N
that made the request at level i, the message is not passed on.
Instead, the requester asserts BBSY* to show that it now has control of the
bus.
What would have happened if a requester also at level i but located nearer to
the arbiter than slot N had also requested the bus at approximately the same
time? The answer is that the requester closer to the arbiter would have
received the bus grant first and taken control of the bus.
Releasing the Bus
The requester may implement one of two options for releasing the bus: option
RWD, release when done, and option ROR, release on request.
Option RWD requires the requester to release the bus as soon as the on-board
master stops indicating bus busy; that is, the master remains in control of the
bus until its task has been completed, which can lead to undue bus hogging.
The ROR option is more suitable in systems in which it is unreasonable to
grant unlimited bus access to a master. The ROR requester monitors the four
bus request lines.
If it sees that another requester has requested service, it releases its BBSY*
output and defers to the other request.
The ROR option also reduces the number of arbitrations requested by a
master, as the bus is frequently cleared voluntarily.
The Arbitration Process
Figure 12.44 demonstrates what happens when two requesters at different
levels of priority request the bus. Both requesters A and B assert their bus
request outputs simultaneously.
Assuming that the arbiter detects BR1* and BR2* low, the arbiter asserts its
level 2 bus grant output, BG2OUT*, because BR2* has the higher priority.
When the bus grant has propagated down the daisy-chain to requester B,
requester B responds to BG2IN* by asserting BBSY*. Requester B then
releases BR2* and informs its own master that the VMEbus is now available.
VMEbus Arbitration Algorithms
There are three strategies that the arbiter in slot 1 can use to prioritize bus
requests.
1. Option RRS (round robin select) The RRS option assigns priority to
the masters on a rotating basis. Each of the four levels of bus request
has a turn at being the highest level.
2. Option PRI (prioritized) The PRI option assigns a level of priority to
each of the bus request lines from BR3* (highest) to BR0*.
3. Single level (SGL) The SGL option provides a minimal arbitration
facility using bus request line BR3* only. The priority of individual
modules is determined by daisy-chaining, so that the module next to the
arbiter module in Slot 1 of the VMEbus rack has the highest priority. As
the position of a module moves further away from the arbiter, its
priority reduces.
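The PRI and RRS options can be sketched as simple selection functions over the four request levels. The helper names and the `pending` list representation (index = request level, True = asserted) are illustrative.

```python
# Selecting among pending requests on BR0*..BR3* under two of the
# VMEbus arbitration options.

def pri(pending):                 # PRI: BR3* highest, BR0* lowest
    return max(i for i in range(4) if pending[i])

def rrs(pending, last):           # RRS: rotate priority after each grant
    for step in range(1, 5):
        level = (last + step) % 4
        if pending[level]:
            return level
    return None

pending = [True, False, True, True]     # BR0*, BR2*, BR3* asserted
print(pri(pending))                     # 3
print(rrs(pending, last=3))             # 0: rotation gives BR0* its turn
```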
Distributed Arbitration
Not all buses use a centralized arbiter to decide which of the competing
bus masters is to get control of the bus.
A mechanism called distributed arbitration allows arbitration to take
place simultaneously at all slots along the bus.
We now describe a backplane bus that supports distributed arbitration,
the NuBus, a general-purpose synchronous backplane bus with
multiplexed address and data lines, standardized as ANSI/IEEE Std 1196.
It was conceived at MIT in the 1970s and later supported by Western Digital
and Texas Instruments (1983). Apple implemented a subset of NuBus in
their Macintosh II.
The key to NuBus arbitration is each module’s unique slot number, which
ranges from 0₁₆ to F₁₆ (0 to 15).
When a card in a slot arbitrates for the bus, the card places its slot
number on the bus and, as if by magic, any other requester with a lower
slot number stops arbitrating for the bus.
Equally, if a slot with a higher number wants the bus, the requesting
slot stops requesting the bus; that is, if a card arbitrates for the bus and
then finds that a card with a higher priority is also arbitrating for the
bus, it backs off.
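The back-off behavior can be sketched bit by bit. This is a simplified model only: the real NuBus lines are active-low open-collector signals, whereas the sketch below works with plain integers and a wired-OR per bit.

```python
# Sketch of NuBus-style distributed arbitration: every contender drives
# its slot number onto shared lines; a card that sees a higher number
# than its own backs off, so the highest slot number wins.

def arbitrate(contenders, bits=4):
    active = set(contenders)
    for bit in reversed(range(bits)):                # compare MSB first
        wired = any(c & (1 << bit) for c in active)  # wired-OR of this bit
        if wired:
            # cards not asserting this bit see a conflict and back off
            active = {c for c in active if c & (1 << bit)}
    winner, = active
    return winner

print(arbitrate({0x3, 0xA, 0x7}))   # 10: slot 0xA wins
```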
To appreciate how distributed arbitration works, you have to understand the
open-collector gate.
Historically, the open-collector gate preceded the tristate gate and was used to
allow more than one device to drive the same bus.
Figure 12.45 illustrates an inverter with an open-collector output. The gate’s
output can be actively forced to a low voltage. When the input of the gate is
1, the internal transistor switch is closed and its output is forced low just
like a normal inverter.
When the input is 0, the transistor switch is open and the output of the
open-collector gate is left floating because it is internally disconnected from
the high- or low-level power rails.
That is, the open-collector gate has an active-low output state and a floating
state and can pull a bus down into a low state, but it can’t pull the bus up
into a high state.
Figure 12.46 illustrates the key circuit used in a distributed arbiter that has
an input X and an output Y.
The circuit is also connected to one of the arbitration control lines on the bus.
In what follows, we are interested in the relationship between the circuit and
the state of the bus.
If you use Boolean algebra, you will see that output Y is 0 for any value of
input X.
This is not the whole story…
Suppose the X input is 0 and that the level on the bus is low because
another device is driving it low. In this case, the output of the open-collector
inverter will also be forced low by the bus.
Now, both inputs to the AND gate will be 0 and the Y output will be 1. That
is, the Y output is 0 unless the input X is 0 and another device is driving the
bus low.
We have a mechanism that can actively drive the bus low or detect when
another device is driving the bus low when we are attempting to drive it
high. This mechanism forms the basis of distributed arbitration.
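The behavior just described (the output is 1 only when the input is 0 while another device holds the bus low) can be modeled directly. The function name and parameters are illustrative, not part of any real arbiter.

```python
# Model of the detect circuit: the open-collector inverter drives the
# wired bus line low when X = 1; the output Y flags the case where the
# bus is low even though we are not the device driving it.

def detect(x, other_drives_low):
    i_drive_low = (x == 1)                    # inverter pulls bus low
    bus = 0 if (i_drive_low or other_drives_low) else 1   # wired line
    y = 1 if (x == 0 and bus == 0) else 0     # conflict: low, not by me
    return bus, y

print(detect(x=0, other_drives_low=True))   # (0, 1): another device wins
print(detect(x=1, other_drives_low=False))  # (0, 0): we drive the bus low
```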
Figure 12.47 shows how the distributed arbiter operates by considering all
possible input conditions together with the state of bus line. Remember
that the bus can be floating (not driven) or actively pulled down to a low
level. When it is floating, a resistor weakly pulls the bus up to a high level.
Figures 12.47(a) and (b) assume that the bus is floating. The output of the
circuit is always 0 and is independent of its input. In a real system, the bus
will always be actively pulled down to an electrically low level or weakly
pulled up to an electrically high level by a resistor.
In Figures 12.47(c) and (d), the bus is being actively driven to 0. In Figure
12.47(c), the bus is actively being driven low, but the output of the
open-collector inverter is also low, so there is no conflict between the output
of the open-collector inverter and the bus. The output of the circuit is 0.
In Figure 12.47(d), the input is 0 and the output of the open-collector gate is
floating. The bus is low and the output of the inverter is pulled down to an
electrically low state. The output of the circuit is 1. The output tells the
system that another device is driving the bus low, in contradiction to the
input.
In Figures 12.47(e) and (f) the bus is in a high state because no other device is driving it low. Figure 12.47(e) is the interesting case. Here the input is 1 and the output of the open-collector gate is electrically low. This drives the bus to a low state; in this case the circuit itself is driving the bus. The output of the circuit is 0. In Figure 12.47(f) the input is 0 and the output of the inverter is floating, so there is no conflict with the state of the bus.
As you can see, there are two special cases. In one, the bus is active-low and the output of the inverter is high, which results in the inverter's output being pulled down.
In the other case, the bus is high and the output of the inverter is active-low, which results in the bus being forced low.
Table 12.4 summarizes the action of this circuit. The input to this circuit represents the condition 'I want the bus' or 'I don't want the bus'. If the bus is not being driven low, this circuit will drive the bus low itself if its input is 1. This circuit produces a 0 output unless its input is 1 and the bus is being actively driven low by some other device.
Situation              Bus condition            Result
I want the bus         Bus free (high level)    Output is 1. Get the bus and drive it low.
I want the bus         Bus busy (low level)     Output is 0. I do not get the bus.
I do not want the bus  Don't care               Output is 0.
Figure 12.48 illustrates the details of part of a NuBus. A potential master
that wants to use the bus places its arbitration level on the 4-bit arbitration
bus, ID3* to ID0*.
Since NuBus uses negative logic, the arbitration number is inverted so that
the highest level of priority is 0000 and the least is 1111.
NuBus arbitration is simple. If a competing master sees a higher level on
the bus than its own level, it ceases to compete for the bus. Each requester
simultaneously drives the arbitration bus and observes the signal on the
bus. If it detects the presence of a requester at a higher level, it backs off.
ID0* to ID3* define the slot location and priority level of the master, and
lines ARB0* to ARB3* are the arbitration lines running the length of the
bus.
Arbitrate* permits the master to arbitrate for the bus, and the output GRANT is asserted if the master wins the arbitration.
Suppose three masters numbered 4, 5, and 2 put the codes 1011, 1010, and 1101, respectively, onto the arbitration bus. As the arbitration lines are open-collector, any output at a 0 level will pull the corresponding line down to 0.
Here, the bus will be forced to 1000. The master at level 2, putting 1101 on the bus, will detect that ARB2 is being pulled down and leave the arbitration process. The arbitration bus will now be 1010. The master with the code 1011 will detect that ARB1 is being pulled down and will leave the arbitration process. The value on the arbitration bus is now 1010 and the master with that value has gained control.
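The worked example above can be checked with a short simulation. This is a sketch, not NuBus hardware: the real arbitration lines settle in parallel as open-collector signals, whereas the loop below resolves the wired-AND one bit at a time from the most significant end, which produces the same winner.

```python
def nubus_arbitrate(codes):
    # codes: 4-character '0'/'1' strings placed on ARB3*..ARB0*
    # (negative logic: 0000 on the bus is the highest priority).
    contenders = list(codes)
    for i in range(4):
        # Open-collector wired-AND: any 0 pulls the line down to 0.
        line = '0' if any(c[i] == '0' for c in contenders) else '1'
        # A master driving 1 while the line is pulled to 0 backs off.
        contenders = [c for c in contenders
                      if not (c[i] == '1' and line == '0')]
    return contenders[0]

# The three masters from the example: the code 1010 wins the bus.
assert nubus_arbitrate(['1011', '1010', '1101']) == '1010'
```

Running the example reproduces the narrative: 1101 loses on ARB2, 1011 loses on ARB1, and 1010 remains.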
PCI Bus
The Peripheral Component Interconnect Local Bus (or just PCI bus)
represents a radical change to the PC’s systems architecture.
Intel designed this bus for use in Pentium-based systems towards the end
of 1993.
The PCI bus is not only much faster than previous buses; it greatly extends
the functionality of the PC architecture. Indeed, the PCI bus is central to
the PC's expandability and flexibility.
The PCI bus allows users to plug cards into the computer system to
increase functionality by adding modems, SCSI interfaces, video processors,
sound cards, and so on.
The PCI bus lets these cards communicate with the CPU via an interface
known as a North Bridge. Bus interface circuits have come to be known
collectively as a chipset. All PCs with PCI buses require such a chipset.
PCI is called a local bus to contrast it with the address, data, and control signals that come from the CPU itself. Connecting systems directly to the CPU provides the fastest data transfer rates, and a bus connected directly to a CPU is called a front-side bus.
The PCI bus supports plug and play capabilities in which PCI plug-in
cards are automatically configured at power up and resources such as
interrupt requests are assigned to plug and play cards transparently to
the user.
The original PCI bus operated at 33 MHz and supported a 32-bit or 64-bit data bus. PCI bus Version 2.1 supports a 66 MHz clock.
The PCI bus is connected to the PC system by means of a single-chip PCI
Bridge and to other buses via a second bridge. This arrangement means
that a PC with a PCI bus can still support the older ISA bus.
As time passes, fewer and fewer new PCs will have ISA buses because
new users will demand PCI cards as they are better than ISA cards.
Figure 12.49 illustrates the relationship between the PCI bus, the bridge,
processor, memory and peripherals.
The processor is directly connected to a bridge circuit that allows the
processor to access peripherals via the PCI bus.
The PCI system consists of the PCI local bus itself, any cards plugged into
the bus, and central resources that control the PCI bus.
These central resources perform, for example, arbitration between the cards
plugged into the bus.
Figure 12.50 shows a system diagram of a PC with a PCI local bus and an ISA bus. A second bridge, commonly called the South Bridge, links the PCI and ISA buses.
Figure 12.51(a) illustrates the relationship between the Pentium 4, its Intel
chipset, and the PCI bus. Figure 12.51(b) illustrates the more modern Intel
Core i7 Processor interface.
PCI bus arbitration
Figure 12.52 demonstrates PCI bus arbitration. The REQ and GNT signals
are connected to an arbiter that forms part of the north bridge. This arbiter
reads the requests on REQ0 to REQ3 and returns a grant message on the
GNT0 to GNT3 line corresponding to the arbitration winner. When a PCI
agent arbitrates for the bus, the arbiter asserts the BPRI signal to inform the
host processor that a PCI agent (i.e., a priority agent) requires the host bus.
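The arbiter in the figure must pick one winner among the asserted REQ lines. The PCI specification leaves the arbitration algorithm to the implementation (it only requires a fair scheme), so the rotating-priority sketch below is one common, hypothetical choice rather than the mandated behavior.

```python
def round_robin_grant(req, last):
    # req  : list of booleans, req[n] is True if REQn* is asserted
    # last : index of the agent granted in the previous arbitration
    # Rotate priority so the most recent winner now has the lowest
    # priority, guaranteeing that no requester is starved.
    n = len(req)
    for offset in range(1, n + 1):
        candidate = (last + offset) % n
        if req[candidate]:
            return candidate     # arbiter asserts GNT<candidate>*
    return None                  # no requests pending

# Agents 0 and 2 request; agent 0 won last time, so agent 2 wins now.
assert round_robin_grant([True, False, True, False], last=0) == 2
```

A fixed-priority arbiter would be simpler but could starve low-priority agents, which is why rotating schemes are the usual choice.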
Data Transactions on the PCI Bus
The PCI bus compensates for the address/data bus bottleneck in several ways.
First, it can operate in a burst mode, in which a single address is transmitted
and then the address/data bus is used to transmit a sequence of consecutive
data values.
Second, the PCI bus supports split transactions; that is, one device can use
the bus and another device can access the PCI bus before the first transaction
has been completed. Split transactions mean that the bus is used more
efficiently.
Finally, devices connected to the PCI bus can be buffered which allows data to
be transmitted before it is needed.
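The benefit of burst mode is easy to quantify. The cycle counts below are an idealized sketch (wait states and bus-turnaround cycles are ignored): with a multiplexed address/data bus, each single transfer costs an address phase plus a data phase, while a burst pays for one address phase up front.

```python
def bus_cycles(n_words, burst):
    # Single transfers: address phase + data phase for every word.
    # Burst mode: one address phase, then n consecutive data phases.
    return 1 + n_words if burst else 2 * n_words

# Moving 8 words: 16 cycles as single transfers, only 9 as one burst.
assert bus_cycles(8, burst=False) == 16
assert bus_cycles(8, burst=True) == 9
```

As the burst length grows, the cost of the address phase is amortized and the bus approaches one word per cycle.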
PCI bus literature has its own terminology (some of which is shared by SCSI systems). A device that acts as a bus master is called an initiator and a device that responds to a bus master is called a target.
Some of the key signals of the PCI bus are given below.
Signal           Function                      Driven by
AD31–AD0         Multiplexed address and data  Initiator
C/BE3*–C/BE0*    Command/byte enable           Initiator
TRDY*            Target ready                  Target
IRDY*            Initiator ready               Initiator
FRAME*           Frame                         Initiator
DEVSEL*          Device select                 Target
Figure 12.53 illustrates a PCI read cycle in which an initiator reads data
from a target on the PCI bus.
Figure 12.54 illustrates a PCI read cycle in which the address phase is
followed by three data phases.
The PCI Express Bus
The PCI Express bus was designed to replace the PCI bus. Its goals were to cost less than the existing PCI bus, use off-the-shelf technology (boards, connectors, and circuits), support the mobile, desktop, and server markets, and be compatible with existing PCI-based systems.
PCI Express uses serial transmission to transfer data from point to point.
Figure 12.55 demonstrates the difference between the PCI bus and PCI
Express protocols.
The PCI bus protocol has echoes of the ISO standard for the Open Systems Interconnection (OSI) model, which divides a communications system into seven abstract layers, where each layer performs a service for the layer above it.
The lowest level of the PCI Express protocol is the physical layer responsible
for transferring the bits from point-to-point.
The PCI Express uses a serial bus where data is transmitted bit-by-bit along
a single line or along a pair of lines using differential encoding.
Two serial data paths are provided, one for each data direction; that is, a PCI
Express card can both read and write data to the bus simultaneously and
support full-duplex operation.
The two signal paths are collectively called a lane and it is possible to
implement multiple lanes.
Performance scales linearly with the number of lanes, and you can have a x1 bus, a x2 bus, a x4 bus, a x8 bus, and so on.
A single lane supports a peak data rate of 250 MB/s in each direction. A 16-lane system using duplex transmission has an effective data rate of 8 GB/s.
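The lane arithmetic above can be made explicit. The sketch below shows where the 250 MB/s per lane comes from (a 2.5 Gb/s first-generation line rate carrying 8 data bits in every 10 transmitted bits) and how lane count and duplex operation scale it to the 8 GB/s figure.

```python
def pcie_gen1_mb_per_s(lanes, both_directions=True):
    # Gen-1 line rate is 2.5 Gb/s; 8b/10b encoding leaves 8/10 of
    # that as data, and dividing by 8 converts bits to bytes:
    # 2.5e9 * (8/10) / 8 = 250e6 bytes/s = 250 MB/s per direction.
    per_direction = lanes * 250
    return per_direction * 2 if both_directions else per_direction

assert pcie_gen1_mb_per_s(1, both_directions=False) == 250
assert pcie_gen1_mb_per_s(16) == 8000   # the 8 GB/s quoted above
```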
Figure 12.56 illustrates the concept of lanes.
Conventionally, information at the electrical level in digital systems is
specified with respect to the ground or chassis; that is, a signal at greater
than 3.0V is interpreted as high, and a signal at less than 0.3 V is
interpreted as low.
PCI Express uses two signal paths to transmit data and the difference
between the two conductors contains the information; for example, the
signals may be +V,-V or -V, +V.
The advantage of differential transmission is that it is more immune to
interference (noise and other signals induced by capacitive or inductive
coupling).
This form of signaling is called LVDS – low voltage differential signaling.
If both conductors of a pair pick up interference it does not affect the
information, which is determined by the difference between the two
conductors.
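The noise-cancelling property of differential signaling can be demonstrated numerically. This is an idealized sketch (the 0.35 V swing is a typical LVDS-like value chosen for illustration, not a figure from the text):

```python
def lvds_drive(bit, swing=0.35):
    # A differential pair represents logic 1 as (+V, -V) and
    # logic 0 as (-V, +V) on the two conductors.
    return (swing, -swing) if bit else (-swing, swing)

def lvds_receive(a, b):
    # Only the *difference* between the conductors carries information.
    return 1 if (a - b) > 0 else 0

# Common-mode interference hits both conductors equally...
a, b = lvds_drive(1)
noise = 0.8                       # larger than the signal swing itself
# ...yet cancels in the difference, so the bit is still recovered.
assert lvds_receive(a + noise, b + noise) == 1
```

Note that the induced noise is larger than the signal swing, yet the received bit is unaffected because it appears identically on both conductors.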
The encoding of the bit stream across the serial link ensures that a clock signal is embedded in the data stream, so the receiver can recover a clock from the data.
This means that designers do not have to worry about the distribution of clock signals and delays between data and clocks caused by different path lengths in the signals (an important factor when signaling at 2.5 x 10^9 bits/s).
The bit encoding is called 8b/10b because each 8-bit byte is transmitted as 10 bits in order to equalize the number of 1s and 0s transmitted and to ensure that a clock signal can be recovered from the data signal.
8b/10b Encoding
8b/10b encoding is a means of transmitting serial data using 10 bits to
carry 8 bits of information.
The additional two bits per byte improves the performance of the
transmission mechanism.
The ten-bit code is constrained to contain five 1s and five 0s, four 1s and six 0s, or six 1s and four 0s. This ensures that there are no long runs of consecutive 1s or 0s.
A mechanism called running disparity is used to ensure that there is
an equal number of 1s and 0s on average; this is necessary to ensure
that there is no dc component in the signal.
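The balance constraint is easy to check in code. The sketch below tests only the 1s/0s balance of a candidate 10-bit codeword; it is not a full 8b/10b encoder, which uses a lookup table and also tracks running disparity, choosing between a +2 and a -2 codeword so that the long-term average stays at zero.

```python
def disparity(codeword):
    # Difference between the number of 1s and 0s in a codeword
    # given as a string of '0'/'1' characters.
    ones = codeword.count('1')
    return ones - (len(codeword) - ones)

def valid_8b10b_balance(codeword):
    # Legal 10-bit codewords have five 1s and five 0s (disparity 0),
    # six 1s and four 0s (+2), or four 1s and six 0s (-2).
    return disparity(codeword) in (-2, 0, 2)

assert valid_8b10b_balance('0101010101')       # five of each
assert valid_8b10b_balance('1101000111')       # six 1s, four 0s
assert not valid_8b10b_balance('1111111100')   # eight 1s: too unbalanced
```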
PCIe Data Link Layer
Data transmitted across systems that support layered protocols looks a bit
like a Russian doll with multiple layers of encapsulations.
At one end of a link, the application takes a dollop of data and wraps it up
with some form of ends or delimiters. Then, the application layer hands the
package to another layer (e.g., the data link layer) and that layer in turn
wraps up the data with its own terminators.
The data link layer passes the data to the physical layer and that too adds
beginning and end flags.
Figure 12.57 illustrates the concept of encapsulation using a system with
three protocol levels or layers.
Each protocol layer adds a header and a tail to the information passed from
the layer below.
Each layer strips the header and tail off before passing the message to the
next level.
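The wrap-and-unwrap behavior can be sketched with strings. The layer names and the bracketed header/tail tags below are hypothetical, purely to make the nesting visible; real layers add binary headers, sequence numbers, and CRCs.

```python
def encapsulate(message, layers):
    # Each layer wraps the payload handed down from the layer above
    # with its own header and tail.
    for layer in layers:
        message = f"[{layer}-hdr]{message}[{layer}-tail]"
    return message

def strip_layer(message, layer):
    # The receiving side peels one layer off before passing the
    # payload up to the next level.
    prefix, suffix = f"[{layer}-hdr]", f"[{layer}-tail]"
    assert message.startswith(prefix) and message.endswith(suffix)
    return message[len(prefix):-len(suffix)]

# Innermost wrap is the transaction layer, outermost the physical layer.
wire = encapsulate("DATA", ["transaction", "link", "phy"])
inner = strip_layer(strip_layer(strip_layer(wire, "phy"), "link"),
                    "transaction")
assert inner == "DATA"
```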
Figure 12.58 illustrates the PCIe bus message structure where the elements
of a message are shown in blue and the protocol layers in grey. The highest
level is the transaction layer that consists of a header and the actual
message itself.
The header defines the nature of the data message and includes information such as the address of the data; we will look at the header in more detail later. The transaction layer's tail is an error-detecting code, the ECRC.
Figure 12.59 gives the general structure of a packet header that consists of
12 or 16 bytes.
This structure means that all the hardware overhead associated with
conventional buses becomes redundant (arbitration, interrupt, handshaking
etc.) at the price of increased latency and reduced efficiency due to the data
overhead.
The SCSI and SAS Interfaces
One of the earliest external buses designed to link a computer and
peripherals is the SCSI bus. At one time it was the preferred bus in
professional and high-end systems.
Today, it is in decline in the face of very low-cost high-performance buses
such as USB and FireWire.
The Small Computer System Interface, SCSI, is an 8-bit parallel bus dating
back to 1979, when the disk manufacturer Shugart was looking for a
universal interface for its family of hard disks.
The SCSI bus is a parallel data bus that incorporates an information exchange protocol optimized for the bus's intended use: linking disk drives and other storage systems to a host computer.
Figure 12.61 illustrates the concept of the SCSI bus which was originally
called the SASI bus (Shugart Associates Systems Interface).
In 1981 Shugart and NCR worked with ANSI to standardize the SCSI bus
which became X3.131-1986 in 1986.
The original SCSI-1 bus operated at 5 MHz permitting up to seven
peripherals to be connected together.
A family of SCSI buses with a common architecture and different levels of
performance has been developed.
The specification was revised in 1991, providing a fast SCSI-2 bus at 10 MHz and a wide bus with a 16-bit data path.
Ultra SCSI or SCSI 3 was the next step with a clock rate of 20 MHz.
All SCSI systems support asynchronous data transfers, but SCSI 2 also
supports faster synchronous data transfers.
USB 3.0 bus provides a theoretical limit of 4.8 Gbps or 600 MB/s.
Version            Width (bits)  Data rate (MHz)  Throughput (MB/s)
SCSI-1             8             5                5
Fast SCSI          8             10               10
Fast Wide SCSI     16            10               20
Ultra SCSI         8             20               20
Wide Ultra SCSI    16            20               40
Ultra-2 SCSI       8             40               40
Wide Ultra-2 SCSI  16            40               80
Ultra-3 SCSI       16            80               160
Ultra 320 SCSI     16            160              320
Ultra 640          16            320              640
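The throughput column follows directly from the width and transfer rate. The sketch below checks that relationship; note that from Ultra-3 onward the data-rate column effectively counts mega-transfers per second (double-transition clocking), so the same formula still applies.

```python
def scsi_throughput_mb_s(width_bits, transfers_mhz):
    # Throughput = bus width in bytes x transfer rate in
    # mega-transfers per second.
    return (width_bits // 8) * transfers_mhz

# Spot-check rows of the SCSI family table:
assert scsi_throughput_mb_s(8, 5) == 5       # SCSI-1
assert scsi_throughput_mb_s(16, 10) == 20    # Fast Wide SCSI
assert scsi_throughput_mb_s(16, 320) == 640  # Ultra 640
```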
Figure 12.62 provides a simplified state diagram for the SCSI bus and
Table 12.9 describes the bus states.
State        Description
Bus free     No devices are controlling the bus or transferring data. This
             state is indicated by the negation of the SEL and BSY lines.
Arbitration  When a device wishes to take control of the bus and become an
             initiator, it enters the arbitration state by asserting BSY and
             putting its ID on the data bus. If no device with a higher
             priority claims the bus, it claims the bus by asserting SEL.
Selection    In the selection state the initiator selects a target device and
             issues commands asking it to carry out a specific operation.
             Selection is done by putting the logical OR of the target's and
             the initiator's IDs on the bus.
Reselection  Because a target is able to give up the bus during a long
             operation, a reselection state is needed during which the target
             reclaims the bus.
Command      The command phase is used by the target device to request a
             command from the initiator. The target sets the C/D signal low
             to indicate a command and sets I/O high to indicate an output
             operation.
Data         In the data phase, data is transferred between the initiator
             and target.
Message      The interface is controlled by messages sent between the
             initiator and target.
Status       The target returns a code to the initiator, indicating an
             operation's status.
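A state machine like the one in Figure 12.62 can be represented as a transition table. The transitions below are a plausible reading of a simple SCSI command's life cycle, assumed for illustration; the exact set of legal transitions is defined by the figure itself.

```python
# Hypothetical transition table for the SCSI bus phases.
ALLOWED = {
    "bus free":    {"arbitration"},
    "arbitration": {"selection", "reselection"},
    "selection":   {"message", "command"},
    "reselection": {"message", "command"},
    "command":     {"data", "status", "message"},
    "data":        {"status", "message"},
    "status":      {"message"},
    "message":     {"bus free", "command", "data"},
}

def check_sequence(phases):
    # Verify that each consecutive pair of phases is a legal transition.
    return all(b in ALLOWED[a] for a, b in zip(phases, phases[1:]))

# One simple command: arbitrate, select, command, transfer, report.
assert check_sequence(["bus free", "arbitration", "selection",
                       "command", "data", "status", "message"])
```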
Serial Attached SCSI (SAS)
Serial attached SCSI, SAS, retains the best of SCSI while moving closer to USB and
PCIe. Serial attached SCSI throws away SCSI’s antiquated physical layer and
replaces it with a low-cost, high-performance serial interface.
The topology of SAS is point-to-point unlike SCSI which is a multipoint bus.
The physical layer of SAS uses differential signaling, and cables up to 10 m are supported. SAS defines two low-level layers, physical and PHY, that divide the traditional physical-layer functions (plus some traditional link-layer functions) between them. The physical layer is concerned only with connectors and voltage levels, whereas the PHY layer handles data encoding, link initialization, speed negotiation, and reset sequences. SAS uses 8b/10b encoding.
Cables and connectors are physically compatible with the SATA interface now used
by all modern hard disk drives. Consequently, the same low-cost connectors can be
used in both conventional PCs and SAS-based systems. SAS supports SATA
Tunneling Protocol (STP) that enables conventional hard drives to be connected to
SAS architectures.
Serial Interface Buses
Once upon a time, a serial bus swapped speed for simplicity.
Parallel buses require multiple data paths and correspondingly complex
plug-socket arrangements and cables whereas a serial bus carries data a
bit at a time using two signal paths, one for the data and a ground return.
A serial data link using a fiber optic cable requires a single data path.
In the 1970s when RS232C serial connections were slow, you had to use a
parallel data bus if you wanted speed.
Early PCs had a parallel port with eight data lines using a DB-25
connector that you could use to interface to printers using 25-way ribbon
cable.
Some printers still have such basic parallel interfaces, although they are
becoming increasingly obsolete.
Modern serial data links are simple, fast, and effective.
The great advantage of the serial data bus is its ease of use.
The buses we describe next have had a tremendous impact on computing
because they provide low-cost, high-performance solutions to linking
computers with peripherals and even with other computers.
We begin with an introduction to the serial bus that was to have a
profound effect on computer communications, the Ethernet.
The Ethernet
We introduce serial buses by briefly describing the Ethernet, developed to
support local area networks at 10 Mbits/s. The Ethernet dates back to
1978 and now has the IEEE standard number 802.3.
Today, it is the standard for low-cost local area networks operating at
100 Mbits/s or 1 Gbits/s.
In an Ethernet, all devices are connected to a single cable and no special
control lines are required.
A device, or node, can transmit serial data onto the common bus that is
connected to all other devices.
The Ethernet transmission cable is now available in four versions: a thick
coaxial cable, a thin coaxial cable, a very low-cost twisted pair of
unshielded conductors and fiber optics
The table below defines the nomenclature for these connections. The 10 refers
to the speed of the link, the Base refers to baseband transmission (as opposed
to modulated carrier systems), and the 5/2/T/F refer to the media type.
The nomenclature 1000BaseF refers to Ethernet media operating at 1 Gbps
using fiber optics.
Name     Media Type           Max. Segment Length  Max. Nodes/Segment
10Base5  Thick coaxial        500 meters           100
10Base2  RG58 (thin coaxial)  185 meters           30
10BaseT  UTP (twisted pair)   100 meters           1024
10BaseF  Fiber optic          2,000 meters         1024
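The naming convention is regular enough to parse mechanically. This small sketch splits a designation into the speed in Mb/s, the baseband marker, and the media designator; the function name is ours, not part of any standard.

```python
import re

def parse_ethernet_name(name):
    # '10Base5' -> speed 10 Mb/s, baseband signaling, media '5';
    # '1000BaseF' -> 1000 Mb/s (1 Gb/s) over fiber.
    speed, signaling, media = re.fullmatch(r"(\d+)(Base)(\w+)",
                                           name).groups()
    return int(speed), signaling, media

assert parse_ethernet_name("10BaseT") == (10, "Base", "T")
assert parse_ethernet_name("1000BaseF") == (1000, "Base", "F")
```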
The data is transmitted in the form of packets or frames. An Ethernet packet
consists of seven fields starting with an 8-byte (64-bit) preamble that
synchronizes the clock at the receiver with the transmitted bit stream.
The first seven bytes have the bit pattern 10101010. The last byte of the
preamble is the start of frame delimiter with the special pattern 10101011 that
indicates the start of a frame.
The 48-bit destination and source address fields indicate where the packet is from and where it is going. The length field defines the size of the data field, which is between 46 and 1500 bytes.
Since the data field must contain at least 46 bytes, data fields shorter than 46 bytes are padded to bring the size up to 46 bytes.
Finally, a 32-bit frame check sequence provides a powerful error-detecting
code.
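The frame layout just described can be assembled in a few lines. This is an illustrative builder, not production networking code: zlib.crc32 happens to use the same CRC-32 polynomial as the Ethernet frame check sequence, but wire-level bit ordering is ignored here.

```python
import struct
import zlib

def ethernet_frame(dest, src, data):
    # dest/src: 6-byte (48-bit) addresses. Short data fields are
    # zero-padded to the 46-byte minimum.
    if len(data) < 46:
        data = data + bytes(46 - len(data))
    preamble = b"\xaa" * 7 + b"\xab"     # 7 x 10101010, then 10101011
    header = dest + src + struct.pack("!H", len(data))
    fcs = struct.pack("!I", zlib.crc32(header + data))
    return preamble + header + data + fcs

frame = ethernet_frame(b"\x01" * 6, b"\x02" * 6, b"hello")
# 8 preamble + 6 dest + 6 src + 2 length + 46 data + 4 FCS = 72 bytes
assert len(frame) == 72
```

Note that the 64-byte minimum frame (excluding the preamble) falls out of the padding rule: 6 + 6 + 2 + 46 + 4 = 64.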
Ethernet is simply a message exchange mechanism (i.e., there are no control
signals). One node on the Ethernet posts a frame to another node. How the
data field in a package is interpreted is not part of the Ethernet specification.
The physical layer of the Ethernet uses a baseband cable with phase-encoded data transmitted at 100 Mb/s or 1 Gb/s.
No two nodes can access the Ethernet simultaneously without their messages
interfering destructively with each other. When two messages overlap, a
collision occurs and both messages are lost. Any node wishing to communicate
with another node goes ahead and transmits its message. If another node is
transmitting at the same time, or joins in before the message is finished, the
message is lost.
If the sender does not receive an acknowledgment within a time-out period, it
assumes that its message has been corrupted in transmission.
Without any control over when a node may transmit, there’s nothing to stop
two or more nodes transmitting simultaneously.
The simplest form of contention control would be to let the transmitters immediately retransmit their messages.
Such a scheme cannot work, as the competing nodes would keep retransmitting their messages, which would keep getting scrambled.
A better strategy on detecting a collision is to back-off or wait a random time
before trying to retransmit the frame.
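Ethernet's actual random back-off scheme is truncated binary exponential backoff: after each successive collision the node doubles the range from which it draws its random waiting time, up to a cap. A minimal sketch:

```python
import random

def backoff_slots(attempt, max_exponent=10):
    # After the nth collision on the same frame, wait a random number
    # of slot times drawn from 0 .. 2^min(n, 10) - 1. Doubling the
    # range spreads the competing nodes out ever more thinly.
    k = min(attempt, max_exponent)
    return random.randrange(2 ** k)

assert backoff_slots(1) in (0, 1)        # first retry: 0 or 1 slots
assert 0 <= backoff_slots(5) < 32        # fifth retry: up to 31 slots
assert 0 <= backoff_slots(20) < 1024     # range is capped at 2^10
```

Because the two colliding nodes almost certainly pick different delays, one of them wins the bus on the retry instead of colliding forever.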
FireWire 1394 Serial Bus
The FireWire bus (IEEE 1394 bus) is an example of a bus that began life as
a company project and later became an industry standard.
Apple developed FireWire in 1986 as a replacement for the parallel SCSI bus of the time, for use in the professional audio and video world.
FireWire is an Apple trademark – the same bus is called iLink by Sony.
Several factors have led to the design of the FireWire bus. The first is cost.
Over the past four decades microcomputers have been embedded in domestic
products from TVs to washing machines. This is particularly true in the
audio-visual area. If such devices are to be interconnected, a link must be
cheap—the consumer doesn't expect to pay a fortune for an add-on.
The second factor is size. Electronic devices are getting smaller and smaller.
When computers were large, the space taken by a bulky connector on the
back was of little importance. Small devices require correspondingly small
connectors (e.g., a hand-held video camera, a computer games console, MP3
player, or cell phone).
The third factor is speed.
Year by year the rate at which computers are clocked and the speed at which
data is transferred between digital devices has increased relentlessly.
In 1975 a 600 baud modem was regarded as fast—by 1995 the 28.8K bps
modem was routinely used to interface computers to the Internet, and by
2000 the 512 Kb/s cable modem could be found in many households.
The fourth factor is reliability. Systems often fail due to faulty connectors,
largely caused by repeated insertion and removal of plugs or the continuous
flexing of cables. A reliable connector should have as few signal paths as
possible.
In the early 1990s a serial bus, initially called the P1394 High Performance
Serial Bus was proposed (the “P” indicates a provisional standard).
A serial bus has many advantages over a parallel bus like the SCSI bus.
In particular, a serial bus has only two conductors (one if a fiber optic path
is used) which reduces the cost of cabling and the cost and size of
connectors.
The P1394 bus was designed to take advantage of the best available
technology and can support several different physical layers; that is, the
P1394 bus is not tied to one type of physical implementation.
Some of the most important features of the serial bus are:
• Automatic assignment of node (i.e., the device connected to the bus)
addresses—there is no need for address switches or other means of
assigning addresses to nodes
• Variable-speed data transmission. The IEEE 1394b specification doubled
the number of bits per packet to increase the bus rate to 800 Mb/s
• The cable medium allows up to 16 physical connections or cable hops,
each of up to 4.5 meters
• A fair bus access mechanism that guarantees all nodes equal access
• Consistent with IEEE Std 1212–1991, IEEE Standard Control and Status
Register (CSR) Architecture for Microcomputer Buses (ANSI)
• The 1394 Serial Bus limits the number of nodes on any bus to 63.
However, up to 2^16 nodes are supported by means of multiple buses
linked via bus bridges.
Figure 12.64 describes the 1394 Serial Bus’s layered protocol which is
essentially the same as that of the PCI Express bus.
Each layer provides a specific service. Because each layer communicates
with the layer above or below it in a tightly specified manner, it is possible
to replace any layer by a system that performs the same function.
That is, the 1394 Serial Bus is technology independent.
Serial Bus Topology
Data transmission systems are characterized by their topology, which
describes the way in which the individual nodes are related. The Ethernet
is a bus because all nodes are connected to it and information is sent from
one node to all other nodes on the bus—there is no routing mechanism to
determine how information propagates on the bus. Another system is the
ring, in which all nodes are connected to each other and information flows
from node to node.
The general structure of the 1394 Serial Bus is described by Figure 12.65.
IEEE 1394 Hardware
The 1394 bus uses a six-element cable. There are two twisted pairs of
data conductors to provide two independent data channels as well as two
power supply lines. A seventh conductor surrounds the inner six conductors
to shield the entire cable electrically.
Figure 12.66 illustrates a simple tree structure with five nodes. Each node is
either a branch, directly connected to more than one neighbor, or it is a leaf
with only a single neighbor. Many applications of the serial bus daisy-chain
the nodes together, a special case of a tree structure.
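The branch/leaf distinction can be expressed directly in terms of neighbor counts. The five-node daisy chain below is an illustrative topology, not necessarily the one in Figure 12.66.

```python
# Classify nodes of a 1394-style tree as branches (more than one neighbor)
# or leaves (exactly one neighbor), given an adjacency list.

def classify_nodes(adjacency):
    return {node: "branch" if len(neighbors) > 1 else "leaf"
            for node, neighbors in adjacency.items()}

# A daisy chain: each interior node has exactly two neighbors.
daisy_chain = {
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
    "D": ["C", "E"], "E": ["D"],
}
```

In a daisy chain only the two end nodes are leaves, which is why it is a special case of a tree.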
The Physical Layer
The serial bus transmits information in the form of packets in a half-duplex
mode. A data strobe, STRB, controls the data flow.
Data is transmitted in an NRZ (non-return to zero) format, and STRB changes
state whenever two consecutive NRZ bits have the same value. This mechanism
makes it easy to derive a clock from the data and strobe signals (by
exclusive-ORing them).
Figure 12.67 provides an example of the encoding for the sequence 10110001.
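The encoding rule can be sketched in a few lines. The initial strobe level is assumed to be 0 here (the actual waveform in Figure 12.67 depends on the initial line states), but the key property holds regardless: exactly one of data and strobe changes per bit period, so their exclusive-OR toggles every period and behaves as a clock.

```python
# Data-Strobe encoding sketch: the strobe toggles whenever two consecutive
# data bits are equal, so data XOR strobe changes state every bit period.

def ds_encode(bits):
    strobe, prev, strobes = 0, None, []
    for b in bits:
        if prev is not None and b == prev:
            strobe ^= 1          # consecutive bits equal: toggle strobe
        strobes.append(strobe)   # consecutive bits differ: strobe unchanged
        prev = b
    return strobes

bits = [1, 0, 1, 1, 0, 0, 0, 1]          # the sequence 10110001
strobe = ds_encode(bits)
clock = [d ^ s for d, s in zip(bits, strobe)]   # alternates every period
```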
Arbitration
The Serial Bus implements several forms of arbitration. Here we describe fair
arbitration that occurs on the cable bus. Arbitration is geographic because the
node closest to the root on a cable will always win.
The fair arbitration protocol is based on the concept of a fairness interval that
consists of one or more periods of bus activity separated by short idle periods
called subaction gaps followed by a longer idle period known as an arbitration
reset gap. At the end of each subaction gap, bus arbitration determines the next
node to transmit an asynchronous packet.
When using fair arbitration, an active node can initiate sending an
asynchronous packet exactly once in each fairness interval. An active node can
arbitrate only if its arb_enable signal is set.
The arb_enable signal is set to one by an arbitration reset gap and is cleared
when the node wins the arbitration. This disables further arbitration requests
for the remainder of the fairness interval. A fairness interval ends when
arbitration by the final fair node is successful; this generates an arb_reset_gap
since all nodes now have their arb_enable signals reset and cannot drive the
bus. The arb_reset_gap re-enables arbitration on all cards and starts the next
fairness interval.
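The arb_enable mechanism can be modeled with a short simulation. This is a sketch of the behavior described above, not the 1394 state machine itself: requesting nodes are listed in priority order (closest to the root first, reflecting geographic arbitration), and each may win at most once per fairness interval.

```python
# Minimal model of 1394 fair arbitration: within one fairness interval each
# node wins at most once; the arbitration reset gap re-enables every node.

def fairness_interval(requests, arb_enable):
    """Run one fairness interval; requests are in priority order."""
    winners = []
    while True:
        # The highest-priority node with arb_enable set wins the subaction.
        candidates = [n for n in requests if arb_enable[n]]
        if not candidates:
            break                      # all requesters disabled: interval ends
        winner = candidates[0]
        arb_enable[winner] = False     # cleared on winning the arbitration
        winners.append(winner)
    for n in arb_enable:               # arb_reset_gap: re-enable all nodes
        arb_enable[n] = True
    return winners

arb_enable = {"near": True, "mid": True, "far": True}
winners = fairness_interval(["near", "mid", "far"], arb_enable)
```

Although "near" outranks the others throughout, it cannot win twice in the interval, so every requester eventually gets the bus.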
USB
Two of the factors contributing most to the success of the personal digital
revolution, which encompasses desktop computers, laptops and notebooks, MP3
players, still and video cameras, and cellular phones, are flash memory and
the universal serial bus, USB.
Flash memory provides robust, high-density, compact, non-volatile storage at
low prices, and USB technology allows you to connect almost any modern
digital device to a computer – or even to connect two digital devices together
without a host computer (e.g., a camera and a printer).
Indeed, the USB is the single most successful digital interface ever, with over
one billion USB devices sold by 2009.
USB History
USB was developed by a consortium of Compaq, DEC, IBM, Intel,
Microsoft, NEC, and Nortel in 1994. The first USB specification, 1.0,
was introduced in 1996 and supported a data rate of 12 Mb/s. USB
1.1 was released in 1998 to deal with problems related to hubs. USB
1.1 was widely adopted.
In 2000 USB 2.0 emerged to provide a maximum data rate of 480
Mb/s. USB had jumped into FireWire territory and it became the de
facto standard for most PC interfaces to printers, external drives,
keyboards, mice, and so on.
It wasn’t until 2009 that USB version 3.0 saw the light of day, with an
operating speed of 300 MB/s (i.e., 2,400 Mb/s), displacing FireWire
from its high-end niche. USB 3.0 is a giant leap forward over version
2.0 and requires a new cable format and technology.
The Universal Serial Bus was devised by a consortium of companies and
later established as a standard interface.
A USB bus uses low cost connectors and cabling to connect a computer to
a range of peripherals from the mouse/keyboard/printer/scanner to
memory devices such as external hard drives and flash memory devices
(so-called pen drives).
USB is an alternative to FireWire.
USB is host controlled. Unlike other buses such as PCI, it does not support a
multimaster arrangement. There can be only one USB master (or host) per bus.
Figure 12.70 describes the tiered star topology of a USB system.
At the top of the hierarchy sits the host which communicates with the
computer and controls the USB bus. The host is connected to a hub which is a
device that distributes the USB bus to lower levels in the hierarchy. A hub may
be connected directly to a peripheral, or to several peripherals, or to another
hub. Each hub may be connected to a lower-level hub.
USB uses a simple multicore cable with dedicated connectors. However,
since the introduction of USB, peripherals have become smaller and smaller,
and USB cables have been forced to follow this trend, with the result that
there are now four basic sizes.
Figure 12.71 illustrates both USB plugs and sockets. The computer end of the
link uses Type A plugs and sockets, which are fairly substantial. Type B
plugs and sockets are used at the other end of the link (i.e., at a hub or a
peripheral such as a printer).
Mini-B plugs and sockets were developed for digital cameras, cell phones, and
portable disk drives.
Figure 12.73 illustrates the structure of a USB system.
USB cables use four conductors. Data is transmitted differentially
between a twisted pair of wires labeled D+ and D- in Figure 12.74. Recall
that differential-mode transmission increases reliability by rejecting
common mode interference. The twisted pair is enclosed in a metal shield
to further reduce the dangers of picking up stray signals. The specified
maximum length of the cable is 5 meters. Of course, you can use a hub at
the end of a 5m cable to increase the length of the USB path.
More Power
The USB’s ability to deliver power was intended to support hubs and
peripherals like the mouse and keyboard.
In practice, many manufacturers have taken advantage of this facility;
for example, the USB’s built-in power supply has been used to
charge cell phones and MP3 players.
A new power mode was added to the USB specification called battery
charging. A host can supply up to 1.5A when communicating at 12
Mbps, or 0.9A when communicating at 480 Mbps.
Furthermore, by 2010, many of the world’s cell phone manufacturers
had provided micro USB ports to charge their phones.
Physical Layer Data Transmission
At the electrical level the USB employs NRZI data encoding where a
logical 1 is represented by no change of level and a 0 is represented by a
change of level. Sending a string of 0s requires the greatest bandwidth
and transmitting a sequence of 1s results in a constant level with no
signal transitions.
Figure 12.75 illustrates an NRZI sequence (this encoding format is very
old in comparison with the 8b/10b encoding used by the PCIe bus and is
relatively little used because it is non-self-clocking and has a dc bias).
The two signal levels on a USB bus are referred to as J and K.
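The NRZI rule is easily sketched. The initial line level is assumed here to be the J state, represented as 1; the exact waveform depends on that choice, but the defining properties do not: a run of 1s produces a constant level, and a run of 0s produces a transition on every bit.

```python
# NRZI encoding as used by USB: a 0 bit causes a level transition,
# a 1 bit leaves the level unchanged.

def nrzi_encode(bits, level=1):
    levels = []
    for b in bits:
        if b == 0:
            level ^= 1       # 0: change level
        levels.append(level)  # 1: keep level
    return levels

def nrzi_decode(levels, initial=1):
    bits, prev = [], initial
    for lvl in levels:
        bits.append(1 if lvl == prev else 0)
        prev = lvl
    return bits
```

Decoding simply compares each level with the previous one, so the receiver needs no absolute reference, only the agreed initial state.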
Logical Layer
Logically, a communications channel exists between the host and device, and
in USB-speak this channel is called a pipe. The USB host can support 32
active pipes at any instant (16 upstream and 16 downstream pipes).
The pipe is terminated at a peripheral by an endpoint. Although the physical
structure of a USB system is a tiered star, it is logically a star network. All
nodes have to communicate with each other via the single central node.
On-the-Go USB
USB 2.0 doesn’t support peer-to-peer networking; it covers only host-to-peripheral
communications – the host is usually a computer.
The On-The-Go USB mode is a peer-to-peer (point-to-point) protocol which allows two
USB devices to communicate without a host computer and provides direct device-to-device communications. On-The-Go (OTG) operation was adopted in 2001.
This extension to USB provided for communications between several OTG devices or
between an OTG device and a conventional USB host. This revision also introduced a
new range of USB plugs and sockets called Micro-A and Micro-B. The revised
standard introduces the dual-role device (i.e., OTG) that can act as either a host or a
peripheral. Moreover, a dual-role device must be able to supply a limited current to the
bus of at least 8 mA (remember that most OTG devices are battery-powered).
OTG devices, when acting as a host, may support only a targeted peripheral list; that
is, the OTG device may operate only in conjunction with certain specified peripherals –
it is not intended to be a general-purpose host with full USB capabilities. A significant
element of OTG technology is the Host Negotiation Protocol, HNP, which allows the
transfer of control between two OTG devices.
USB 3.0
The most radical change to the universal bus took place in 2010 with the
introduction of USB 3.0 which provides a tenfold increase in performance
and uses less power.
Even more remarkably, USB 3.0 is physically compatible with USB 2. USB
3.0 is interesting because it is not really a development or extension of USB
2.0, but a replacement bus that coexists with USB 2.0; that is, a USB 3.0 bus
incorporates a USB 2.0 bus as well.
Figure 12.77 illustrates the structure of the USB 3.0 cable. The two data
carrying conductors of USB 2.0 are maintained as well as the two power
conductors.
Two new differential pairs of conductors (SSRX and SSTX) have been added to
carry the new USB 3.0 data in a full-duplex bidirectional mode.
The additional functionality of USB 3.0 is called the SuperSpeed bus which
provides a maximum speed of 4.8 Gb/s.
To put this into context, it takes USB 2.0 13.9 minutes to transfer an HD
movie, whereas USB 3.0 can perform the transfer in only 70s.
In short, USB 3.0 is an impressive feat of engineering that takes a great leap
forward in terms of functionality and performance, while maintaining
backward compatibility with a vast existing market of USB 2.0 users.
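The transfer times quoted above can be sanity-checked with a little arithmetic. The slide does not state the movie's size, so a 25 GB file (a typical Blu-ray HD movie) is assumed here; the implied throughputs are effective rates, well below the raw bus speeds because of protocol overhead.

```python
# Back-of-envelope check of the quoted USB 2.0 vs. USB 3.0 transfer times,
# assuming a 25 GB HD movie (assumption: size not given on the slide).

movie_mb = 25 * 1000          # 25 GB expressed in MB
usb2_secs = 13.9 * 60         # 13.9 minutes = 834 s
usb3_secs = 70

usb2_mb_per_s = movie_mb / usb2_secs   # ~30 MB/s effective for USB 2.0
usb3_mb_per_s = movie_mb / usb3_secs   # ~357 MB/s effective for USB 3.0
speedup = usb2_secs / usb3_secs        # ~12x, consistent with "tenfold"
```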
Figure 12.78 illustrates the logical structure of the USB 3.0 bus in terms of
protocol layers (like the ISO 7 layer model for open systems interconnection).
The SuperSpeed bus that adds all the new functionality to USB 3.0 is similar to
the PCI Express bus and uses 8b/10b encoding at the physical-layer level. Data
is scrambled on the SuperSpeed bus. This is not a security mechanism but a means of
converting data into a sequence that appears random in order to improve the
electrical properties of the data link.
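Such scrambling is typically done by XORing the data with the output of a linear-feedback shift register (LFSR). The sketch below uses the 16-bit polynomial x^16 + x^5 + x^4 + x^3 + 1 and seed 0xFFFF, following the PCIe/USB 3.0 convention; the exact register structure here is illustrative, but the key property is general: because the keystream depends only on the seed, applying the same operation a second time descrambles the data.

```python
# LFSR-based additive scrambler sketch: XOR data with a pseudo-random
# keystream so long runs of identical bits become transition-rich.

def lfsr_scramble(bits, seed=0xFFFF):
    """Scramble (or descramble) a bit list with a 16-bit Galois LFSR."""
    state, out = seed, []
    for b in bits:
        msb = (state >> 15) & 1
        # Shift left; on a 1 feeding back, XOR in taps for x^5, x^4, x^3, 1.
        state = ((state << 1) & 0xFFFF) ^ (0x0039 if msb else 0)
        out.append(b ^ msb)   # additive scrambling: data XOR keystream
    return out
```

A long run of zeros, which would otherwise sit at a constant level, emerges as a pseudo-random pattern, and scrambling the result again recovers the original data.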