Module 3: Central Processing Unit and Memory Design

Commentary
Topics
I. The Basic Little Man Computer
II. Organization of the CPU and Memory
III. Instruction Set Architecture
I. The Basic Little Man Computer
A. Basic Computer Organization
The modern-day computer is based on an architecture defined in 1951 by John von Neumann at
the Institute for Advanced Study in Princeton, New Jersey. Von Neumann's design was based on
three concepts:
memory contains both programs and data—the stored-program concept
the contents of memory are addressable by location, without regard to the type of data
contained therein
execution of instructions occurs in a sequential fashion unless that order is explicitly
modified
The hardware of the computer is usually divided into three major components:
1. The central processing unit (CPU) contains an arithmetic and logic unit for
manipulating data, a number of registers for storing data, and control circuits for fetching
and executing instructions.
2. The memory subsystem of a computer contains storage for instructions and data. The
CPU can access any location in memory at random and read and write data within a fixed
interval of time.
3. The input/output (I/O) subsystem contains electronic circuits for communicating and
controlling the transfer of information between the computer and the outside world. The
I/O devices connected to the computer may include keyboards, printers, terminals,
magnetic disk drives, and communication devices.
These major components are interconnected by a set of wires, called a bus, which carries
information relating to addresses in memory, data, and control signals. We will discuss each of
the major components in section II of this module.
The instruction-set architecture, which we will discuss in section III of this module, describes the
instructions the CPU can process. As part of instruction processing, the CPU uses a fetch-execute cycle that it executes repetitively until it encounters and executes a "halt" instruction.
This cycle consists of the following steps:
1. Fetch the next instruction from memory.
2. Decode the instruction.
3. Resolve memory addresses and determine the location of any operands.
4. Execute the instruction.
5. Go back to step 1.
We will use this cycle to help explain the operation of the Little Man Computer in the next
section.
B. Little Man Computer Layout
A model often used to help explain the operation of a computer is the Little Man Computer
(LMC), developed by Dr. Stuart Madnick at the Massachusetts Institute of Technology (MIT) in
1965, and revised in 1979. The LMC model looks at the operations of the computer from a
decimal viewpoint, using easy-to-understand analogies. We will use the LMC at several key
points throughout this module.
The general model of the LMC is that of a little circuit-board man inside a walled mailroom. See
the figure below for the LMC model that we will use in this course. It has:
an In Basket
an Out Basket (the in and out baskets are the Little Man's only means of communication
with the outside world)
a simple Calculator with a display consisting of three decimal digits and a sign (+ or –).
The display can show any number in the range of –999 to +999.
an instruction location counter, or Location Counter
an Instruction Display for the instruction being executed
one hundred mailboxes, each uniquely identified by a two-digit address in the range of 00
to 99. Each mailbox can contain a three-digit number, that is, a number in the range of
000 to 999. There is no provision for storing a sign (+ or –) in a mailbox, so (unlike the
calculator) a mailbox cannot contain a negative number.
a comment box that explains what is happening during the current step
navigation buttons to take the user at the appropriate time to the fetch and execute
steps and to demonstrate these steps (these buttons are not shown in the model in figure
3.1.)
Figure 3.1 shows the components of the Little Man Computer.
Figure 3.1
Little Man Computer
Each mailbox can store three digits, which can represent either data or an instruction. When they
represent an instruction, the three digits are broken up into two parts:
the first digit tells the LMC what to do—it is called an operation code, or op code
the next two digits are used to indicate the mailbox that is to be addressed (when
required)
C. LMC Operation
To show how the LMC works, let's look at a simple program for subtraction, using the basic
instruction set in table 3.1 below, and the program in the step diagram of figure 3.2. In this
table, "XX" in the Address column indicates a memory address.
Table 3.1
LMC Instructions
Instruction | Mnemonic | Op Code | Address | What the Little Man Does
Coffee Break (Halt) | COB or HLT | 0 | 00 | Self-explanatory. Do nothing.
Add | ADD | 1 | XX | Add the contents of mailbox XX to the calculator. Add 1 to the counter.
Subtract | SUB | 2 | XX | Subtract the contents of mailbox XX from the calculator. Add 1 to the counter.
Store | STO | 3 | XX | Copy the contents of the calculator into mailbox XX. The sign indication is discarded. Add 1 to the counter.
Load | LDA | 5 | XX | Copy the contents of mailbox XX into the calculator. The sign in the calculator is set to +. Add 1 to the counter.
Input | IN | 9 | 01 | Move the contents of the inbox to the calculator. The sign in the calculator is set to +. Add 1 to the counter. (If there is nothing in the inbox, just wait until an input is provided. Moving the inbox contents exposes the next input if there is one, or leaves the inbox empty.)
Output | OUT | 9 | 02 | Copy the contents of the calculator to the outbox. The sign indication is discarded. Add 1 to the counter.
Branch Unconditional | BR | 6 | XX | Copy XX into the counter.
Branch on Zero | BRZ | 7 | XX | If the calculator contains zero, copy XX into the counter; otherwise, add 1 to the counter. (The sign in the calculator is ignored.)
Branch on Positive | BRP | 8 | XX | If the value in the calculator is positive, copy XX into the counter; otherwise, add 1 to the counter. (Note that zero is considered to be a positive value.)
Now let's look at a simple program for subtraction using the LMC. The interactive, animated
diagram below shows how the LMC calculates the positive difference between 123 and 456.
Our diagram has:
Fetch and Execute buttons to step through the instruction cycle
Show Me buttons that demonstrate each fetch and execute step
comments explaining what is happening during the current step
You can repeat the animation at any step by clicking on the Back button, then clicking again on
the Show Me button. The Next button bypasses the animation and goes directly to the next step.
Figure 3.2
Finding the Positive Difference between 123 and 456
Note that program instructions 06 and 07 were skipped because the condition of the branching
instruction was met. If, for example, the numbers had been 123 and 012, then the branching
requirement in step 08 would not have been met, and the program would have continued
through program instructions 06 and 07, where the order of subtraction would have been
reversed and a positive answer would still have been obtained.
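The LMC's behavior can be acted out in a few lines of Python. The interpreter below implements the instruction set of table 3.1, and the mailbox layout is one plausible encoding of the positive-difference program (the exact addresses used in figure 3.2 may differ):

```python
def run_lmc(mailboxes, inputs):
    """A minimal Little Man Computer interpreter (a sketch, not the course's
    exact model). Returns the contents of the Out Basket when COB executes."""
    mem = mailboxes[:] + [0] * (100 - len(mailboxes))  # 100 mailboxes
    pc, acc, out = 0, 0, []       # Location Counter, Calculator, Out Basket
    inbox = list(inputs)
    while True:
        instr = mem[pc]           # fetch the instruction at the counter
        pc += 1                   # the counter normally advances by 1
        op, addr = divmod(instr, 100)  # first digit = op code, last two = address
        if instr == 0:                        # COB/HLT: coffee break
            return out
        elif instr == 901: acc = inbox.pop(0)           # IN
        elif instr == 902: out.append(abs(acc) % 1000)  # OUT: sign discarded
        elif op == 1: acc += mem[addr]                  # ADD
        elif op == 2: acc -= mem[addr]                  # SUB (calculator keeps sign)
        elif op == 3: mem[addr] = abs(acc) % 1000       # STO: sign discarded
        elif op == 5: acc = mem[addr]                   # LDA
        elif op == 6: pc = addr                         # BR
        elif op == 7 and acc == 0: pc = addr            # BRZ
        elif op == 8 and acc >= 0: pc = addr            # BRP (zero is positive)

# Positive difference of two inputs: IN, STO 10, IN, STO 11, SUB 10,
# BRP 08, LDA 10, SUB 11, OUT, COB
program = [901, 310, 901, 311, 210, 808, 510, 211, 902, 0]
print(run_lmc(program, [123, 456]))   # → [333]
```

With inputs 123 and 456, the BRP at instruction 05 is taken and instructions 06 and 07 are skipped; with the inputs reversed, the branch falls through and the subtraction is redone in the opposite order, so the output is 333 either way.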
II. Organization of the CPU and Memory
A. CPU Organization
The CPU is made up of three major parts:
1. The registers store intermediate data used during the execution of the instructions.
2. The arithmetic logic unit (ALU) performs the required operations for executing the
instructions.
3. The control unit supervises the transfer of information among the registers and instructs
the ALU as to which operation to perform.
Registers
In module 2, we saw that registers are composed of flip-flops, one for each bit in the register.
The CPU uses registers as a place to temporarily store data, instructions, and other information.
Different CPU designs have different numbers and types of registers. The following four types of
registers, however, are found in all designs:
PC, the program counter. The PC tells the CPU which memory location contains the next
instruction to be executed. Typically, the PC is incremented by 1 so that it points to the
next instruction in memory. But branch instructions can set the PC to another value.
In the LMC, the "program counter" is represented as the Location Counter.
IR, the instruction register. This register is important because it holds the instruction that
is currently being executed.
In the LMC, the "instruction register" is shown as the Instruction Display. To visualize the
LMC operation, we can picture the Little Man holding the instruction written on a slip of
paper he has taken from a mailbox, and reading what needs to be done.
MAR, the memory address register. This register tells the CPU which location in memory
is to be accessed.
The MAR is not shown in the LMC.
MDR, the memory data register. This register holds the data being put into or taken out
of the memory location identified by the MAR. It is sometimes called the memory buffer
register, or MBR.
The MDR is not shown in the LMC. To visualize the LMC operation, we can picture the
Little Man holding the data written on a slip of paper that is either being put into or taken
out of a mailbox.
Other registers that may be included in a particular CPU design are:
accumulators or general-purpose registers, which process data in the CPU. Normally,
multiple accumulators are in the CPU, and the number of accumulators is usually a power
of 2 (i.e., 2, 4, 8, or 16) in order to optimize the addressing scheme.
In the LMC, the accumulator is represented as the Calculator.
the program status register. This register stores a series of one-bit pieces of
information called flags that keep track of special conditions.
The LMC does not have a program status register.
IOR, the input and output interface registers, which are used to pass information to and
from the peripheral devices.
In the LMC, the input and output interface registers are represented as the In Basket and
the Out Basket.
the SP, the stack pointer, which points to the top of the stack. We will discuss the
operation of a stack in section III of this module.
The LMC does not have a stack pointer.
Here are some of the operations that can be performed on registers:
reading: When data are read from a register, the data are copied to a new destination
without changing the value in the register.
writing: When data are written into a register, the new data overwrite and destroy the
old data.
arithmetic operations: When data are added or subtracted, the new results are stored
in the register, thus destroying the old data.
shifts and rotations: Registers can be shifted or rotated to the right or to the left. We
will discuss these operations in section III of this module.
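As a preview of those shift and rotate operations, here is a small Python sketch of an 8-bit left rotation (the 8-bit register width is just an illustrative assumption). Unlike a shift, a rotation feeds the bits that fall off one end back in at the other, so no data are lost:

```python
def rotl8(value, n):
    """Rotate an 8-bit register left by n places; bits shifted out on the
    left re-enter on the right."""
    n %= 8
    return ((value << n) | (value >> (8 - n))) & 0xFF

def shl8(value, n):
    """Shift an 8-bit register left by n places; bits shifted out are lost."""
    return (value << n) & 0xFF

print(bin(rotl8(0b10110000, 1)))  # → 0b1100001 (the high bit wrapped around)
print(bin(shl8(0b10110000, 1)))   # → 0b1100000 (the high bit was discarded)
```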
Arithmetic Logic Unit
We discussed the operation of a simple one-bit, four-operation ALU in module 2. A complete ALU
handles all of the bits in a word and can perform a wide variety of operations. Signals from the
control unit are used to select the operation to be performed.
Control Unit and the Fetch-Execute Instruction Cycle
The control unit controls the execution (including the order of execution) of instructions through
the use of the fetch-execute instruction cycle. The instruction to be executed is determined by
reading the contents of the program counter (PC). Execution of each instruction includes setting
a new value into the PC, thereby determining which instruction will be executed next. When an
instruction other than a branching instruction is executed, the PC is incremented by 1. When a
branching instruction is executed, the new value of the PC depends on the result of the test (if
any) and the address field of the instruction.
A diagram of the fetch-execute cycle is shown in figure 3.3 below. In the fetch phase, the
instruction is decoded using a decoder similar to the one shown in figure 2.7 (module 2, section I
D). The inputs A2, A1, and A0 in figure 2.7 are controlled by the control unit.
The output of the decoder is a "1" on one of the op code lines. That line causes the selected type
of instruction to execute.
Figure 3.3
Fetch-Execute Cycle
We can use the LMC to demonstrate how the fetch-execute instruction cycle works. The LMC
instructions listed in section I are composed of a series of data transfers and operations. To
discuss these data manipulations, we use the following notations:
A → B indicates that the contents of the A register are copied to the B register.
IR[address] → MAR indicates that the address portion of the IR register is copied into
the MAR.
A + B → A indicates that the contents of A are replaced by the sum of A and B.
In order for a program to run on the LMC, each instruction must first be fetched and then
executed. Thus, the instruction cycle is divided into two phases:
1. Fetch
The Little Man looks at the program counter (labeled Location Counter) to find where to go in the memory for the next instruction.
Then, the Little Man goes to that location to fetch the instruction to be executed.
2. Execute
The Little Man executes the fetched instruction.
Based on the instruction being executed, the Little Man puts the location of the next instruction into the program counter.
The fetch cycle, which we saw in figure 3.2 in section I, is achieved with the following register-transfer steps:
1. PC → MAR
The value of the program counter is loaded in the MAR.
Remember, the program counter holds the location of the next
instruction to be executed.
2. Mem[MAR] → MDR
The contents of memory location MAR are transferred to the
MDR.
3. MDR → IR
The value in the MDR is copied into the IR.
4. IR[op code] → decoder
The op code portion of the instruction that is in the IR is
transferred to the decoder. The decoder selects the instruction to
be executed.
Once the instruction is fetched, the execution phase can start. The execution phase of the STORE
instruction is given below:
1. IR[address] → MAR
The address portion of the IR register identifies the address in
memory where data will be stored.
2. A → MDR
The data in A is put into the MDR.
3. MDR → Mem[MAR]
The data in the MDR is written into memory.
4. PC + 1 → PC
The program counter is incremented to point to the next
instruction.
Every instruction starts with the same fetch cycle. The complete cycles of register transfers for
two additional instructions, ADD and BR, are shown in the tables below:
Table 3.2
Register Transfers for ADD
Fetch
Step 1: PC → MAR
Step 2: Mem[MAR] → MDR
Step 3: MDR → IR
Step 4: IR[op code] → decoder
Execute
Step 5: IR[address] → MAR
Step 6: Mem[MAR] → MDR
Step 7: A + MDR → A
Step 8: PC + 1 → PC
Table 3.3
Register Transfers for a BR Instruction
Fetch
Step 1: PC → MAR
Step 2: Mem[MAR] → MDR
Step 3: MDR → IR
Step 4: IR[op code] → decoder
Execute
Step 5: IR[address] → PC
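The register transfers in table 3.2 can be acted out directly in code. The sketch below performs one complete fetch-execute cycle for an ADD instruction, one table step per line (the memory contents and starting register values are invented for illustration):

```python
def add_cycle(mem, regs):
    """One full fetch-execute cycle for an ADD, step for step from table 3.2."""
    # Fetch
    regs["MAR"] = regs["PC"]               # Step 1: PC -> MAR
    regs["MDR"] = mem[regs["MAR"]]         # Step 2: Mem[MAR] -> MDR
    regs["IR"] = regs["MDR"]               # Step 3: MDR -> IR
    op, addr = divmod(regs["IR"], 100)     # Step 4: IR[op code] -> decoder
    assert op == 1, "this sketch only decodes ADD (op code 1)"
    # Execute
    regs["MAR"] = addr                     # Step 5: IR[address] -> MAR
    regs["MDR"] = mem[regs["MAR"]]         # Step 6: Mem[MAR] -> MDR
    regs["A"] += regs["MDR"]               # Step 7: A + MDR -> A
    regs["PC"] += 1                        # Step 8: PC + 1 -> PC
    return regs

mem = {0: 110, 10: 25}   # mailbox 00 holds "ADD 10"; mailbox 10 holds 25
regs = {"PC": 0, "A": 7, "MAR": 0, "MDR": 0, "IR": 0}
add_cycle(mem, regs)
print(regs["A"], regs["PC"])  # → 32 1
```

Note that memory is read twice per ADD, once during the fetch (to get the instruction) and once during the execute (to get the operand), which is why the MAR and MDR appear in both phases.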
B. Memory Organization
The memory of a computer consists of a large number of 1-bit storage devices organized into
words. The number of bits in a word varies with the computer design. In modern computers, the
memory word size is a power of two (8, 16, 32, or 64), but earlier computer designs contained
memories with words of 12, 18, 24, 36, 48, or 60 bits.
Thus, the structure of a memory can be described by specifying the number of bits in a word,
and the number of words. Figure 3.4 shows the organization of a 32-word x 16-bit memory. The word
length is 16 bits, and the number of words is 32. For each bit position in the word, a data line
connects the memory to the Memory Data Register in the CPU. The data line for a bit position
carries data to or from one of the bits in the corresponding bit position. To specify a particular
word in the memory, the address of the word is given in binary.
In the example shown in the figure, the words are numbered 0 to 31, which is 00000 to 11111 in
binary. Five lines, called address lines, carry the address from the CPU's Memory Address
Register to the memory.
Figure 3.4
Organization of a 32 Word x 16 Bits/Word Memory
Each word of this memory (shown by a row) contains 16 bits (two bytes). There are 16 data
lines connecting the memory to the CPU. The outputs of the address decoder determine which of
the 32 words is connected to the address lines. The input to the address decoder has 5 address
lines because 2^5 = 32.
In general:
If a memory contains words that are n bits long, then n data lines are required.
If a memory contains 2^k words, then k address lines are required.
The following terminology is commonly used to refer to sizes of computer memories:
kilobyte (K) = 2^10 = 1,024 bytes
megabyte (M) = 2^20 = 1,048,576 bytes
gigabyte (G) = 2^30 = 1,073,741,824 bytes
We can use our understanding of decoders and addressing to explain how the MAR and MDR
work together to access a specific address in memory.
Little Endian, Big Endian
There is a convention that the bits in a word of any size are numbered from right to left; that is,
bit 0 is the least significant bit. So a byte looks like this:
bit 7 | bit 6 | bit 5 | bit 4 | bit 3 | bit 2 | bit 1 | bit 0
When data consist of more than one byte, it is necessary for the CPU to determine the order in
which the data will be stored in memory and to ensure the relative order of the bytes so that
information transmitted from one computer will be reassembled in the correct order by the
receiving computer. The two possibilities are:
big endian, where byte 0 of multi-byte data is on the left (or MSB) side of the word (see
figure 3.5 for storage of two 4-byte words)
little endian, where byte 0 of multi-byte data is on the right (or LSB) side of the word
(see figure 3.6 for storage of two 4-byte words)
Figure 3.5
Big Endian Memory
Word 0: byte 0 | byte 1 | byte 2 | byte 3   (bit 31, the MSB, at the left; bit 0, the LSB, at the right)
Word 1: byte 4 | byte 5 | byte 6 | byte 7
In this view, the most significant bit of the 32-bit word 0 is bit 7 of byte 0, while the least
significant bit of the 32-bit word 0 is bit 0 of byte 3.
Figure 3.6
Little Endian Memory
Word 0: byte 3 | byte 2 | byte 1 | byte 0   (bit 31, the MSB, at the left; bit 0, the LSB, at the right)
Word 1: byte 7 | byte 6 | byte 5 | byte 4
In this view, the most significant bit of the 32-bit word 0 is bit 7 of byte 3, while the least
significant bit of the 32-bit word 0 is bit 0 of byte 0.
As an example of big endian versus little endian storage, consider the 32-bit hexadecimal word
89ABCDEF stored in memory starting at location 1000 in table 3.4.
Table 3.4
Big Endian versus Little Endian Storage
Memory Location | Big Endian Value Stored | Little Endian Value Stored
1000 | 89 | EF
1001 | AB | CD
1002 | CD | AB
1003 | EF | 89
(All numbers are in hexadecimal.)
The choice of endian system does not affect the performance of the CPU and memory, and both
systems are in use today: Motorola and Sun processors use big endian, whereas Intel processors
use little endian. Endianness does matter in data communications, however, because TCP and IP
packets have 16- and 32-bit numeric fields (for example, the IP address), and the protocols
specify that these be transmitted in big endian order.
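Python's standard struct module can demonstrate the two byte orders for the 32-bit value 89ABCDEF from table 3.4:

```python
import struct

value = 0x89ABCDEF

big = struct.pack(">I", value)     # ">" = big endian: most significant byte first
little = struct.pack("<I", value)  # "<" = little endian: least significant byte first

print(big.hex())     # → 89abcdef  (bytes 89 AB CD EF at addresses 1000..1003)
print(little.hex())  # → efcdab89  (bytes EF CD AB 89 at addresses 1000..1003)
```

This is also exactly the conversion a networking stack performs: transmitting the big endian form regardless of the host's native order is what the IP specification calls network byte order.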
C. Types of Memory
The address space of a computer can have both random access memory (RAM) and read-only
memory (ROM). Below, we discuss both types of memory.
Random Access Memory (RAM)
Random access memory (RAM) refers to memory cells that can be accessed for information
transfer from any desired random location. That is, no matter where the cells are located
physically in memory, the process of locating a word in memory is the same and requires the
same amount of time.
RAM Characteristics
The key characteristic of RAM is that data can be written into and read from any memory word
without affecting any other word. When you first turn on your computer, the operating system is
loaded into the RAM so it can manage all the processes that your computer will perform. If you
decide to do word processing, your operating system will load the word-processing application
from your hard drive into RAM so that you can begin your word-processing tasks. As you use
your application to create or edit some document of interest to you, the operating system will
manage the program in execution. It will also retrieve a document from your hard drive
or external disk (known as a read operation), or store a document on your hard drive or external
disk (known as a write operation) when you so request.
Most RAM used in computers today is semiconductor dynamic RAM. Semiconductor dynamic RAM
has one very significant weakness: if the power to the computer is interrupted, everything in
RAM is lost and cannot be recovered. Much to the frustration of the unprepared user, if a
document exists only in RAM and power is lost, the document no longer exists.
RAM Size
The two main factors that determine the size of RAM are the number of words that it stores, and
the length of those words. The size is usually stated as the number of words times the number of
bits per word. Thus, the size of a memory that stores 2,048 (2K) words where each word is 16
bits long is 2K words x 16 bits/word.
Looking at the RAM above, we must answer the following questions:
How many address lines and data lines are required for a memory that stores 2K words
where each word is 16 bits?
How many bytes can this memory store?
The calculations below provide the answers:
2K can be represented by 2^11 as shown below:
2K = 2 x K = 2^1 x 2^10 = 2^(1+10) = 2^11
Therefore, 11 address lines are required.
16 data lines for data input/output are required, one for each bit in the 16-bit data word.
The number of bytes that can be stored is:
(2K words x 16 bits per word) / (8 bits per byte) = 4K bytes
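These calculations can be checked with a short Python helper (the function name and packaging are my own, not part of the course material):

```python
from math import log2

def memory_stats(words, bits_per_word):
    """Address lines, data lines, and byte capacity of a words x bits memory."""
    address_lines = int(log2(words))    # 2^k words need k address lines
    data_lines = bits_per_word          # one data line per bit in a word
    total_bytes = words * bits_per_word // 8
    return address_lines, data_lines, total_bytes

print(memory_stats(2 * 1024, 16))  # 2K x 16 RAM → (11, 16, 4096), i.e., 4K bytes
print(memory_stats(4 * 1024, 8))   # 4K x 8 ROM  → (12, 8, 4096), i.e., 4K bytes
```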
RAM Connections
The following connections are required:
Read: An input control signal that tells the RAM when a READ is to be performed.
Write: An input control signal that tells the RAM when a WRITE is to be performed.
Select: An input that selects a particular RAM chip from other RAM chips in memory.
A block diagram of a typical 2K x 16 RAM chip is shown below:
Figure 3.7
Block Diagram of a 2K x 16 RAM Chip
Read-Only Memory (ROM)
Computers have another type of memory that almost never changes, called read-only memory
(ROM). Changing ROM is beyond the capability of the average user. Permanently stored in the
computer, ROM provides many of the lower-level instructions used by the operating system to
interact with the components of the computer.
ROM is specific to the processor in each computer and is independent of the operating system.
When the computer is turned off or accidentally shut down, there is no loss of ROM.
Because ROM is more expensive than RAM, computer manufacturers generally use the smallest
capacity ROM possible. As with RAM, the two main factors that determine the size of ROM are
the number of words that it stores and the length of those words.
ROM differs from RAM in the following ways:
There is no user input data because the contents of ROM are permanently stored.
There is no need for read/write selects because it is only possible to read from ROM.
The calculations for a ROM that stores 4K words where each word has 8 bits are shown below:
4K can be represented by 2^12 (4K = 4 x K = 2^2 x 2^10 = 2^12)
Therefore, 12 address lines are required.
Eight data lines for outputting data are required, one for each bit in the 8-bit data word.
The number of bytes that can be stored is:
(4K words x 8 bits per word) / (8 bits per byte) = 4K bytes
The following connections are required:
select: An input that selects a particular ROM chip from other ROM chips in memory.
A block diagram of the 4K x 8 ROM chip is shown below:
Figure 3.8
Block Diagram of a 4K x 8 ROM Chip
The following types of ROMs are available:
custom ROMs ("simple" ROMs): The factory programs the ROMs.
PROMs (programmable ROMs): The customer purchases a "raw" ROM and (carefully!)
does the programming. A PROM may be programmed only once. PROMs and ROMs are
robust and fast.
EPROMs (erasable PROMs): EPROMs give more flexibility in programming, at the
expense of speed and robustness. Such ROMs are often called read-mostly ROMs, to
signify that the ROMs are mostly going to be read, but that they can also be written into
occasionally. Writing should not be done too often because it is slow and laborious.
EPROMs may have their entire programming wiped clean by exposure to UV light for
about 20 minutes. We can also get electrically erasable PROMs (EEPROMs), where
any individual PROM location may be reprogrammed in place, using just electrical signals.
The erasure operation takes much longer (by several microseconds) than a standard
read.
Finally, we have flash memory, which provides electrical erasure, but only to groups of
words, not to individual words.
D. Input/Output (I/O)
The instructions we used for I/O in the LMC were IN and OUT. Each time we called one of those
instructions, we could input or output one three-digit number or one word. A similar instruction
in a real computer will input or output one word. This type of I/O, called programmed I/O, is a
relatively inefficient method of I/O. In this section, we will look at programmed I/O, and then at
other, more efficient methods, such as programmed I/O with interrupts and direct memory
access (DMA).
Programmed I/O
We start with a consideration of some of the I/O devices that might be connected to the CPU.
They include:
keyboard (input)
mouse (input)
voice (input and output)
scanner (input)
printers (output)
graphics display (output)
optical disk (input and output)
magnetic tape (input and output)
magnetic disk (input and output)
modem (input and output)
communications adaptor (wired or wireless)
electronic instrumentation
controls and indicators on an appliance, such as a microwave oven or a cell phone (when the processor is embedded in the appliance)
Some I/O devices operate at a low speed, such as keyboards or printers, and some operate at a
high speed, such as magnetic or optical disks. Some transfer one character at a time, such as a
mouse, and some transfer blocks of data at once, such as a printer or a graphics display.
In general, multiple I/O devices will be connected to the CPU through a set of I/O modules, as
shown in the figure below. The CPU will control the process with an I/O address register similar
to the MAR and an I/O data register similar to the MBR.
Figure 3.9
Use of Multiple I/O Modules
To handle I/O devices, computer systems must be able to:
address different peripheral devices
respond to I/O initiated by the peripheral devices
handle block transfers of data between I/O and memory
handle devices with different control requirements
The problem with the type of programmed I/O used by the LMC is that I/O can only occur when
called for by the CPU, and thus there is no way for the user or a peripheral to initiate commands
to the CPU.
One solution to this problem is to use polling, a technique in which the CPU uses programmed
I/O to send out requests to I/O devices to see if they have any data to send and to receive that
data when it's sent. The disadvantage of polling is that it requires a large overhead because each
device must be polled frequently enough to ensure that data held by the I/O device awaiting
transfer is not lost. If there are many devices to be polled, much of the CPU's time is wasted
doing polling instead of doing other, more useful, processing.
The use of interrupts, which we discuss next, is a better way of managing I/O requests.
Interrupts
Interrupts are signals sent on special control lines to the CPU. When an interrupt is received,
the CPU completes the instruction it is currently executing, stores the contents of all of its
registers either on a stack or in a table called the process control block (PCB), and then
jumps to a special interrupt-processing program.
After the interrupt is serviced, the CPU restores the registers that had been stored in the stack or
the PCB. The CPU then returns to processing the original program.
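This save-service-restore sequence can be sketched in Python (the register names and the keyboard service routine here are hypothetical, chosen only to illustrate the idea):

```python
def handle_interrupt(registers, stack, service_routine):
    """Save CPU state, run the interrupt-processing program, restore state."""
    stack.append(dict(registers))   # save a copy of all registers (stack or PCB)
    service_routine()               # jump to the interrupt-processing program
    registers.update(stack.pop())   # restore the saved registers
    # execution now resumes in the original program at the saved PC

regs = {"PC": 42, "A": 7}
log = []
handle_interrupt(regs, [], lambda: log.append("serviced keyboard input"))
print(regs)  # → {'PC': 42, 'A': 7}  (the original state is restored intact)
```

Because the state is pushed before servicing and popped afterward, a higher-priority interrupt arriving mid-service can itself be saved and serviced, which is why a last-in, first-out stack is a natural fit.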
Figure 3.10 below shows the procedure for processing an interrupt. Successively clicking on the
Step buttons will walk you through the interrupt process.
Figure 3.10
Servicing an Interrupt
An interrupt can be used in the following ways:
as an external event notifier.
The CPU is notified that an external event has occurred. For example, an interrupt is used
to notify the system that there is a keyboard input.
as a completion signal.
This type of interrupt can be used to control the flow of data to an output device. For
example, because printers are slow, an interrupt can be used to let the CPU know that a
printer is ready for more output.
as a means to allocate CPU time.
This type of interrupt is used in a multi-tasking system where more than one program is
running. Each of the programs is allowed access to the CPU for a short period of time to
execute some instructions. Then, another program is allowed to access the CPU. The
interrupt is used to stop the current program from executing and then to switch to a
dispatcher program, which allocates a block of execution time to another program.
as an abnormal event indicator.
This type of interrupt is used to handle abnormal events that affect operation of the
computer system. Two examples of abnormal events are:
a power loss
an attempt to execute an illegal instruction such as "divide by zero"
Interrupts have many sources. In fact, the standard I/O for today's PC has 15 interrupt lines
labeled IRQ1 through IRQ15 (where IRQ is Interrupt ReQuest). With 15 interrupt lines, it is
possible—in fact, probable—that multiple interrupts will occur simultaneously from time to time.
Thus, when an interrupt occurs, several questions must be answered before the interrupt is
serviced:
Are there other interrupts awaiting service, or is an interrupt currently being serviced?
What is the relative priority of the interrupts?
What is the source of each interrupt?
With multiple interrupts, there is a priority system in which the various interrupts are ranked by
importance. For example, an abnormal-event interrupt caused by a power failure is more
important than a completion interrupt telling the CPU that a printer is ready for more output.
Direct Memory Access (DMA)
With programmed I/O, the data are loaded directly into the CPU. It is more efficient, however, to
move large blocks of data (including programs) directly to or from memory rather than to move
the data word-by-word into the CPU and then transfer it word-by-word into memory. For the
LMC, it's like loading data directly from the rear of the mailboxes, thus bypassing the LMC I/O
instruction procedures. Moving a block of data directly between memory and an I/O module is
called direct memory access.
Three primary conditions that must be met for a DMA transfer are:
There must be a method to connect the I/O interface and memory.
The I/O module involved must be capable of both reading and writing to memory without
participation of the CPU. In particular, the I/O module must have its own MAR and an
MBR.
There must be a means to avoid conflict between the CPU and the I/O module when both
are attempting to access memory.
The use of DMA is advantageous because:
high-speed transfers are possible.
the CPU is available to perform other tasks during the I/O transfers.
DMA transfers are possible in either direction; for example, CD music can be played on
your PC while the computer is used for other tasks.
The procedure used by the CPU to initiate a DMA transfer requires four pieces of data to be
provided to the I/O controller:
1. the location of the block of data on the I/O device
2. the starting location of the block of data in memory
3. the size of the block of data to be transferred
4. the direction of transfer—either a read (I/O → memory) or a write (memory → I/O)
Figure 3.11 below shows the DMA process during a data transfer. Successively clicking on the
Step buttons will walk you through the DMA process.
Figure 3.11
The DMA Process
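As a sketch of the idea, those four parameters can be modeled as a record that the CPU hands to the I/O controller, which then copies the whole block itself, word by word, using its own MAR and MBR while the CPU does other work (all names here are illustrative, not a real driver API):

```python
from dataclasses import dataclass

@dataclass
class DMARequest:
    device_block: int   # location of the block of data on the I/O device
    memory_start: int   # starting location of the block of data in memory
    size: int           # size of the block to be transferred, in words
    direction: str      # "read" (I/O -> memory) or "write" (memory -> I/O)

def dma_transfer(req, device, memory):
    """The I/O module moves the whole block without CPU participation."""
    for i in range(req.size):
        if req.direction == "read":
            memory[req.memory_start + i] = device[req.device_block + i]
        else:
            device[req.device_block + i] = memory[req.memory_start + i]

memory = [0] * 16
device = list(range(100, 116))
dma_transfer(DMARequest(device_block=4, memory_start=0, size=3, direction="read"),
             device, memory)
print(memory[:3])  # → [104, 105, 106]
```

In real hardware the loop runs in the I/O module's circuitry, and the module typically raises a completion interrupt when the whole block has been moved.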
E. Buses
The CPU, memory, and I/O peripherals are interconnected in various ways. The basic
components involved in the interface are listed below:
the CPU
the I/O peripheral devices
memory
I/O modules
the buses that connect the other components
The basic pathways for those interconnections are shown in figure 3.12.
Figure 3.12
Bus Connections for CPU-Memory-I/O Pathways
The two basic I/O system architectures are:
1. bus—one or more buses connects the CPU and memory to the I/O modules
2. channel—a separate I/O processor, a computer in its own right, performs complex input
and output operations
Buses are characterized by:
type (parallel or serial)
configuration
width (number of lines)
speed
use
Parallel buses have an individual line for each bit in the address, data, and control words. They
run over short distances (up to a few feet) at high speeds because all of the data bits for one
word are transferred at the same time. As distance increases, however, it becomes more difficult
to keep the bits of a word synchronized. Parallel buses are generally used for all internal buses
because speed is a critical factor in performance.
Serial buses transfer data sequentially, one bit at a time. They usually carry data in external
buses over greater distances at a somewhat lower data rate. Generally speaking, serial buses
have a single data pair and several control lines.
The characteristics of several common buses are listed below:
System buses
ISA (Industry Standard Architecture) is a parallel bus with a 16-bit data width and
separate lines for addresses and data. It was used by the X86 and PowerPC families of
computers, but is being phased out of use in favor of the PCI bus.
PCI (Peripheral Component Interconnect) is a parallel bus with a 32- or 64-bit data width. The
data and address share the same lines through multiplexing of the signals, i.e., addresses
are sent and then the data follow on the same line. This bus is replacing the ISA bus on
the X86 and PowerPC families of computers.
External buses
SCSI (Small Computer System Interface) is a parallel bus that allows multiple I/O
devices to be daisy-chained.
USB (Universal Serial Bus) is a serial bus that is faster (up to 12 megabits per second
throughput) than the RS-232 bus. It has a four-wire cable: two wires carry the data (addresses
are sent serially over the same pair), and two carry power to the I/O device.
IEEE 1394 (also known as FireWire) is a serial bus that can handle up to a 400
megabits-per-second data rate. It can either be daisy-chained or connected to hubs.
The signals in computer buses are usually divided into three categories:
1. data
2. addresses
3. control information, including interrupts
The parallel bus connecting a printer to a PC is one case where an address line is not required,
because it is a point-to-point connection. However, when the printer is connected in a daisy
chain with other devices (such as a fax machine, a scanner, and an external disk drive), the
address must be added.
III. Instruction Set Architecture
A. Instruction Word Formats
Instructions are divided into two parts:
1. The op code, or operation code, tells the computer what operation to perform.
2. The operand (the address in the LMC) provides the address of the source of the data to
be operated on and the destination of the result of the operation.
In the Little Man Computer, the op code is one digit in length and the operand is two digits. In a
real CPU, however, such a simple arrangement would not work.
Let's start with a discussion of the op code. The length of the op code is a function of the number
of operations that the CPU can perform. An instruction set with a 4-bit op code will allow up to
16 (2^4) different operations to be performed, with each operation requiring a unique op code. But
if 17 to 32 operations were required, an op code of 5 bits (because 32 = 2^5) would be required to
provide unique op codes for each operation.
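This relationship between the number of operations and the op-code width is simply a base-2 logarithm, rounded up. A small sketch:

```python
from math import ceil, log2

def op_code_bits(num_operations):
    """Smallest op-code width that gives every operation a unique code."""
    return max(1, ceil(log2(num_operations)))

print(op_code_bits(16))   # 4 -- a 4-bit op code covers up to 16 operations
print(op_code_bits(17))   # 5 -- 17 to 32 operations need a 5-bit op code
```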
Next, let's look at the operands. Operands specify the location or address of data that are used
in the instruction. Data instructions have one or two source operands and one destination
operand. Operands may be either explicit or implicit. Explicit operands are included in the
statement of the instruction, whereas implicit operands are understood without being stated.
Instructions may be written with zero to three operands. The number of bits required for each
operand can vary because some operands point to one of a limited set of CPU registers, whereas
others point to memory locations. The former may be identified in 4 bits (for a CPU with 16
general-purpose registers), whereas the latter may require 32 bits or more (based on the size of
memory).
The length of an instruction in a given computer can also vary. For instance, the IBM mainframes
may have instructions that are two bytes (16 bits), four bytes, and six bytes long. On the other
hand, the Sun SPARC RISC instructions are always four bytes long.
The advantage of variable-length instructions is that more flexibility is provided to the system
programmers. The advantage of fixed-length instructions is that the system design can be
simplified, and it may be easier to optimize system performance because of the simpler design.
Zero-Operand Instructions
Zero-operand instructions have the form:
op code
These instructions may be used for operations that do not involve data, e.g., HLT. They also
may be used as stack instructions, where the operands are implicitly the top items of the stack.
For example, the following sequence would put both A and B on the top of the stack and then
multiply them (MUL itself names no operands; it pops the top two items and pushes their
product):
PUSH A
PUSH B
MUL
We will discuss stack instructions further in the next section.
One-Operand Instructions
One-operand instructions have the form:
op code operand
where the accumulator and the operand contain the operands used in the calculation, and the
result is placed in the accumulator. For example:
MUL A
would be written for the equation:
accumulator = accumulator * A
Two-Operand Instructions
Two-operand instructions have the form:
op code operand 1 operand 2
where operand 1 and operand 2 are used in the calculation, and the result is placed in operand
1. For example:
MUL A, B
would be written for the equation:
A=A*B
Three-Operand Instructions
Three-operand instructions have the form:
op code operand 1 operand 2 operand 3
where operand 2 and operand 3 are used in the calculation, and the result is placed in operand
1. For example:
MUL A, B, C
would be written for the equation:
A=B*C
B. Classes of Instructions
Instructions can be classified in the following ways:
data-movement instructions
The LOAD and STORE instructions in the Little Man Computer are good examples of this
class of instruction. Many real CPUs have additional forms of these basic operations.
Another example of data-movement instruction is MOVE.
arithmetic instructions
The ADD and SUBtract instructions in the Little Man Computer are examples of this class
of instruction. Most real CPUs also have instructions for floating-point and BCD
arithmetic. Multiplication and division are often implemented in hardware, but can also be
performed using addition, subtraction, and shifting instructions.
Boolean-logic instructions
Boolean instructions were not included in the Little Man Computer instruction set. In a
real CPU, the AND, OR, and EXCLUSIVE-OR instructions are implemented. We will discuss
Boolean-logic instructions in more detail below.
single-operand manipulation instructions
These instructions include COMPLEMENT, INCREMENT, DECREMENT, and NEGATE.
Negating is taking the 2's complement of the value in a register. We will discuss
single-operand manipulation instructions in more detail below.
bit-manipulation instructions
These instructions set, clear, or test individual bits in a word or register.
shift and rotate instructions
There are three basic instructions: LOGICAL SHIFT, ROTATE, and ARITHMETIC SHIFT. We
will discuss shift and rotate instructions in more detail below.
program and control instructions
These instructions include jumps, branches, subroutine calls, and returns. The Little Man
Computer has three of these instructions: BR (branch unconditionally), BRZ (branch on
zero), and BRP (branch on positive).
stack instructions
Stack instructions are used to access one of the most important structures in
programming—the stack. We will discuss stack instructions in more detail below.
multiple-data instructions
Multiple data instructions perform single operations on multiple pieces of data. They are
also known as SIMD or single instruction, multiple-data instructions and are used in
multimedia applications.
privileged instructions
Privileged instructions can only be executed by the operating system because they affect
the operating status of the computer and thus could affect other programs that are
running. The HALT, INPUT, and OUTPUT instructions fall into this category. For example,
the program can request an INPUT, but it is up to the operating system to perform that
instruction in conjunction with INPUT and OUTPUT requests from other programs that are
running.
Boolean-Logic Instructions
Boolean-logic operations are implemented on a bit-by-bit basis. Thus, each set of bits, i.e., the
0th, the 1st, the 2nd, ..., the nth, is operated on independently of the other bits. See the
example below for the ANDing of two 8-bit words.
ANDing Two 8-Bit Words
A:     1011 0011
B:     1100 0101
A ● B: 1000 0001
Note that the result is 1 only when the two bits being ANDed are both 1, and that the result is 0
when either of the two bits being ANDed is 0.
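The same AND operation can be checked directly with Python's bitwise operators:

```python
a = 0b1011_0011
b = 0b1100_0101
result = a & b               # each bit position is ANDed independently
print(f"{result:08b}")       # 10000001
```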
Single-Operand Manipulation Instructions
As discussed in module 1, complementing a number involves changing all 1s to 0s, and all 0s to
1s. Incrementing a number is simply adding 1 to the number. A useful trick for incrementing a
number is to change the rightmost 0 to a 1 and then to change all of the bits to the right of that
0 from 1 to 0. The example below shows how to complement and then increment an 8-bit
number.
Complementing and Then Incrementing an 8-Bit Word
A:               0111 0000
complement of A: 1000 1111
                      + 1
A' incremented:  1001 0000
Note: referring to the trick described above for incrementing a number:
0000 0000 plus 1 = 0000 0001; i.e., decimal 0 plus 1 = decimal 1
0000 0001 plus 1 = 0000 0010; i.e., decimal 1 plus 1 = decimal 2
0000 0011 plus 1 = 0000 0100; i.e., decimal 3 plus 1 = decimal 4
0000 0111 plus 1 = 0000 1000; i.e., decimal 7 plus 1 = decimal 8
0000 1111 plus 1 = 0001 0000; i.e., decimal 15 plus 1 = decimal 16
Negating a number is the same as complementing and then incrementing that number. Thus, to
negate a number, take its 2's complement.
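A quick sketch of this rule (complement, then increment) in Python, masked to 8 bits:

```python
def negate(value, bits=8):
    """2's-complement negation: flip every bit, then add 1."""
    mask = (1 << bits) - 1
    complemented = ~value & mask       # 1's complement
    return (complemented + 1) & mask   # increment, discarding any carry out

print(f"{negate(0b0111_0000):08b}")                  # 10010000, matching the example
print(negate(negate(0b0111_0000)) == 0b0111_0000)    # True: negating twice restores
```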
Shift and Rotate Instructions
There are three basic shift operations, and each shift can be in either the left or right directions:
1. logical shifts (left and right)—LOGICAL SHIFT LEFT and LOGICAL SHIFT RIGHT
2. arithmetic shifts (left and right)—ARITHMETIC SHIFT LEFT and ARITHMETIC SHIFT RIGHT
3. rotates (left and right)—ROTATE LEFT and ROTATE RIGHT
We explain these three operations below.
Logical Shifts
In logical shifts, the vacated spots [most significant bit (MSB) for a right shift and least
significant bit (LSB) for a left shift] are filled in by 0s. Study the examples below:
Figure 3.14
Logical Shifts
Right Shift
Left Shift
1011 is shifted right logically by one position,
and a 0 is placed in the MSB position.
1011 is shifted left logically by one position,
and a 0 is placed in the LSB position.
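In Python, logical shifts on a fixed-width word can be sketched by masking to the word width (here 4 bits, matching figure 3.14):

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1          # 0b1111 for a 4-bit word

def logical_shift_right(x):
    return (x >> 1) & MASK       # a 0 enters at the MSB

def logical_shift_left(x):
    return (x << 1) & MASK       # a 0 enters at the LSB; the old MSB is lost

print(f"{logical_shift_right(0b1011):04b}")   # 0101
print(f"{logical_shift_left(0b1011):04b}")    # 0110
```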
Arithmetic Shifts
Arithmetic shifts have a numerical connotation, and it is essential that the sign of the number be
preserved: a positive number must stay positive after an arithmetic shift (both left and right).
The same is true for negative numbers. Thus, in all arithmetic shifts, the sign bit is retained.
Study the following example, which shows successive left shifts of the binary pattern 000011,
which represents +3 in 2's complement. Note that each shift multiplies the number by 2, until
the point where overflow results. A further left shift cannot be done once the two MSBs differ,
because shifting again would change the sign of the number.
Verify for yourself that each left shift doubles the number. Successively click on the Step button
to see this demonstration.
Figure 3.15
Arithmetic Shifts
Study the example below that shows successive right shifts of the pattern 101000, which
represents –24 in 2's complement. Note that each right shift performs an integer division by 2
(rounding toward minus infinity), so a negative pattern eventually settles at all 1s (–1).
Start with the 2's complement representation for –24, and verify the result of each arithmetic
right shift.
Figure 3.16
Right Shifts
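An arithmetic right shift can be sketched by shifting and then copying the sign bit back into the vacated MSB. Using the 6-bit pattern from the example:

```python
WIDTH = 6
SIGN = 1 << (WIDTH - 1)

def to_signed(pattern):
    """Interpret a WIDTH-bit pattern as a 2's complement number."""
    return pattern - (1 << WIDTH) if pattern & SIGN else pattern

def arithmetic_shift_right(pattern):
    return (pattern >> 1) | (pattern & SIGN)   # refill the vacated MSB with the sign

p = 0b101000                                   # -24 in 6-bit 2's complement
for _ in range(3):
    p = arithmetic_shift_right(p)
    print(f"{p:06b} = {to_signed(p)}")         # -12, then -6, then -3
```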
Rotate
Rotate operations move bits cyclically left or right. No particular arithmetic significance is
attached to them. Rotates are used to examine all the bits in a number or register without
destroying the number itself. For example, we could cyclically shift an n-bit number n times, and
doing so would give the original number back. But with each shift, a different bit would "pass
under" the LSB (or MSB), and this bit could be "examined" by doing a bit-wise AND with the
number 1 (binary 000...01).
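Rotates can be sketched the same way: the bit that falls off one end wraps around to the other, and each intermediate value can be examined with a bitwise AND against 1, as described above:

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1

def rotate_left(x):
    return ((x << 1) | (x >> (WIDTH - 1))) & MASK   # MSB wraps into the LSB

def rotate_right(x):
    return ((x >> 1) | (x << (WIDTH - 1))) & MASK   # LSB wraps into the MSB

x = 0b1001
for _ in range(WIDTH):
    x = rotate_left(x)
    print(f"{x:04b}", x & 1)   # the low bit "passes under" for examination
print(x == 0b1001)             # True: n rotations of an n-bit word restore it
```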
See what happens when the word 1001 is rotated left.
Figure 3.17
Rotate Example
Step 1: Try to visualize the left rotation before clicking the
ANIMATE button. All the bits except the MSB move left by
one position. The MSB rotates into the LSB position.
Step 2: Before you click the ANIMATE button, try to visualize
the results after rotating left by two positions.
Step 3: When you click on ANIMATE, you will see how 1001
looks after rotating left by 3 positions.
Step 4: When you click on ANIMATE, you will see how 1001
looks after rotating left by 4 positions. Note that we are back
to the starting position.
If you start with the word 1001 and rotate right, you should get the following succession, ending
at the original position:
Figure 3.18
Rotate Right Example
Start with 1001.
1001 after rotating right by 1 position.
1001 after rotating right by 2 positions.
1001 after rotating right by 3 positions.
1001 after rotating right by 4 positions (note that we have returned to the
starting position).
Stack Instructions
A stack is a last-in first-out (LIFO) memory, where data items are pushed in at the top location
of the stack and access is permitted only to the top location of the stack. Stacks are used in
modern machines to store return addresses from subroutines and interrupts.
To visualize how a stack works, picture the spring-loaded circular device that holds plates in
some cafeterias. Dinner plates or salad plates are generally placed on the circular stand, which
has a powerful spring beneath the surface. As plates are placed on the surface, the entire group
of plates is lowered, compressing the spring. Whether you remove one plate or 10, they all come
off the top of the stack of plates. The last plates that were added are the first to be removed.
Two operations are associated with a stack:
PUSH describes the operation of inserting one or more items onto a stack.
POP describes the operation of deleting one, some, or all of the items from a stack.
The stack pointer (SP) is a register that indicates the top of the stack. It is incremented or
decremented depending on whether a push or pop occurs, and how many items are added to or
deleted from the stack. The advantage of a memory stack is that the CPU can always refer to it
without explicitly specifying an address, because the top-of-stack address is automatically stored
and updated in the stack pointer. Stacks may also be used to evaluate arithmetic
expressions, and we illustrate this process below.
Stack Evaluation of (A + B) * (C + D)
Suppose we want to find the result of (A + B) * (C + D), where A, B, C, and D are operands—in
this case, variables that are placeholders for data—and * and + are operators on the operands.
(Note that for clarity, we are using * to represent multiplication.) It is important to recognize
that there are two types of information here: operands (the variables) and operators. Each type
is governed by one of the following rules as we compute the value of the expression using a
stack, after first putting the expression into postfix form:
Rule 1: If the symbol we encounter is an operand (in this case, a variable), push it onto the
stack.
Rule 2: If the symbol we encounter is an operator (e.g., * or +), pop the top two items off
the stack, perform the operation involving the operator and the top two items, and push the
result onto the stack.
We can use a stack to compute the expression by:
1. converting the expression into postfix form so that it is amenable to stack manipulation.
In this form, also known as reverse Polish notation, operators are placed after operands
instead of between them, and computations are made in the order listed below:
o parentheses
o multiplication and division
o addition and subtraction
2. reading the postfix expression from left to right, and, depending on the symbol we
encounter, using Rule 1 or Rule 2 as appropriate, until we come to the end of the
expression. If we have done all this correctly, there should be one item left on the stack
when we have come to the end of our expression, and that item should be the result.
Let's see how this works in practice by first looking at the general case of the stack evaluation of
the expression, and then by looking at a specific numerical case. First, we must reorder the
expression:
(A + B) * (C + D)
into postfix form by going from left to right and (within parentheses) putting operands (the
variables) after operators. The postfix operation always takes operands in the sequence in which
they show up in the original expression.
A + B becomes AB+
C + D becomes CD+
(AB+) * (CD+) becomes AB+CD+*
Now we can apply Rules 1 and 2 to evaluate the postfix expression:
AB+CD+*
Click successively on the Step button to see this demonstration of the use of stacks to evaluate
the postfix expression.
Figure 3.19
Stack Evaluation of AB+CD+*
Stack Evaluation of (2 + 3) x (5 – 1)
Now, let's perform the stack evaluation of the arithmetic expression (2 + 3) x (5 – 1), where the
operands are numbers rather than the variables in our previous example.
First, reorder the expression into postfix form:
2 + 3 becomes 2 3 +
5 – 1 becomes 5 1 –
(2 3 +) x (5 1 –) becomes 2 3 + 5 1 – x
Now we can apply Rules 1 and 2 to evaluate the postfix expression 2 3 + 5 1 – x.
Click successively on the Step button to see this demonstration.
Figure 3.20
Stack Evaluation of (2 + 3) x (5 – 1)
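Rules 1 and 2 translate almost directly into code. A minimal postfix evaluator (numeric operands only):

```python
def evaluate_postfix(tokens):
    stack = []
    ops = {"+": lambda a, b: a + b,
           "-": lambda a, b: a - b,
           "*": lambda a, b: a * b}
    for token in tokens:
        if token in ops:                 # Rule 2: pop two, operate, push the result
            right = stack.pop()
            left = stack.pop()
            stack.append(ops[token](left, right))
        else:                            # Rule 1: push the operand
            stack.append(int(token))
    result, = stack                      # exactly one item should remain
    return result

print(evaluate_postfix("2 3 + 5 1 - *".split()))   # 20, i.e., (2 + 3) * (5 - 1)
```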
C. Addressing Modes
In assembly language, each instruction may need some operands (data values). These operands
may be specified by providing either a register address (i.e., the name of a register, such as R1
or R2) or by specifying a main memory address ("operand is in location 3011").
Associated with each operand is an address mode, which influences how the specified address is
interpreted. Each operand involved in an instruction could be addressed using any of these
modes, and within a given instruction, different operands could be using different modes.
Addressing modes are used to determine the way operands are chosen during program
execution. The use of various addressing modes provides flexibility to:
use programming facilities such as pointers to memory, counters for loop control,
indexing of data, and program relocation
reduce the number of bits in the addressing field of the instruction
The Little Man Computer uses only one addressing mode, the direct addressing mode, also
known as absolute addressing, because the address given in the instruction is the actual
memory location being addressed. That implies that there are also non-absolute modes of
addressing. We will discuss two non-absolute addressing modes, base register addressing and
relative addressing, later in this section.
We will start our discussion of addressing modes with a modified LMC to look at three common
types of addressing. The modified LMC uses a four-digit instruction with the additional digit,
representing the addressing mode, inserted as the second digit from the left. The three
addressing modes are:
immediate addressing (with an addressing mode of 1)
direct addressing (with an addressing mode of 0)
indirect addressing (with an addressing mode of 2)
The following discussions are based on different modes with the "load" instruction.
Immediate Addressing
Immediate addressing can be defined as:
an addressing mode in which the address field of the instruction contains the data
The fetch-execute cycle for a load instruction with immediate addressing would be:
PC → MAR
This is the first step of the fetch cycle.
Mem[MAR] → MDR
This is the second step of the fetch cycle, in which the contents of memory location MAR
are transferred to the MDR.
MDR → IR
This is the third step of the fetch cycle, where the value in MDR is copied into the IR.
IR[op code] → decoder
This is the last step of the fetch cycle, where the op code portion of the instruction that is
in the IR is transferred to the decoder, and the decoder selects the instruction to be
executed.
IR[Address] → A
The data, which was the address portion of the instruction, is loaded into the A register.
PC + 1 → PC
The program counter is incremented.
Direct Addressing
Direct addressing can be defined as:
an addressing mode in which the address field of the instruction contains the address in
memory where the data are located
The fetch-execute cycle for a load instruction with direct addressing is:
PC → MAR
Mem[MAR] → MDR
MDR → IR
IR[op code] → decoder
IR[address] → MAR
The address portion of the instruction points to the location of the data in memory.
Mem[MAR] → MDR
MDR → A
The data from memory is loaded into the A register.
PC + 1 → PC
Indirect Addressing
Indirect addressing can be defined as:
an addressing mode in which the address field of the instruction contains the address in
memory that contains another address in memory where the operand data are located
The fetch-execute cycle for a load instruction with indirect addressing would be:
PC → MAR
Mem[MAR] → MDR
MDR → IR
IR[op code] → decoder
IR[address] → MAR
The address portion of the instruction points to the location of another address in
memory.
Mem[MAR] → MDR
MDR → MAR
The contents of the MDR point to the location of the data in memory.
Mem[MAR] → MDR
MDR → A
The data from memory is loaded into the A register.
PC + 1 → PC
Carefully read the three definitions above to see if you understand the differences. Then, follow
the table of memory locations and values below along with the interactive examples illustrating
the load immediate 20, load direct 20, and load indirect 20 instructions in the LMC.
Table 3.5
Sample Memory Locations and Values
Memory Address
Data at Memory Address
20
40
30
50
40
60
50
70
The instruction load immediate 20 loads the value 20 into the register because the operation
specifies that the operand datum is in the operand field.
The instruction load direct 20 loads the value 40 into the register. The load direct 20 operation
tells the computer that the operand datum is located at memory address 20. In our case, the
value at location 20 is 40.
The instruction load indirect 20 loads the value 60 into the register. The load indirect 20
operation tells the computer to go to memory address 20 to find another memory address, 40.
In our case, the actual operand datum, 60, is located at the memory address 40.
Let's look at how all this would appear graphically:
Figure 3.21
Immediate, Direct, and Indirect Addressing
Load Immediate 20
Load Direct 20
Load Indirect 20
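The three load instructions above can be simulated with the memory contents of table 3.5:

```python
memory = {20: 40, 30: 50, 40: 60, 50: 70}        # table 3.5

def load(mode, address_field):
    if mode == "immediate":
        return address_field                     # the address field IS the datum
    if mode == "direct":
        return memory[address_field]             # the field holds the datum's address
    if mode == "indirect":
        return memory[memory[address_field]]     # the field points to another address
    raise ValueError(f"unknown mode: {mode}")

print(load("immediate", 20))   # 20
print(load("direct", 20))      # 40
print(load("indirect", 20))    # 60
```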
Register Direct Addressing
Register direct addressing can be defined as:
an addressing mode in which the address field of the instruction contains the address of the
register where the data are located
Register direct addressing is similar to direct addressing. The difference is that the data are
located in a CPU register rather than in memory. The figure below shows register direct
addressing in the LMC.
Figure 3.22
Register Direct Addressing for a Load Instruction
Register Indirect Addressing
Register indirect addressing can be defined as:
an addressing mode in which the address field of the instruction contains the register address
that contains an address in memory where the operand data are located
This is similar to indirect addressing. The difference is that the address of the data is located in a
CPU register rather than in another memory location. The figure below shows register indirect
addressing in the LMC.
Figure 3.23
Register Indirect Addressing for a Load Instruction
Base Register Addressing
Base register addressing, also known as base offset addressing, can be defined as:
an addressing mode in which the address field of the instruction is added to the contents of a
base register to determine the final address of the data
Base addressing is an example of a non-absolute addressing mode. The advantage of this mode
is that the entire program may be moved to a different location in memory and still be addressed
by changing the contents of the base register.
The fetch-execute cycle for a load instruction using base register addressing would be:
PC → MAR
Mem[MAR] → MDR
MDR → IR
IR[op code] → decoder
IR[address] + BR → MAR
The sum of the address portion of the instruction register and the base register points to
the final address.
Mem[MAR] → MDR
MDR → A
PC + 1 → PC
In figure 3.24 below, we introduce an additional modification to the LMC with a four-digit base
register. This register contains a starting point in memory, while the operand value provides an
offset from that starting point.
In figure 3.24, part a, we have a program that starts at location 1000, and a current instruction
with an address field (operand) of 10. Using base addressing, the actual location of the data in
memory is:
1000 (the base register value) + 10 (the operand) = 1010
In figure 3.24, part b, the program has been moved by the operating system (we will discuss
moving the program in module 4) to start at location 2000. The new final address of the data is
2000 (the base register value) plus 10 = 2010.
Figure 3.24
Base Register Addressing
Relative Addressing
Relative addressing can be defined as:
an addressing mode in which the address field of the instruction is added to the contents of
the program counter (PC) to determine the final address of the data
Relative addressing is also a non-absolute addressing mode. It differs from base addressing in
that the final address is relative to the program counter instead of the base register.
The fetch-execute cycle for a load instruction using relative addressing would be:
PC → MAR
Mem[MAR] → MDR
MDR → IR
IR[op code] → decoder
IR[address] + PC → MAR
The sum of the address portion of the instruction register and the program counter points
to the final address.
Mem[MAR] → MDR
MDR → A
PC + 1 → PC
Indexed Addressing
Indexed addressing can be defined as:
an addressing mode in which the address field is added to the contents of the index register
to determine the final address of the data
Indexed addressing is similar to base addressing in that the content of two registers is added to
determine the final address. The differences are philosophical. In base addressing, the base
address is relatively large and is intended to locate a block of addresses in memory. The base
register does not change until the program is moved to a new location in memory.
In indexed addressing, the index is relatively small and is used as an offset for handling
subscripting. Thus, the index register frequently changes during program execution.
The fetch-execute cycle for a load instruction using indexed addressing is:
PC → MAR
Mem[MAR] → MDR
MDR → IR
IR[op code] → decoder
IR[address] + X → MAR
The sum of the address portion of the instruction register and the index register points to
the final address.
Mem[MAR] → MDR
MDR → A
PC + 1 → PC
It is also possible—and quite common—to combine modes of addressing. Figure 3.25 a gives an
example of indexed addressing. Figure 3.25 b gives an example of indexing a base offset
address. In figure 3.25 a, the index register value of 5 is added to the address field of 100 to
obtain the final address of 105. In figure 3.25 b, the index register value of 5 is added to the
address field of 100 and the base register value of 1000 to get the final address of 1105.
Figure 3.25 a
Indexed Addressing
Figure 3.25 b
Indexing with a Base Register
Indirect Indexed Versus Indexed Indirect Addressing
When indirect and indexed addressing modes are combined, the sequence determining which
mode is applied first makes a difference. Figures 3.26 a and 3.26 b show how the sequence
determining how addressing modes are applied can affect the answer. In both figures, the
address field has a value of 20, the index register has a value of 10, and indirect addressing is
used in combination with indexed addressing.
In figure 3.26 a, the sequence is indexed, then indirect. Applying indexed addressing
gives an address of 20 + 10 = 30. Indirect addressing uses address 30 to point to the final
address, 50, where the data value of 70 is located.
Figure 3.26 a
Using the Indexed Indirect Addressing Mode
In figure 3.26 b, the sequence is indirect, then indexed. Applying the indirect addressing first, we
determine the pre-indexed address as 30 (the value at location 20), and then index 30 by 10 to
find the final address of 40. The data value of 60 is the value at location 40.
Figure 3.26 b
Using the Indirect Indexed Addressing Mode
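The two orderings can be compared side by side. The memory values below are read off figures 3.26 a and b; only the order of the index step and the indirection step differs:

```python
memory = {20: 30, 30: 50, 40: 60, 50: 70}   # values from figures 3.26 a and b
address_field, index = 20, 10

# Figure 3.26 a: index first, then one level of indirection
pre_indexed = address_field + index          # 20 + 10 = 30
value_a = memory[memory[pre_indexed]]        # memory[30] = 50; memory[50] = 70

# Figure 3.26 b: indirection first, then index the fetched address
fetched = memory[address_field]              # memory[20] = 30
value_b = memory[fetched + index]            # memory[30 + 10] = memory[40] = 60

print(value_a, value_b)   # 70 60
```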
D. Instruction Set Architecture Comparisons
In this section, we look at the instruction set architectures of several computer families—the
X86, PowerPC, and IBM mainframe—to show the fundamental similarities among all computer
CPUs.
X86
The X86 family of computers includes the following:
8088, 8086, 80286, 80386, and 80486
Pentium, Pentium Pro, Pentium II, Pentium III, and Pentium 4
Itanium
Instructions with zero, one, or two operands are supported:
Zero-operand instructions take actions that do not require a reference to any memory, or
register data in the instruction. Stack operations, which do not refer to the data within
the instruction, are also zero-operand instructions.
One-operand instructions modify data in place, i.e., they complement or negate data in a
register.
Two-operand instructions use both a source and destination for the data.
The registers for the above instructions could vary from 16 bits on the early X86 versions to 32
bits on the later X86 versions. The later X86 versions also have 80-bit or 128-bit registers for
floating-point instructions.
The classes of instructions that are supported include:
data transfer instructions
integer arithmetic instructions
branch instructions
bit manipulation, rotate, and shift instructions
input/output instructions
miscellaneous instructions such as HALT
floating-point instructions (on the later versions of the family)
The addressing modes include:
immediate mode
direct addressing mode
register mode (in these modules, referred to as register direct addressing)
register deferred addressing mode (in these modules, referred to as register indirect
addressing)
base addressing mode
indexed addressing mode
base indexed addressing mode (in these modules, referred to as indexed with base
register addressing)
We will discuss other aspects of the X86 family in module 4.
PowerPC
The PowerPC is a family of processors developed around a specification for open system software
developed jointly by Apple, Motorola, and IBM. Members of this family include the Apple Power
Macintoshes and the IBM RS/6000.
Every instruction in the PowerPC is 32 bits in length. The instruction set is divided into six
classes:
integer instructions
floating-point instructions
load/store instructions (to move data to and from memory)
flow-control instructions (for conditional and unconditional branching)
processor-control instructions (including moves to and from registers and
systems/interrupt calls)
memory-control instructions (to access cache memory and storage)
Three addressing modes are used:
register indirect
register indirect with indexing
register indirect with immediate offset
The last two modes are not exactly what we discussed in section III C, but are similar to the
indexed addressing mode we presented.
We will discuss other aspects of the PowerPC family in module 4.
IBM Mainframe
The IBM mainframe family includes the System 360/370/390/zSeries family. Here, we discuss
aspects of the zSeries system.
The zSeries instructions have 0, 1, 2, or 3 operands. The 3-operand instructions operate on
multiple quantities of data, with two of the operands used to describe the high and low end of
the range of data. For example, a group of all data between memory locations 10 and 20 would
have the operand addresses 10 and 20.
Five classes of instructions are used:
general instructions (used for data transfer, integer arithmetic and logical instructions,
branches, and shifts)
decimal instructions (used for simple arithmetic, including rounding and comparisons)
floating-point instructions
control instructions (used for system operations)
input/output instructions
Four types of addressing are used:
immediate addressing
register addressing (in these modules, referred to as register direct addressing)
storage addressing (in these modules, referred to as base addressing)
storage indexed addressing (in these modules, referred to as indexed with a base register
addressing)
We will discuss other aspects of the zSeries in module 4.
Module 4: Advanced Systems Concepts
Commentary
Topics
I. CPU Design
II. Memory System Concepts
III. Performance Feature Comparisons
I. CPU Design
Like their automotive industry counterparts, computer designers try to obtain the highest
possible performance from their designs. In module 1, we saw how an additional measure of
precision was obtained in the IEEE 754 standard by using an implied "1" in the significand. A
second means of improving performance—through the use of various addressing methods—was
investigated in module 3. In this module, we will look at several other approaches to improving
performance, including the use of:
RISC processors in place of CISC processors
pipelining and superscalar processing
cache memory
virtual memory
Several additional steps can be taken to improve performance, including using:
processors with more than one CPU to assist in the computations
faster clock speed
wider instruction and data paths, which allow the CPU to access more data in memory or
fetch more instructions at one time
faster memory and disk accesses
A. RISC versus CISC
There are two different types of CPU design: Complex Instruction Set Computer (CISC) design
and Reduced Instruction Set Computer (RISC) design.
CISC is represented by the X86 (80286, 80386, 80486, and Pentium) family of processors, the
IBM 360 (360, 370, 390, and zSeries) family of processors, and the Motorola 68000 series
microprocessors.
RISC is represented by the PowerPC (IBM RS/6000, IBM AS/400, and Apple Power Macintoshes)
and the Sun SPARC processors.
The RISC CPU differs from the CISC CPU in five principal ways. It has:
1. a limited and simple instruction set made up of instructions that can be executed at
high speeds. The high speeds are accomplished, in part, by using a hard-wired CPU with
instructions that are pipelined, e.g., the fetch phase of the next instruction occurs during
the execution phase of the current instruction.
2. register-oriented instructions with limited memory access. Most of the RISC
instructions operate on data in registers; there are only a few memory access instructions
(i.e., LOAD and STORE).
3. fixed length and fixed format instructions that make it easier to pipeline follow-on
instructions. The SPARC architecture has five instruction formats, each 32 bits long. In
comparison, the IBM 360 architecture has nine formats that vary in length from 16 to 48 bits.
4. limited addressing modes. RISC has only one or two addressing modes, as compared
with the various modes discussed in module 3. Less complicated addressing leads to
simplified CPU design.
5. a large bank of registers. RISC CPUs have many registers—often 100 or more—that
allow multiple programs to execute without moving data back and forth from memory.
Proponents of RISC and CISC architectures argue over the merits of these approaches. These
arguments have become less meaningful, however, as improved technology has led to new CPU
enhancements that combine the features of each design.
Newer RISC designs have an increased number of instructions, which has been made possible
because today's technology provides for faster processing techniques and more logic on the
system chips. Newer CISC designs have an increased number of user registers and more
register-oriented instructions. Thus, the features and capabilities of recent RISC and CISC CPUs
are very similar. Two of these features—pipelining and superscalar processing, which we discuss
in the next section—help compensate for the argued differences of the two architectures. Thus,
the choice of RISC or CISC design simply reflects the preferences and specific goals of the CPU
designers.
B. Pipelines and Superscalar Processing
In module 2, we briefly mentioned that clock pulses are used to trigger flip-flops in sequential
circuits. The entire computer is synchronized to the pulses of an electronic clock. This clock
controls when each step of a computer instruction occurs. For example, figure 4.1 shows the
series of pulses required for an ADD instruction in the Little Man Computer (LMC, discussed in
section II A of module 3), where:
the first three pulses perform the fetch
the fourth step performs the decoding
the next two steps execute the ADD instruction
the seventh pulse starts the next instruction
Note that the step:
PC + 1 → PC
is performed early because it doesn't affect the calculation. Also, these steps:
IR[opcode] → decoder and IR[address] → MAR
are performed in parallel. Performing the steps in parallel reduces the number of clock cycles.
Figure 4.1
Timing for an ADD Instruction
When pipelining is used, steps of instructions that are in a sequence are fetched, decoded, and
executed in parallel. While one instruction is being decoded, the next instruction is being fetched,
and while the first instruction is being executed, the second instruction is being decoded and a
third instruction is being fetched. In the figure below, note that four different instructions are
performing at the same time after pulse 4:
instruction 1 is on step 4
instruction 2 is on step 3
instruction 3 is on step 2
instruction 4 is on step 1
Figure 4.2
Pipelining Instructions
A further improvement to pipelining is to separate the fetch, decode, and execute phases of the
fetch-execute cycle into separate components, then use multiple execution units for the
execution phase, and pipeline the execution portions of the instructions. With only one execution
unit, the processor is considered scalar. But with multiple execution units, the processor is
considered superscalar.
Figure 4.3 shows a comparison of scalar and superscalar processing. Figure 4.3a shows that,
with scalar processing, the fetches, decodes, and executions are pipelined. Figure 4.3b shows
that, with two fetch-execute units, two complete sets of instructions are performed in parallel.
Figure 4.3
Scalar Processing versus Superscalar Processing
a. Scalar Processing

Instruction 1: fetch  decode  execute
Instruction 2:        fetch   decode  execute
Instruction 3:                fetch   decode  execute
Instruction 4:                        fetch   decode  execute

b. Superscalar Processing

Instruction 1: fetch  decode  execute
Instruction 2: fetch  decode  execute
Instruction 3:        fetch   decode  execute
Instruction 4:        fetch   decode  execute
We can use what we have learned thus far to show the benefits of pipelining for a computer
using the LMC instruction set that was described in section I-C of module 3. For this analysis, we
will make some simplifying assumptions:
1. The PC incrementing step is performed in parallel with another instruction, as shown on
pulse 2 in figure 4.1.
2. An average program is composed of:
o 70 percent STORE, LOAD, ADD, and SUBTRACT instructions, requiring six steps each
o 30 percent IN, OUT, and BRANCH instructions, requiring five steps each
HLT is ignored because it occurs only once per program.
3. The clock runs at a 10 MHz rate, i.e., 10,000,000 ticks per second.
4. A new step starts with each tick of the clock.
5. With a pipeline, the time for each instruction (after the first) is reduced to the time
required for the first step.
6. There are no delays in the pipeline.
Example 4.1 shows the calculation for the number of instructions per second (IPS) for execution
without pipelining. Example 4.2 shows the calculation for the number of instructions per second
(IPS) for execution with pipelining.
Example 4.1—Instructions per Second Without Pipelining
Average number of steps per instruction:
= .70 * 6 steps + .30 * 5 steps
= 5.7 steps
Instructions per second (IPS):
= (10,000,000 ticks/sec * 1 step/tick) / (5.7 steps/instruction)
= 1.75 million IPS
Example 4.2—Instructions per Second With Pipelining
Average number of steps per instruction:
= .70 * 1 step + .30 * 1 step
= 1.0 step
Instructions per second (IPS):
= (10,000,000 ticks/sec * 1 step/tick) / (1.0 step/instruction)
= 10 million IPS
In a RISC architecture where a new instruction is started on each clock tick, the average number
of steps per instruction would approach 1. Then, the number of instructions per second would
approach 10 million IPS.
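The calculations in examples 4.1 and 4.2 can be reproduced with a short Python sketch; the function name is our own, not part of the text.

```python
# Reproduces examples 4.1 and 4.2: instructions per second (IPS)
# with and without pipelining, for the LMC instruction mix.

CLOCK_HZ = 10_000_000  # 10 MHz clock, one step per tick

def ips(avg_steps_per_instruction):
    """Instructions per second given the average steps per instruction."""
    return CLOCK_HZ / avg_steps_per_instruction

# Without pipelining: 70% six-step instructions, 30% five-step instructions.
avg_steps = 0.70 * 6 + 0.30 * 5          # 5.7 steps
print(round(ips(avg_steps) / 1e6, 2))    # ~1.75 million IPS

# With pipelining: each instruction (after the first) costs one step.
print(ips(1.0) / 1e6)                    # 10.0 million IPS
```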
II. Memory System Concepts
A. Memory Hierarchy
An entire hierarchy of memories exists. Levels 1–3 of this hierarchy have traditionally been
called primary storage.
1. CPU registers—Registers are by far the fastest and most expensive type of memory,
with each register providing temporary storage for only one word while it is being
processed in the CPU.
2. cache memory level 1—This type of memory is incorporated into the CPU using small,
very high-speed static RAM (SRAM) chips and is the fastest to access.
3. cache memory level 2—This type of memory is usually found outside the CPU. It is a
larger, but somewhat slower, cache built from SRAM that backs up the level-1 cache.
Some modern CPUs contain both level-1 and level-2 cache memory.
4. main memory (MM)—MM, also known as RAM, is built from DRAM chips. It contains
programs and has a slower access time than cache memory.
5. disk storage—Disk storage, which includes disks, tapes, CD-ROMs, bubble memories,
and so on, offers much larger capacities at the expense of access time.
Throughout the memory hierarchy, there is an inverse relation between access time and size.
Memory components in the memory hierarchy grow larger as their distance from the CPU
increases. The cost per byte decreases as the memory component moves further from the CPU,
but the access time becomes longer.
We discussed main memory (MM) in module 3. We will discuss cache memory in detail in the
following section.
B. Cache Memory
A cache memory (CM) is a small, high-speed memory that is placed between the CPU and MM.
If we are lucky, most of the CPU accesses will be to CM, which has the effect of creating a faster
memory cycle. The MM will be accessed only if what is being sought is not in CM. For such cases,
the access time will actually increase slightly, but one hopes this will not happen frequently.
A key measure of cache efficiency is the hit ratio, which is the fraction of memory accesses that
succeed in finding an item within the cache. Hit ratios of around 80 percent are considered to be
very good. The success of a CM is predicated on the principle of locality of reference, which
simply refers to the fact that, in typical programs, references to memory tend to be in localized
areas, due to subroutines and loops within the program.
You may wish to see how a CPU access to memory works when a cache is present.
Two terms we must understand to discuss cache memory are:
cache slot—This contains an MM address, or portion of an address, and one or more
data words. It is also known as a cache word.
tag—This is a field used to uniquely identify which MM address has been mapped into a
cache word.
In addition, we must understand the following CM concepts:
cache types
valid bit
block size
address mapping policies (see below)
Address Mapping
The address mapping process used in the cache is by far the most important factor in
determining performance. This process must occur with every single memory reference.
Therefore, it is crucial that all parts of this process be carried out by hardware rather than by
software, because hardware has the advantage of speed. The three mapping methods described
below differ in the way in which they determine how this mapping is done:
1. associative mapping
2. direct mapping
3. set-associative mapping
We will compare these methods by applying them to the same situation—an MM of 32K x 12
(15-bit addresses). The CM size will be 512 words (9-bit addresses). The word size of the cache
depends on the mapping method. This organization is shown in figure 4.4:
Figure 4.4
Location of Cache Memory
Associative Mapping
The associative mapping method allows full flexibility in where a given MM word may be stored
in cache. This mapping method uses a special kind of memory called an associative memory.
When the CPU generates an MM reference, the associative memory is first searched using the
CPU-generated address as the pattern. If a match is found, it means that this reference is a hit,
and the data part of the matched location is read. If the reference results in a miss, then the MM
is accessed using the CPU address. Clearly, if the required value is not found in the cache, the
associative memory must also be loaded with the address and MM contents for future cache
access.
Let's look at the example of associative mapping in figure 4.5. We will need an associative
memory that is 15 + 12 = 27 bits wide. This memory will store the memory addresses generated
by the CPU (15 bits) and the 12 bits of data stored at that MM address. The 15-bit memory
address is the tag for the associative mapping cache.
Figure 4.5
Associative Mapping Cache
CPU Address
    ↓
Address   Data
 01000    2345
 02777    1670
 23450    3456
   :        :
All values are in octal.
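The associative search can be sketched in Python, with a dictionary standing in for the associative memory of figure 4.5 (the function and variable names are our own):

```python
# A dict models the associative memory of figure 4.5: the full 15-bit
# MM address is the tag (key), and the 12-bit word is the data (value).
# Addresses and data are written in octal, as in the figure.

cache = {0o01000: 0o2345, 0o02777: 0o1670, 0o23450: 0o3456}

def read(address, main_memory):
    if address in cache:          # hit: the associative search matches the tag
        return cache[address]
    value = main_memory[address]  # miss: fall back to MM ...
    cache[address] = value        # ... and load the cache for future hits
    return value

print(oct(read(0o01000, {})))  # 0o2345 (a hit; MM is never touched)
```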
When we include the valid bit, the cache word is 28 bits long and has this form:
Tag      Data     Valid Bit
15 bits  12 bits  1 bit
The associative cache described and shown above is the case when the block size is 1. If we
have 8 locations per block (i.e., a block size of 8), we will need a tag field of 12 bits (15 bits
minus the 3 bits that identify the word within the block) plus a word-within-the-block field of 3
bits. Each slot in the associative CM must store the tag of 12 bits and the block field of 3 bits
plus the contents of 8 memory locations (i.e., 8 x 12 bits) and 1 valid bit (for the entire cache
word). Thus, this cache word is 112 bits long and has this form:
Tag      Block   Data 1   Data 2   Data 3   - - -   Data 8   Valid Bit
12 bits  3 bits  12 bits  12 bits  12 bits  - - -   12 bits  1 bit
As with a block size of 1, a hit reads the data directly from the matched cache word, while a
miss accesses MM using the CPU address and loads the associative memory with the address
and the MM contents of the reference that caused the miss. If the associative memory is full,
we use one of the replacement policies discussed at the end of this section.
The advantage of associative mapping is that we have a lot of flexibility in determining which
cache slots to replace, and we can do something to keep frequently accessed slots in cache. The
disadvantage is the need for associative memories, which are expensive.
Direct Mapping
The direct mapping approach uses a random access memory where the CPU address is divided
into two fields, a tag and an index. If there are 2^n words in main memory and 2^k words in
cache, then the tag field is n – k bits, and the index field is k bits. The addressing relationship
for the 32K x 12 memory and 512 x 12 cache is shown in figure 4.6 below.
Figure 4.6
Addressing Relationship Between MM and CM
Each MM location will map to a single specific cache location. In figure 4.7 below, every MM
address has 15 bits (shown as octal digits), which are split into two parts as follows:
1. The lower-order 9 bits of an MM address, which are the index bits, are used to address
the CM.
2. The remaining six bits, which are the tag bits, are used to uniquely identify which MM
address has been mapped into a cache slot.
Thus, the CM has the value in MM location 00000 (2120) loaded at index address 000 with tag
00. And the value at MM location 02777 (1670) is loaded at index address 777 with tag 02.
Figure 4.7
Direct Mapping Cache Storage
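The tag/index split of figure 4.7 can be expressed directly as bit operations; this Python sketch assumes the block size of 1 described above, and the function name is our own:

```python
# Splitting a 15-bit MM address into a 6-bit tag and a 9-bit index,
# as in the 32K x 12 MM / 512-word CM example (block size 1).

INDEX_BITS = 9   # 512 cache words = 2**9

def split(address):
    index = address & (2**INDEX_BITS - 1)   # low-order 9 bits address the CM
    tag = address >> INDEX_BITS             # remaining 6 bits identify the MM address
    return tag, index

# MM location 02777 (octal) maps to index 777 with tag 02, as in figure 4.7.
tag, index = split(0o02777)
print(oct(tag), oct(index))   # 0o2 0o777
```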
In the situation above, we considered only one word of data per cache location. But if we have
a block size of 8, then the index is broken down into two parts:

Block    Word
6 bits   3 bits
The advantages of direct mapping are its relatively low cost and simplicity. Its chief
disadvantage is inflexibility: each MM address maps to exactly one cache slot, so two
frequently used addresses that share the same index will repeatedly evict each other.
Set-Associative Mapping
Set-associative mapping has some of the simplicity of direct mapping. Each MM address is
allowed to map to a fixed set of cache "slots" (each slot can contain a data value). But a given
MM address can be placed in any one of the slots in the set to which it maps. If the size of each
set is k, then the mapping is said to be k-way set associative. You must keep clear the distinction
between cache set and cache slot. We can choose which cache slot within the set to replace, in
case there is not a match. Thus, we do have some flexibility.
The k-way set-associative mapping differs from both the associative mapping and direct mapping
caches in that, in addition to a valid bit field, there is a count bit field that keeps track of when
data was accessed and that is used for replacement purposes.
Each index word refers to k data words and their associated tags. The number of count bits,
m, is the base-2 logarithm of k (2^m = k). In addition, the tag is longer by m bits and the
index is shorter by m bits because there are fewer cache words (requiring fewer bits to
address), and the shorter index requires a longer tag to adequately describe the full address
in MM.
Using our example of a 32K x 12 memory, a two-way set-associative cache of size 512 x 12
would have 256 cache words (512 / 2 = 2^8) for an index of 8 bits and a tag of 7 bits (15 – 8).
Each way would have 7 tag bits, 12 data bits, 1 count bit, and 1 valid bit. The format of the
complete cache word would be:

← Way 1 →                                    ← Way 2 →
Tag     Data     Count Bit  Valid Bit        Tag     Data     Count Bit  Valid Bit
7 bits  12 bits  1 bit      1 bit            7 bits  12 bits  1 bit      1 bit
If we consider a four-way set-associative cache, we would have 128 cache words (512 / 4 =
2^7) for an index of 7 bits and a tag of 8 bits (15 – 7). Each way would have 8 tag bits, 12
data bits, 2 count bits, and 1 valid bit. The format of the complete cache word would be:

← Way 1 →                                    ← Way 2 →
Tag     Data     Count Bits  Valid Bit       Tag     Data     Count Bits  Valid Bit
8 bits  12 bits  2 bits      1 bit           8 bits  12 bits  2 bits      1 bit

← Way 3 →                                    ← Way 4 →
Tag     Data     Count Bits  Valid Bit       Tag     Data     Count Bits  Valid Bit
8 bits  12 bits  2 bits      1 bit           8 bits  12 bits  2 bits      1 bit
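The field widths in the two-way and four-way examples above follow directly from the cache size and k; a small Python sketch (names are our own) reproduces them:

```python
# Field widths for a k-way set-associative cache over a 15-bit address
# space and 512 cache words, reproducing the two-way and four-way figures.

from math import log2

ADDRESS_BITS = 15   # 32K main memory
CACHE_WORDS = 512

def field_widths(k):
    sets = CACHE_WORDS // k                  # cache words are grouped into sets of k
    index_bits = int(log2(sets))             # bits needed to address a set
    tag_bits = ADDRESS_BITS - index_bits     # shorter index means a longer tag
    count_bits = int(log2(k)) if k > 1 else 0
    return index_bits, tag_bits, count_bits

print(field_widths(2))  # (8, 7, 1): 256 sets, 8-bit index, 7-bit tag
print(field_widths(4))  # (7, 8, 2): 128 sets, 7-bit index, 8-bit tag
```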
Replacement Policies
An important design issue for cache memory is which replacement policy to use. Most processors
use a least recently used (LRU) replacement policy or proprietary variants of LRU, but several
replacement policies are available, described below.
In the case of a miss, using the associative and set-associative mappings, we must have a
replacement policy so that a certain cache slot may be vacated. Some of the popular
replacement policies are:
1. FIFO (first-in first-out): The entire associative memory is treated as a "circular buffer,"
and slots are replaced in a round-robin order. Note that this policy makes no concessions
to the frequency of usage of an MM address.
2. LRU (least recently used): The slot that is replaced is the one that has not been used for
the longest time. The LRU policy is difficult to implement, but it does favor frequently
used addresses.
3. LFU (least frequently used): A count is kept of how many times a given cache location is
accessed in a fixed period of time. When a miss occurs, the slot that has the smallest
count is the one replaced. Note that a slot that was recently referred to for the first time
will have a small count value, and that it is therefore a candidate for exile to MM. For this
reason, all the frequency counters are periodically reset.
4. Random: One slot is randomly picked from the list of those eligible. Some studies
indicate that this policy is not far behind the best one in terms of performance (LRU), but
is easier to implement.
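As a concrete illustration of the LRU policy, the following Python sketch uses the standard library's OrderedDict to track recency; it is a minimal software model of the policy, not a hardware design, and the class name is our own.

```python
# A minimal LRU replacement policy using Python's OrderedDict:
# the least recently used slot sits at the front and is evicted first.

from collections import OrderedDict

class LRUCache:
    def __init__(self, slots):
        self.slots = slots
        self.data = OrderedDict()

    def access(self, tag, value):
        if tag in self.data:               # hit: mark as most recently used
            self.data.move_to_end(tag)
            return self.data[tag]
        if len(self.data) == self.slots:   # miss with a full cache:
            self.data.popitem(last=False)  # evict the least recently used slot
        self.data[tag] = value
        return value

cache = LRUCache(2)
cache.access("A", 1)
cache.access("B", 2)
cache.access("A", 1)      # "A" becomes most recently used
cache.access("C", 3)      # evicts "B", the least recently used
print(list(cache.data))   # ['A', 'C']
```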
Write Methods
Another important design issue is that of how to write data to the cache. When a CPU instruction
modifies (i.e., writes a new value into) a cache location, the cache will contain a different value
than the corresponding MM location because, in the case of a hit, only the CM is accessed!
Whenever we have two memories (or, in general, two copies of any database), we will have
consistency problems if writes are not done simultaneously to both copies. You will study this
problem in more detail if you take a course on databases. Several write methods are available.
If we use the write-through method, each operation that writes a new value into cache must
simultaneously also write that value into the corresponding MM location to guarantee memory
integrity. The problem is that, if there are many writes, frequent accesses to MM must be made,
slowing everything down.
With the write-back method, every cache slot is associated with a single bit, called the dirty bit.
In this method, writes are made only to cache. When any location within that cache slot
(remember that a cache slot will, in general, contain an entire block of memory) is written, the
dirty bit for that slot is set. Thus, the dirty bit indicates that some location within that slot has
been contaminated. When a cache slot must be replaced, it is necessary to actually write the
slot back into memory only if the dirty bit has been set for that slot. At the conclusion of a
program, the cache contents must also be written out to MM for the final time.
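A minimal Python model of the write-back method and its dirty bit (the class and function names are our own assumptions) might look like this:

```python
# Write-back caching with a dirty bit: writes touch only the cache,
# and a slot is copied back to MM only if it was modified.

class Slot:
    def __init__(self, tag, data):
        self.tag, self.data, self.dirty = tag, data, False

def write(slot, value):
    slot.data = value
    slot.dirty = True            # mark the slot as modified

def evict(slot, main_memory):
    if slot.dirty:               # write back only contaminated slots
        main_memory[slot.tag] = slot.data
    # a clean slot is simply discarded; MM already holds its value

mm = {0o100: 0o1111}
slot = Slot(0o100, 0o1111)
write(slot, 0o2222)
evict(slot, mm)
print(oct(mm[0o100]))  # 0o2222
```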
C. Security and Memory Management
Nearly every modern computer is designed to support manipulating multiple programs on a
single CPU, a technique known as multitasking, or multiprogramming. The design of the
computer must support multiprogramming in such a way that malicious or inadvertent code
cannot shut down the computer. In order to meet this requirement, the computer hardware
should:
1. limit any executing program to a specific portion of memory.
Protection is provided for storage access such that address spaces can be completely isolated
from each other. This keeps a program from accessing addresses being used by the system
or other programs.
2. limit the set of instructions that a user's program can execute.
Multiple modes of protection, including a privileged or supervisory mode for control of the
system, and a user mode for application programmers, are provided. This protection keeps
application programmers from using instructions that are designed for system programmers.
Most instructions are unprivileged and available to application programmers.
3. eliminate the programmer's concern about exactly where his or her program will load in
MM.
Program addresses are referred to as logical addresses, whereas actual memory addresses
are referred to as physical addresses. Logical addresses do not have any meaning outside the
program itself. Physical addresses are like LMC mailboxes. They have physical reality; that is,
they physically exist. Transforming from logical to physical addresses is known as mapping.
Memory management is performed by the memory management system, a collection of
hardware and software used for managing programs residing in memory. The hardware
associated with the memory management system is called the memory management unit, and is
located between the CPU and MM. The memory management software is usually part of the
operating system.
In a multiprogramming environment with many programs in physical memory, it is necessary to:
move programs and data around the MM
alter the amount of memory a specific program employs
prevent a program from inadvertently modifying other programs
A memory management unit may:
map logical addresses into physical addresses
enable memory sharing of common programs from different users
enforce security provisions of memory
Older computer systems used various means of partitioning a fixed-size memory to handle
programs. Specific memory management systems used included:
single-task systems. In a single-task system, the memory is allocated between the
operating system and the executing program, with some unused memory remaining.
Even if there is enough room in the unused portion of memory for additional programs,
they cannot be loaded because the operating system allows only one program in memory
at a time.
fixed-partition multiprogramming. A computer that runs multiple programs must
have a memory large enough for those programs to reside. One approach to managing
multiple programs in memory is to establish a set of fixed and immovable regions, or
partitions. One partition is reserved for the operating system and the other partitions are
available for user programs. The job of the operating system is then to decide which
partition any one program can occupy, and to decide in which queue to place a program
waiting for space in memory.
variable-partition multiprogramming. A more flexible approach to memory
management is variable-partition multiprogramming, where the operating system is
allowed to increase or decrease the size of partitions. As programs are moved into and
out of memory, holes are created. The holes are the wasted space not being used by
programs.
The fixed-partition multiprogramming and variable-partition multiprogramming memory schemes
have a common goal of keeping many processes (programs) in memory simultaneously to allow
multiprogramming. Both schemes, however, require the entire program to be in memory before
the process can execute. We say that these schemes require contiguous memory, meaning that
all parts of memory related to a given program are located adjacent to each other.
The problem is that very large programs may still require more memory than is available. Virtual
memory provides the solution to the problem of memory size by allowing the execution of
programs that are larger than the physical memory. Thus, where partitioned systems require
contiguous allocations of memory, virtual memory can work with noncontiguous allocations of
memory.
Another problem with both fixed-partition and variable-partition memories is the inefficient use
of memory, even if there are a lot of modest-sized programs. If there is enough free memory to
load a program, but the free memory is not contiguous, the operating system must pause some
programs and relocate them, modifying the current partition structure. Virtual memory solves
this problem by enabling modest-sized programs to be scattered about MM, using whatever
space is available.
A third problem with both partitioning schemes is that the final address in memory where a
program will be loaded for execution is not known. One solution to that problem is to use base
addressing, where the operating system uses a base register (discussed in section III-C of
module 3) to hold the location that corresponds to the starting location of the program.
The overall effect of the partitioning forms of memory management is that they lead to memory
fragmentation and other assorted inefficiencies.
D. Virtual Memory
The concept of virtual memory resolved the shortcomings of memory partitioning by
incorporating hardware into the memory-management unit to perform paging, the process of
dividing both the user address space and MM into pages of a fixed size, the "page size." Thus,
different pages of the program can be loaded into different pages of MM. Consecutive pages of
the user address space need not be consecutive in MM, and not all pages of user space need to
be present in MM at any time.
An example of virtual memory, also known as virtual storage, is shown in figure 4.8, where
memory used by the program (the logical organization) is stored in three different memory
locations and one disk drive location (the physical organization). This type of storage allows the
computer to load and execute a program that is larger than memory.
Figure 4.8
Logical Organization versus Physical Organization
In a paged system, each equal-sized logical block is called a page (like a page of a book), and
the corresponding physical block is called a frame. The size of a page is equal to the size of a
frame.
The size of the block is chosen in such a way that the bits of the memory address can be
naturally divided into a page number and an offset. The offset is a pointer to the location on a
page. A page table converts between the frame address and the page number.
An example of page translation from logical address 10A17 to physical address 3FF617 is shown
in figure 4.9. Both addresses are shown as hexadecimal numbers. Thus, the logical address is 20
bits with a 12-bit page number (10A) and an 8-bit offset (17). The physical address is 24 bits
with a 16-bit frame number (3FF6) and the same 8-bit offset (17).
Figure 4.9
Page Translation in a Virtual Memory System
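The translation in figure 4.9 can be sketched in Python; the single-entry page table below holds just the mapping from the example, and the function name is our own.

```python
# Page translation of figure 4.9: a 20-bit logical address with a 12-bit
# page number and an 8-bit offset, mapped through a page table to a
# 24-bit physical address.

OFFSET_BITS = 8
page_table = {0x10A: 0x3FF6}   # page 10A resides in frame 3FF6

def translate(logical):
    page = logical >> OFFSET_BITS
    offset = logical & (2**OFFSET_BITS - 1)
    frame = page_table[page]            # a missing entry would be a page fault
    return (frame << OFFSET_BITS) | offset

print(hex(translate(0x10A17)))  # 0x3ff617
```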
Let's consider how we could modify the LMC to support virtual memory. The LMC instruction set
that we discussed in section I-C of module 3 has a two-digit logical address space that allows for
100 two-digit addresses and 100 physical mailboxes. If the page (and frame) size were 10, there
would be 10 pages (from 0 to 9) and 10 frames (also from 0 to 9), and the offset for a particular
address would be from 0 to 9. In a page table, there would be a one-digit page number and a
one-digit offset. Thus, two-digit memory address 26 would become page 2 with an offset of 6, as
shown below:
Page Number   Offset
2             6
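In Python, this split of a two-digit LMC address is simply integer division by the page size of 10 (the function name is our own):

```python
# Splitting a two-digit LMC address into page number and offset is
# integer division by the page size of 10.

def lmc_split(address):
    return divmod(address, 10)   # (page number, offset)

print(lmc_split(26))  # (2, 6): address 26 is page 2, offset 6
```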
If we allowed the LMC to physically expand to 1,000 addresses, but kept the two-digit address
allowed in the instruction set, a program would still be limited to 100 logical addresses, but
those addresses could be spread over the 1,000 physical addresses. A translation table such as
that shown in figure 4.10 below would be needed to convert from the logical space to the
physical space.
Figure 4.10
LMC Page Table with a Large Physical Space
Virtual memory can also be used to execute two programs that have the same code, but
different data. As shown in figure 4.11, the two programs use:
five 20-unit pages with logical addresses 1 to 100 for program code (e.g., instructions)
three 20-unit pages with logical locations 101 to 160 for data
The program code is stored in five 20-unit frames with physical addresses from 200 to 300 in
memory.
The data for the first program are stored in three 20-unit frames with physical addresses from
401 to 460, and the data for the second program are stored in three 20-unit frames with
physical addresses from 501 to 560 in memory.
Figure 4.11
Virtual Memory for Execution of Two Programs
Page Faults
Virtual memory simplifies the problem of memory management because finding memory
partitions large enough to fit a program contiguously is no longer necessary. But to execute an
instruction or to access data, it is necessary to have:
the instruction or data in memory
an entry in the page table that maps the logical address to the physical address
If either of these conditions is not met, the CPU hardware causes a page fault. When a page
fault occurs, the memory management software selects a memory frame whose contents are removed
(swapped out) and replaced (swapped in) by the needed page.
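The page-fault sequence above can be sketched as follows. The FIFO replacement policy and the frame count are assumptions made for illustration; the LMC model does not prescribe a particular replacement policy:

```python
from collections import OrderedDict

PAGE_SIZE = 10
NUM_FRAMES = 3  # assumed small frame count to force faults

page_table = OrderedDict()  # logical page -> physical frame, in load order

def access(logical_address):
    page, offset = divmod(logical_address, PAGE_SIZE)
    if page not in page_table:                    # page fault
        if len(page_table) >= NUM_FRAMES:
            # Swap out the oldest resident page (FIFO) and reuse its frame.
            _victim, frame = page_table.popitem(last=False)
        else:
            frame = len(page_table)               # use a still-free frame
        page_table[page] = frame                  # swap in the needed page
    return page_table[page] * PAGE_SIZE + offset

print(access(26))  # first reference faults, loads page 2 into frame 0 -> 6
```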
By having only those portions (i.e., referenced pages) of a program that are needed at a
particular time in MM, it is possible to "swap" needed portions (pages) in and out of MM at
different times to enable a large program to run in a small amount of MM space. Furthermore,
these pages can be placed in any location in MM available at that time, so the program can exist
in scattered (noncontiguous) locations. The page size is typically a hardware design feature of a
computer system.
The concept of paging is similar to that of CM in that it is hoped that memory accesses will be
found in pages already resident in MM, thus avoiding the time-consuming need to swap
information into and out of auxiliary storage devices.
Segmentation
Another approach to virtual memory, called segmentation, is one in which the blocks have
variable sizes. Unlike paging, segmentation gives the user control over the segment size, so it is
necessary to include the size of the variable-sized blocks in the translation table. Thus, the
translation table, called a segment table, includes the logical and physical starting addresses and
the block size for each segment. Figure 4.12 shows an example of a segmentation translation,
where logical address 217 translates to physical memory address 492 as an offset of 17 from
Block B.
Figure 4.12
Segmentation Translation in a Virtual Memory System
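The translation in figure 4.12 can be sketched as a lookup in a segment table. The segment boundaries below (Block B starting at logical address 200 and physical address 475, and all sizes) are assumptions chosen so that logical address 217 maps to physical address 492 as in the figure; they are not taken from the figure itself:

```python
# Hypothetical segment table: each entry carries the segment's logical
# start, physical start, and size (segments are variable-sized).
segment_table = [
    # (name, logical_start, physical_start, size)
    ("A", 0,   100, 200),
    ("B", 200, 475, 60),
]

def translate(logical_address):
    for name, log_start, phys_start, size in segment_table:
        offset = logical_address - log_start
        if 0 <= offset < size:        # bounds check uses the stored size
            return phys_start + offset
    raise ValueError("address not in any segment")

print(translate(217))  # offset 17 from Block B -> physical address 492
```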
Segmentation is harder to operate and maintain than paging, and is falling out of favor as a
virtual-memory technique.
III. Performance Feature Comparisons
In this section, we will continue the discussions we started in module 3 on features of the X86,
PowerPC, and IBM Mainframe computer families to show the fundamental similarities among all
computer CPUs. We will discuss RISC versus CISC, pipelining, cache and virtual memory, and
some security issues.
A. X86
Although the early X86 CPUs had a simple design, the X86 architecture has evolved into a
sophisticated and powerful design with improved processing methods and features. Backward
software compatibility with earlier family members has been maintained so that each new model
has been capable of executing the software built for previous models.
The X86 CPU incorporates CISC processing with pipelining and superscalar processing, two-level
cache memory, and virtual storage (which we called virtual memory). Floating-point instructions,
virtual storage, and multiprogramming support are now part of the basic architecture.
As a typical CISC processor, the X86 has relatively few general-purpose registers and a relatively
large number of specialized instructions, along with a great variety of instruction formats.
CPU interrupts can be either emergency interrupts or normal interrupts, at one of thirty-two
prioritized levels from IRQ0 to IRQ31.
System security supports multiprogramming through the implementation of a protected mode for
addressing. In this mode, the CPU provides four levels of access to memory. Application
programs have the lowest level of access, and key portions of the operating system are on the
highest level of access.
Early versions of the X86 family required floating-point arithmetic to be performed in software or
in a separate processor. Later versions added the capability for floating-point arithmetic.
B. PowerPC
The PowerPC family of processors is developed around the RISC concept. The RISC architecture
is used in the IBM RS/6000 workstations, Apple Power Macintoshes, and Nintendo Gamecube
systems.
Floating-point arithmetic, memory caching, virtual memory, operating-system protection, and
superscalar processing are standard. As expected for a RISC design, every instruction is 32 bits
long. The uniform instruction length and a consistent set of op codes simplify the fetch and
execution pipelines and make superscalar processing practical and efficient.
There are two implementations: a 32-bit implementation with 32-bit registers, and a 64-bit
implementation with 64-bit registers. Both have 32 general-purpose registers and 32
floating-point registers to support the RISC design. Programs written for the 32-bit processors
can be run on the 64-bit processors.
Two levels of protection, supervisor (or privileged) and problem (or user), are provided for the
PowerPC.
C. IBM zSeries Mainframe
The zSeries is a CISC multiprocessing computer that can perform simultaneous computations
using pipelining and superscalar techniques found in other CPUs. Its capabilities include
floating-point arithmetic, virtual storage (virtual memory), and two-level cache memory. The
basic building block of a zSeries system is the multichip module, which consists of either twelve
or twenty CPUs. Programs written for older IBM System 360/370/390 computers will execute on the
zSeries computers.
As expected in a mainframe computer, the zSeries computers provide excellent security. There
are two protection states: a problem state for application programs, and a supervisory state for
control of the system. Instructions executed in the problem state are not privileged to modify
system parameters.