Functional Units - Edward L. Bosworth, Ph.D.,Textbooks and Other

Functional Units of a Modern Computer
We begin this lecture by repeating a figure from a previous lecture.
Logically speaking a computer has four components.
Connecting the Components
Early schemes for connecting the components could be very messy.
The backplane of a PDP–10, circa 1970
Another Motherboard
Copper Traces on a PCB
These traces, which do the job of wires, are the real advantage of a PCB. All
modern busses are just collections of traces, aluminum or copper.
Types of Busses
There are a number of ways to classify computer busses.
Here we focus on one distinction: synchronous and asynchronous.
A bus transaction is an exchange of data or control information.
In simple transactions, a source asserts signals onto a bus, and a
destination copies the values into some sort of buffer memory.
More complex transactions might comprise a number of simple transactions.
A memory read transaction might include the following:
Assert the memory address on the address lines.
Assert READ on the control lines.
Memory places data on the data lines.
The distinction of simple vs. complex transactions is not important.
The term “transaction” covers a variety of possible exchanges.
A synchronous bus is one that uses a clock signal to coordinate bus
transactions. Naturally, it has a line (or trace) dedicated to carrying that signal.
An asynchronous bus is one without a clock signal.
It often uses control signals, such as REQ and ACK, to coordinate transactions.
Clock Terminology
Clock frequencies are specified in Hertz, or cycles per second. Common
units include kHz (thousand cycles per second), MHz (million cycles per second),
and GHz (billion cycles per second).
The inverse of clock frequency is clock period or clock cycle time.
Consider a 2 GHz clock. The frequency is $2.0 \times 10^{9}$ per second, written
more precisely as $2.0 \times 10^{9}\ \text{sec}^{-1}$.
The period is $1.0 / (2.0 \times 10^{9}\ \text{sec}^{-1}) = 0.5 \times 10^{-9}$ sec = 0.5 nanosecond
= 500 picoseconds.
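The period calculation above can be checked with a few lines of Python. This is only a sketch; the function name `clock_period_ps` is made up for this illustration.

```python
def clock_period_ps(frequency_hz: float) -> float:
    """Return the clock period in picoseconds for a frequency given in Hz."""
    # Period = 1 / frequency; one second is 10**12 picoseconds.
    return 1e12 / frequency_hz


# The 2 GHz clock from the example.
print(clock_period_ps(2.0e9))
```

Passing 250e6 instead gives the period of a 250 MHz clock in the same units.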
Clock signals are described with standard terminology.
The clock may be said to “tick” on every rising edge.
The Clock: System and Bus
A modern computer is a synchronous sequential machine.
A synchronous machine is one in which the actions are coordinated by a
central clock signal, either directly or indirectly.
A sequential machine is one that has memory and in which the results of
a computation depend on both the input and the contents of memory.
Each computer has one “master clock”, probably not called by that name.
All other clock signals are derived from this master clock.
We can have a frequency divider used to produce a slower clock. For example,
a 2 GHz (2000 MHz) clock signal can be converted by a divide–by–four
circuit into a 500 MHz clock signal.
There are also frequency doublers, used to produce a faster clock.
A 1 GHz clock signal can be doubled to produce a 2 GHz clock signal.
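One way to picture a divide–by–four circuit is as a counter that toggles its output on every second rising edge of the input clock. The sketch below assumes an even divisor and represents the clock as a list of 0/1 samples; the name `divide_clock` and the sampling scheme are assumptions for illustration, not a description of real divider hardware.

```python
def divide_clock(input_levels, n):
    """Model a divide-by-n circuit (n even) acting on a sampled input clock.

    input_levels is a sequence of 0/1 samples of the input clock.
    The output toggles on every (n // 2)-th rising edge of the input,
    so one output cycle spans n input cycles.
    """
    if n < 2 or n % 2 != 0:
        raise ValueError("this sketch handles even divisors only")
    output = []
    out_level = 0
    rising_edges = 0
    previous = 0
    for level in input_levels:
        if previous == 0 and level == 1:   # rising edge of the input
            rising_edges += 1
            if rising_edges == n // 2:
                out_level ^= 1             # toggle the output
                rising_edges = 0
        output.append(out_level)
        previous = level
    return output


fast = [1, 0] * 8              # eight input cycles, two samples per cycle
slow = divide_clock(fast, 4)   # divide-by-four
print(slow)
```

In the printed list, each output cycle covers eight samples, that is, four cycles of the input clock.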
Bus clocks are often in the 150 MHz to 250 MHz range.
Bus Clock Signals
The bus clock frequency is usually a fraction of the system clock frequency.
In a typical example, a 2 GHz system clock can be passed through a divide–by–8
circuit to produce a 250 MHz bus clock signal.
System clock signals are generally represented as square waves, as we have done
in previous slides in this lecture.
Bus clock signals are commonly represented as trapezoidal waves, to emphasize
the fact that signal levels do not change instantaneously.
There is no abstract theory behind this representation; it is just common.
Signal Assertion Levels
A control signal is said to be asserted when it causes some bus action to take place.
A signal may be asserted high or asserted low.
The terms high and low refer to the voltages used to represent logical signals.
In the original TTL protocol, the following are standard.
    5.0 volts    logic 1
    0.0 volts    logic 0
In TTL, a signal is asserted high when it is driven from 0 volts to 5 volts,
in order to achieve its effect.
In TTL, a signal is asserted low when it is driven from 5 volts to 0 volts,
in order to achieve its effect.
A control signal that is active high is simply named. X is active high.
There are several notations used to indicate that a signal is active low.
Here are several common notations.
Example: Memory Control Signals
Consider a memory unit that is controlled by the CPU.
There are two signals to the memory. One activates it and one indicates
the type of transaction commanded by the CPU.
SELECT#
This active–low signal selects the memory chip and activates
its circuitry. When high, the chip is inactive.
R/W#
This is a two–valued signal, indicating which of the two
possible transactions is commanded by the CPU.
The symbol with the #, W#, indicates the action commanded when the signal
is asserted low.
The symbol without the #, R, indicates the action when the signal is high.
If R/W# = 0, the CPU is writing to memory.
If R/W# = 1, the CPU is reading from memory.
Memory Control
The effect of the control signals on memory is summarized in this truth table.
Select#   R/W#   Action
1         0      Memory contents are not changed or accessed. Nothing happens.
1         1      Memory contents are not changed or accessed. Nothing happens.
0         0      CPU writes data to the memory.
0         1      CPU reads data from the memory.
There is a standard shorthand for the above table.
Select#   R/W#   Action
1         d      Memory contents are not changed or accessed.
0         0      CPU writes data to the memory.
0         1      CPU reads data from the memory.
The “d” in the top row indicates that when Select# = 1, the value of the R/W#
signal has no effect on what the memory does. It stands for “don’t care”.
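The truth table can be written as a small function. This is a sketch only; the function name `memory_action` and the returned strings are invented for the illustration.

```python
def memory_action(select_n: int, rw_n: int) -> str:
    """Return the memory action for the active-low Select# and the R/W# signal."""
    if select_n == 1:
        # Select# is high, so the chip is inactive; R/W# is a "don't care".
        return "idle"
    # Select# is low (asserted): R/W# = 0 means write, R/W# = 1 means read.
    return "write" if rw_n == 0 else "read"


for select_n in (1, 0):
    for rw_n in (0, 1):
        print(select_n, rw_n, memory_action(select_n, rw_n))
```

The early return for Select# = 1 plays the role of the "d" entry: the code never looks at R/W# in that case.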
Synchronous Control Signals
Here is a depiction of two control signals, each of which is asserted low.
Signal A# is asserted during the second half of clock pulse T1.
Signal B# is asserted during the first half of clock pulse T1 and is
shown as asserted for the duration of this diagram.
Signal A# is somewhat unusual in that it is asserted at the same time as a
clock transition.
Often signals are asserted before a transition so that the signal is stable
at the time of the transition.
Remember the Garden Hose
At the time scales of human experience, propagation of electric signals
appears to be instantaneous.
In the world of CPU design, when timings are routinely expressed in tens
of picoseconds (1/100 of a nanosecond or $10^{-11}$ second), times to charge
a control line become significant.
Consider a bus with a length of 10 centimeters, about 4 inches.
The speed of light is about 30 centimeters per nanosecond.
The speed of signal transmission of a bus is about 20 centimeters per nanosecond.
The signal will take about 0.5 nanoseconds, or 500 picoseconds, to traverse the
length of the bus.
The signal on the bus is said to be stable after 500 picoseconds.
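The arithmetic behind the 500 picosecond figure is just length divided by speed. A minimal sketch, with a function name chosen for this illustration:

```python
def traversal_time_ps(length_cm: float, speed_cm_per_ns: float) -> float:
    """Time for a signal to travel the length of a bus, in picoseconds."""
    time_ns = length_cm / speed_cm_per_ns
    return time_ns * 1000.0   # 1 nanosecond = 1000 picoseconds


# A 10 cm bus with signals moving at about 20 cm per nanosecond.
print(traversal_time_ps(10.0, 20.0))
```

For comparison, 500 picoseconds is also the period of a 2 GHz clock, so on such a bus the traversal time alone takes a full cycle of that clock.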
Collections of Signals
We now have notations for control signals.
What about other signals, such as addresses and data?
The interest in representing such non–control signals is not their
actual value, but whether or not they are valid.
For example, we do not want to know the actual values on address lines, but
whether or not there is a valid address being asserted.
The above notation might be used for a 32–bit address line.
A valid address is asserted on these lines in the middle of the first half of T1.
It remains valid until the middle of the first half of T2.
A Synchronous Bus Timing Diagram
This is a bus read. The sequence: the address becomes valid, RD# is asserted,
and later the data become valid.
An Asynchronous Bus Timing Diagram
Here, the important feature is the interplay of the Master Synchronization (MSYN#)
and Slave Synchronization (SSYN#) signals.
The sequence:
1. The address becomes valid; MREQ# and RD# are asserted low.
2. MSYN# is asserted low, causing the memory to react.
3. Data become valid and SSYN# is asserted low.
4. When SSYN# goes high, the data are no longer valid.
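The handshake can be traced in software by recording the signal levels after each step. This is a sketch under stated assumptions: a level of 0 means asserted, the memory is a plain dictionary, and the order in which the master releases its signals before SSYN# goes high follows the usual full handshake rather than anything stated in the timing diagram. The names `async_read` and `record` are hypothetical.

```python
def async_read(memory, address):
    """Trace a read on an asynchronous bus using active-low signals.

    A level of 0 means asserted and 1 means negated. The release order
    in the last two steps is an assumption made for this sketch.
    """
    signals = {"MREQ#": 1, "RD#": 1, "MSYN#": 1, "SSYN#": 1}
    trace = []

    def record(event):
        trace.append((event, dict(signals)))

    # Master places the address and asserts MREQ# and RD#.
    signals["MREQ#"] = 0
    signals["RD#"] = 0
    record("address valid")
    # Master asserts MSYN#, which tells the memory to react.
    signals["MSYN#"] = 0
    record("MSYN# asserted")
    # Memory places the data on the bus and asserts SSYN#.
    data = memory[address]
    signals["SSYN#"] = 0
    record("data valid")
    # Master copies the data, then releases its signals.
    latched = data
    signals["MSYN#"] = 1
    signals["MREQ#"] = 1
    signals["RD#"] = 1
    record("master releases")
    # Memory sees MSYN# negated and negates SSYN#; data are no longer valid.
    signals["SSYN#"] = 1
    record("data invalid")
    return latched, trace


value, trace = async_read({0x200: 0x42}, 0x200)
for event, levels in trace:
    print(f"{event:16} {levels}")
```

No clock appears anywhere in the trace: each step is triggered by the other party's signal, which is the defining property of an asynchronous bus.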
More on Commercial Memory Modules
Here again is a picture of a commercial memory module.
Here is a picture of the slot into which three memory modules may be placed.
Inserting the DIMM
The next figure shows how to insert the Dual In–Line Memory Module.
The previous memory module had eight memory chips.
This module has eight memory chips and a control chip in the middle.
We shall discuss this organization of a module into multiple chips in a later lecture.
Connecting the CPU to the Motherboard
Early CPU chips in the Intel line, such as the 8088 and 80286, could be packaged as
DIP (dual in–line package) modules. Here is a picture of the Intel 8086.
Later Pentium CPU chips required more pins, hence different packaging.
Cooling the Central Processor Unit
A modern CPU generates a lot of heat. This has given rise to an industry
devoted to building cooling devices. Here is a picture of one such device.
Note that the CPU is attached to the top of this radiator device.
We do not see the fan.
The Fetch – Execute Cycle
Again we mention the fetch–execute cycle, which is the key feature of a
stored program computer.
The instruction is fetched from memory and then executed.
The cycle can be described differently, with more phases, as useful to support
discussions, but it is really the same.
Discussions of the MIPS–32 architecture divide the process into five phases:
1. Fetch instruction from memory.
2. Decode the instruction and read two registers.
3. Execute the operation or calculate an address.
4. Access an operand in data memory or write back a result.
5. For LW only, write the results of the memory read into a register.
LW (Load Word) loads a 32–bit register from memory.
This key cycle focuses on the IR (Instruction Register), which is a part of
the CU (Control Unit) of the CPU.
Any binary data copied into the IR are automatically interpreted as an instruction.
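The fetch–execute cycle can be illustrated with a toy machine. Everything about this machine is invented for the illustration: the 16-bit instruction format (opcode in the high byte, operand address in the low byte), the four opcodes, and the single accumulator. It is not MIPS–32 and not any real architecture.

```python
def run(memory, max_steps=100):
    """Run a toy accumulator machine; the instruction format is invented.

    Each instruction is one 16-bit word: the high byte is the opcode and
    the low byte is an operand address.
    """
    pc = 0    # program counter
    acc = 0   # accumulator
    for _ in range(max_steps):
        ir = memory[pc]                 # fetch: copy the word at PC into the IR
        pc += 1
        opcode = ir >> 8                # decode: split the IR into fields
        operand = ir & 0xFF
        if opcode == 0x01:              # LOAD:  acc <- memory[operand]
            acc = memory[operand]
        elif opcode == 0x02:            # ADD:   acc <- acc + memory[operand]
            acc = (acc + memory[operand]) & 0xFFFF
        elif opcode == 0x03:            # STORE: memory[operand] <- acc
            memory[operand] = acc
        elif opcode == 0x00:            # HALT
            break
        else:
            raise ValueError(f"unknown opcode {opcode:#04x}")
    return acc


program = [
    0x0104,  # LOAD  from address 4
    0x0205,  # ADD   from address 5
    0x0306,  # STORE to address 6
    0x0000,  # HALT
    0x0123,  # data
    0x0025,  # data
    0x0000,  # result goes here
]
print(hex(run(program)))
```

Instructions and data share one memory. The word 0x0123 at address 4 is treated as data only because the program counter never reaches it; if it were fetched into `ir`, it would be decoded as a LOAD.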
What Is Executed?
Each executable instruction is represented in memory as a binary number.
Your instructor prefers to use hexadecimal representation of these numbers,
as he finds it easier to read. Consider the following sequence.
B8 23 01 05
25 00 8B D8
03 D8 8B CB
2B C8 2B C0
EB EE
This is represented in standard format, with four bytes (eight hexadecimal digits)
per line. At this point, we cannot interpret this sequence.
The sequence might represent four floating–point numbers, followed by a
16–bit integer. It might represent a strange sequence of character data.
If this sequence is read into the IR (Instruction Register), but not all at once, it
will be interpreted as a sequence of assembly language instructions.
The process of reverse engineering a sequence of absolute binary code into a
sequence of assembly language instructions is called disassembly.
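The point that the same 18 bytes can be read in several ways can be tried out with Python's `struct` module. The sketch assumes little-endian byte order and IEEE 754 single-precision floats for the first reading, and Latin-1 for the character reading; those choices are assumptions, since the slides do not specify them.

```python
import struct

# The 18 bytes of the sample sequence, four bytes per group.
raw = bytes.fromhex("B8230105 25008BD8 03D88BCB 2BC82BC0 EBEE")

# Reading 1: four 32-bit floats followed by one signed 16-bit integer.
# "<" selects little-endian order with no padding between fields.
*floats, short_int = struct.unpack("<4fh", raw)

# Reading 2: character data. Latin-1 maps every byte value to some character.
text = raw.decode("latin-1")

print(floats)
print(short_int)
print(repr(text))
```

Nothing in the bytes themselves says which reading is intended; the interpretation comes entirely from the code that consumes them.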
The Disassembly of the Sample Code
Here is the disassembled object code.
B82301    MOV  AX, 0123    ; Move value 0x0123 to AX
052500    ADD  AX, 0025    ; Add value 0x0025 to AX
8BD8      MOV  BX, AX      ; Copy contents of AX into BX
03D8      ADD  BX, AX      ; Add contents of AX to BX
8BCB      MOV  CX, BX      ; Copy contents of BX into CX
2BC8      SUB  CX, AX      ; Subtract AX from CX
2BC0      SUB  AX, AX      ; Subtract AX from AX, clearing it
EBEE      JMP  100         ; Go to address 100
We shall discuss disassembly a bit more in future lectures. Basically, this is the
main function of a good debugger.
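To make the idea of disassembly concrete, here is a sketch of a disassembler that handles only the few 8086 encodings in the sample: B8 (MOV AX, imm16), 05 (ADD AX, imm16), the register-to-register forms of 8B, 03 and 2B, and EB (short JMP). The load address of 0x100 is an assumption chosen to agree with the jump target; the function and table names are hypothetical. A real disassembler covers far more cases.

```python
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]   # register numbers 0..7
REG_REG_OPS = {0x8B: "MOV", 0x03: "ADD", 0x2B: "SUB"}      # opcode followed by ModRM


def disassemble(code: bytes, origin: int = 0x100):
    """Disassemble the handful of 8086 encodings used in the sample."""
    lines = []
    i = 0
    while i < len(code):
        op = code[i]
        if op in (0xB8, 0x05):
            # 16-bit immediate operand, stored low byte first.
            imm = code[i + 1] | (code[i + 2] << 8)
            name = "MOV" if op == 0xB8 else "ADD"
            text, size = f"{name} AX, {imm:04X}", 3
        elif op in REG_REG_OPS:
            modrm = code[i + 1]
            if modrm >> 6 != 0b11:
                raise ValueError("only register operands are handled")
            dest = REG16[(modrm >> 3) & 7]   # bits 5..3: destination register
            src = REG16[modrm & 7]           # bits 2..0: source register
            text, size = f"{REG_REG_OPS[op]} {dest}, {src}", 2
        elif op == 0xEB:
            offset = code[i + 1]
            rel = offset - 256 if offset >= 128 else offset   # signed 8-bit value
            target = origin + i + 2 + rel    # relative to the next instruction
            text, size = f"JMP {target:04X}", 2
        else:
            raise ValueError(f"unhandled opcode {op:02X}")
        lines.append((code[i:i + size].hex().upper(), text))
        i += size
    return lines


code = bytes.fromhex("B82301 052500 8BD8 03D8 8BCB 2BC8 2BC0 EBEE")
listing = disassemble(code)
for raw, text in listing:
    print(f"{raw:<8}{text}")
```

For the bytes 8B CB, the ModRM byte CB is binary 11 001 011, which names CX as the destination and BX as the source. The jump offset EE is the signed value -18, which lands back at the first byte of the sequence.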
Terminology:
The name “assembly language” refers to the sequence of human readable code
as seen in the middle columns above. Assembly language instructions, called
“mnemonics”, are supposed to be suggestive to the programmer.
The term “object code” refers to the sequence of binary numbers, as shown in the
left column and on the previous slide. Such code is often represented in hexadecimal
form for easier reading.
The Assembly–Link–Execute Cycle
Here is the common sequence for processing assembly language programs.
The process starts with a file in the form of readable text.
The assembler produces listing files (to be read by humans) as well as the object file,
containing object code, which is almost ready for execution.
The linker connects the object code of the user program with object code from any
external procedures used by that code. It produces a file ready for execution.
The loader places the executable file into memory and starts execution.
The process for higher level languages is similar, except for the fact that the source
code is said to be compiled. Compilers for IBM mainframe languages tend to emit
assembly language that is then assembled. Other compilers emit object code.
The Microsoft Common Language Infrastructure
Microsoft introduced its .NET architecture on February 13, 2002.
One of the goals of this architecture was to make it easier to call programs written
in one of Microsoft’s languages from those written in another.
One common option was to call C++ code from Visual Basic code.
The VB code would handle the user interface.
The C++ code would handle the computations.
In order to make the languages interoperable, certain standards were required.
One change gave rise to the CIL (Common Intermediate Language), originally
called MSIL (Microsoft Intermediate Language).
The process of producing an executable program in this architecture is a bit indirect.
1. The compiler converts the source code into a CIL file.
2. The CIL file is converted to a form called bytecode that is called a .NET assembly.
3. The .NET assembly is assembled into an object file and linked to form
an executable file. This is then executed.
There are many more features to the CLI (Common Language Infrastructure)
than are mentioned in this slide.
The Memory Component
Here are a few introductory remarks on memory. It will be studied in greater
detail in two future sets of lectures.
Modern computers tend to have memory that is byte addressable. A byte corresponds
to the size required to store the older character codes. It is a grouping of 8 binary bits.
The CPU has two registers dedicated to handling memory.
The MAR (Memory Address Register) holds the address being accessed.
The MBR (Memory Buffer Register) holds the data being written to the memory or
being read from the memory. This is sometimes called the Memory Data Register.
Modern memory must allow for addressable entities that have larger sizes. Common
lengths are 16, 32, and 64 bits, corresponding to 2, 4, and 8 bytes.
How do we address these larger items? Here, we assume standard memory alignments
for multiple byte items, as these yield more efficient programs.
The standard:
every byte has its own address.
16–bit (2 byte) words have addresses that are multiples of 2.
32–bit (4 byte) words have addresses that are multiples of 4.
Consider a 64 KB Memory
This is considered only because it allows for very easy illustrations.
64 K = $2^{16}$ = 65,536.
Byte addresses range from 0 through 65,535.
16–bit word addresses: 0, 2, 4, …, 65,534.
32–bit word addresses: 0, 4, 8, …, 65,532.
The 16–bit word at address 65,534 contains the bytes at
addresses 65,534 and 65,535.
The 32–bit word at address 65,532 contains bytes at
addresses 65,532, 65,533, 65,534, and 65,535.
In general, a 16–bit word at address N comprises two bytes,
at address N and address N + 1.
In general, a 32–bit word at address N comprises four bytes,
at addresses N, N + 1, N + 2, and N + 3.
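The alignment rule and the "N through N + size - 1" rule can be combined in one helper. A sketch, with the function name and the default 64 KB memory size chosen for the illustration:

```python
def word_bytes(address: int, size: int, memory_size: int = 64 * 1024):
    """Return the byte addresses occupied by an aligned word of `size` bytes."""
    if address % size != 0:
        raise ValueError(f"address {address} is not a multiple of {size}")
    if address + size > memory_size:
        raise ValueError("word extends past the end of memory")
    return list(range(address, address + size))


print(word_bytes(65534, 2))   # the last 16-bit word
print(word_bytes(65532, 4))   # the last 32-bit word
```

Calling `word_bytes(65534, 4)` raises an error, because 65,534 is not a multiple of 4.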
Which bytes go where? This has given rise to the discussion called
“Big–Endian vs. Little–Endian”.
Example: A 32–bit Value
Suppose that the 32–bit register EAX contains the value 0x01020304 which has
decimal representation 16,909,060. The value in EAX can be represented as:
This figure uses the bit numbering scheme common for Intel computers.
Suppose the instruction MOV Z, EAX is executed. What is placed into address Z?
This depends on whether the computer is a big–endian or little–endian device.
(The Pentium is a little–endian device, but we shall examine both options.)
The little–endian design stores the “little end” at address Z.
The big–endian design stores the “big end” at address Z.
The 32–Bit Example Continued
The value that goes into each address is a one–byte number, comprising
two hexadecimal digits. The hexadecimal digits are not reversed.
Address   Big-Endian   Little-Endian
Z         01           04
Z+1       02           03
Z+2       03           02
Z+3       04           01
Here is a graphic representation of the same process.
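Python's built-in `int.to_bytes` produces both byte orders directly, which makes the table easy to reproduce. This sketch simply prints the byte stored at each offset from Z.

```python
value = 0x01020304

big = value.to_bytes(4, byteorder="big")        # "big end" stored first
little = value.to_bytes(4, byteorder="little")  # "little end" stored first

for offset in range(4):
    print(f"Z+{offset}: big-endian {big[offset]:02X}   little-endian {little[offset]:02X}")
```

Each byte keeps its two hexadecimal digits in the same order; only the order of the bytes changes.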
Example: “Core Dump” at Address 0x200
Consider this memory map.
Address    0x200   0x201   0x202   0x203
Contents   02      04      06      08
What is the 32–bit value stored at address 0x200?
Big–endian:
The number is 0x02040608. Its decimal value is
$2 \cdot 256^{3} + 4 \cdot 256^{2} + 6 \cdot 256^{1} + 8 \cdot 1$ = 33,818,120.
Little–endian:
The number is 0x08060402. Its decimal value is
$8 \cdot 256^{3} + 6 \cdot 256^{2} + 4 \cdot 256^{1} + 2 \cdot 1$ = 134,611,970.
NOTE: Read the bytes backwards, not the hexadecimal digits.
What is the 16–bit value stored at address 0x200?
Big–endian:
The number is 0x0204. Its decimal value is
$2 \cdot 256 + 4 \cdot 1$ = 516.
Little–endian:
The number is 0x0402. Its decimal value is
$4 \cdot 256 + 2 \cdot 1$ = 1,026.
The 16–bit value at 0x200 comprises bytes at addresses 0x200 and 0x201 only.
The values at 0x202 and 0x203 are not relevant.
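The same readings can be computed with `int.from_bytes`. In this sketch the memory map is a dictionary from address to byte value, and `read_value` is a hypothetical helper.

```python
memory = {0x200: 0x02, 0x201: 0x04, 0x202: 0x06, 0x203: 0x08}


def read_value(memory, address, size, byteorder):
    """Read `size` consecutive bytes starting at `address` as one unsigned integer."""
    raw = bytes(memory[address + i] for i in range(size))
    return int.from_bytes(raw, byteorder)


for size in (4, 2):
    for order in ("big", "little"):
        value = read_value(memory, 0x200, size, order)
        print(f"{size * 8}-bit {order}-endian: {value:#x} = {value}")
```

The 16-bit reads touch only addresses 0x200 and 0x201, as noted above.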
I/O Ports
Computer memory is best considered as an array of addressable bytes.
I/O ports are addressed similarly, but are best considered as an array of
addressable registers, each of fixed width.
Each I/O device is connected to the computer bus through a number of registers
called I/O ports. There are three main types used.
Data register
For an input device, this contains data to be sent to the CPU.
For an output device, this contains data to be output.
Control register
This allows the CPU to control operation of the I/O device.
This register is best considered as a set of Boolean flags, each
controlling one option: can the device interrupt the CPU, etc.
Status register
This allows the CPU to detect the status of the device.
Is the power on? Is there a paper jam?
The control register is an output register; the CPU writes information to it.
The status register is an input register; the CPU reads status information from it.
Some simple devices combine the two into a Control & Status register.
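The idea of a control or status register as a set of Boolean flags maps naturally onto Python's `enum.IntFlag`. The device, the flag names and the bit positions below are all hypothetical; a real device's documentation defines its own layout.

```python
from enum import IntFlag


class Control(IntFlag):      # hypothetical bit assignments, for illustration only
    INTERRUPT_ENABLE = 0x01
    START = 0x02


class Status(IntFlag):       # hypothetical bit assignments, for illustration only
    POWER_ON = 0x01
    PAPER_JAM = 0x02


class Printer:
    """A toy output device with data, control and status registers."""

    def __init__(self):
        self.data = 0                     # data register: byte to be output
        self._control = Control(0)        # control register: written by the CPU
        self._status = Status.POWER_ON    # status register: read by the CPU

    def write_control(self, flags):
        self._control = Control(flags)

    def read_status(self):
        return self._status


printer = Printer()
printer.write_control(Control.INTERRUPT_ENABLE | Control.START)
status = printer.read_status()
print(Status.POWER_ON in status, Status.PAPER_JAM in status)
```

The CPU side only writes the control register and only reads the status register, matching the output and input roles described above.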