Chapter 5: Computer Systems Organization
Having looked at the building of circuits, it is now time to look at the computer as a collection of these units that allows input data to be processed into results (output data), and that can store data as well.
Note that all of the functional units of a computer are built from gates and circuits. As such, everything processed and/or stored in the computer is converted to its binary equivalent, because gates and circuits deal only with binary values. Since it takes millions of gates/circuits to build a computer, to better understand how a computer works we will group these gates/circuits into subsystems (each subsystem consisting of millions of such gates/circuits) and study the subsystems instead.
The Von Neumann Architecture:
Today's market offers computers that vary widely in size, memory capacity, speed and cost – supercomputers, mainframes, minicomputers, workstations, laptops and tiny handheld digital assistants. Regardless of all of these differences, almost every computer in use today has the same basic design (the Turing Machine is one exception). This design is known as the Von Neumann architecture, after John von Neumann, who first proposed it in 1946. The architecture is based on the following three characteristics:
1. A computer is constructed from four major subsystems: memory, input/output, the arithmetic/logic unit (ALU) and the control unit.
2. The stored program concept, in which the instructions to be executed by the computer are represented as binary values and stored in memory.
3. The sequential execution of instructions. One instruction at a time is fetched from memory into the control unit, where it is decoded and executed.
Memory and Cache:
All information stored in memory is in binary format. Memory is usually referred to as RAM (random access memory). RAM is volatile and has the following characteristics:
1. Memory is divided into fixed-size units called cells, each of which is associated with a unique address (these addresses are the unsigned integers 0, 1, 2, …).
2. All accesses to memory are to a specified address, and a complete cell is always fetched or stored (the cell is the minimum unit of access).
3. The time it takes to fetch or store a cell is the same for every cell.
ROM (read-only memory) is essentially RAM with the ability to store (write) information disabled.
The cell size, or memory width, is fixed, i.e. a fixed number of bits make up a cell – typically 8 bits, or 1 byte. Thus the largest unsigned integer that can be stored in one cell is 255 (11111111 in 8 bits). Hence, computers with a cell width of 8 bits use multiple cells to store larger numbers. For example, 2 or 4 bytes are used to store one whole number, 4 or 8 bytes to store one real number, and 1 byte to store one character. However, fetching a whole number from memory then takes two (or four) trips – one cell at a time.
The size of memory depends on the number of bits available to represent the addresses of the cells. The maximum memory size is 2^n cells, where n is the number of binary digits available to represent an address. Most computers today have at least 32 bits available for memory addresses (about 4 billion cells), with 1 GB or 2 GB of installed memory being very common. Note carefully the difference between an address and the contents of that address:
Address      Contents
64           11111111

Of course, the address value 64 would itself be represented in binary – as a 16-bit value, a 24-bit value, etc. – depending on the machine's addressing scheme.
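To make the distinction concrete, here is a minimal Python sketch (the values and variable names are illustrative, not from the text) of a memory whose size is fixed by the number of address bits:

    # A tiny model of RAM: addresses are list indices, contents are cell values.
    n = 16                      # assume a 16-bit addressing scheme
    max_cells = 2 ** n          # 2^n addressable cells = 65,536
    memory = [0] * max_cells

    memory[64] = 0b11111111     # store the value 11111111 (255) at address 64
    print(64)                   # the address: 64
    print(memory[64])           # the contents of that address: 255
    print(max_cells)            # 65536 cells reachable with 16 address bits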
As noted earlier, the two basic operations on memory are fetching and storing.
Fetching means bringing a copy of the contents of the memory cell with a particular address; the original contents of that cell remain unchanged.
Storing means putting a specified value into the cell with a given address; the previous contents of that cell are lost.
The memory access time is the same for all 2^n addresses and is currently approximately 5 to 20 nanoseconds (1 nanosecond is 1 billionth of a second). This is the time required to carry out one memory operation – either a fetch or a store. Note:
1 millisecond = 1/1,000 second = 10^-3 second (millisec or ms)
1 microsecond = 1/1,000,000 second = 10^-6 second (µsec or µs)
1 nanosecond = 1/1,000,000,000 second = 10^-9 second (nsec or ns)
1 picosecond = 1/1,000,000,000,000 second = 10^-12 second (psec or ps)
1 femtosecond = 1/1,000,000,000,000,000 second = 10^-15 second (fsec or fs)
1 attosecond = 10^-18 second (asec or as)
1 zeptosecond = 10^-21 second (zsec or zs)
1 yoctosecond = 10^-24 second (ysec or ys)
MAR and MDR (memory address register and memory data register):
These are the memory registers used to hold an address and data respectively. As indicated, a memory operation requires two pieces of information – the address of the cell and the contents of the cell. The MAR must be at least n bits wide, so that it can hold the address of any of the 2^n cells. The size of the MDR is usually a multiple of 8 bits, 8 bits being the typical size of a cell; for example, since a whole number requires 2 (or 4) bytes of storage, the MDR must be at least 2 bytes wide in that case. Typical MDR sizes are 16, 32 or 64 bits.
The operations are as follows:
Fetch (address)
1. Load the address into the MAR
2. Decode the address in the MAR
3. Copy the contents of that memory location into the MDR.
Store (address, value)
1. Load the address into the MAR
2. Load the value into the MDR
3. Decode the address in the MAR
4. Store the contents of the MDR into that memory location.
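As a rough illustration (a software simulation with made-up sizes, not actual hardware), the fetch and store sequences can be modelled like this in Python:

    # Simulated memory with a Memory Address Register and Memory Data Register.
    memory = [0] * 256   # 256 one-byte cells (an 8-bit addressing scheme)
    MAR = 0              # holds the address for the current operation
    MDR = 0              # holds the data travelling to or from memory

    def fetch(address):
        global MAR, MDR
        MAR = address            # 1. load the address into the MAR
        # 2. the decoder selects the cell addressed by MAR (implicit here)
        MDR = memory[MAR]        # 3. copy that cell's contents into the MDR
        return MDR               # the original cell is left unchanged

    def store(address, value):
        global MAR, MDR
        MAR = address            # 1. load the address into the MAR
        MDR = value              # 2. load the value into the MDR
        memory[MAR] = MDR        # 3-4. decode the address and overwrite the cell

    store(64, 255)
    print(fetch(64))             # prints 255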
Decoding the address in the MAR:
A decoder circuit is used to decode the number in the MAR so that the corresponding cell is selected. A decoder circuit has n inputs and 2^n outputs. For example, if n = 3 there are 3 inputs and 8 outputs, as the following diagram shows.
[Diagram: a 3-to-8 decoder. The three input lines A, B and C carry the bits from the MAR; exactly one of the eight output lines, labelled 000 (line 0) through 111 (line 7), is turned ON.]
This is a 3-to-8 decoder circuit (3 inputs, 8 outputs). For example, if the MAR contains the 3-bit address 101 (decimal 5), then only output line 5 (the sixth line) will be ON; all of the other output lines will be OFF. Thus, such a decoder can be used to decode whatever is in the MAR and hence select the cell in question. Of course, with a 16-bit addressing scheme, a decoder of this kind would need 16 inputs and 2^16 = 65,536 outputs – an impractical circuit. Instead, memory is divided into a two-dimensional structure of rows and columns, and the desired memory cell is the one at the point where the selected row and column intersect. The book gives a good example on page 197 and a complete diagram on page 199.
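In software terms, a decoder simply turns on the one output line whose number matches the input bits. A small Python sketch (illustrative only):

    def decode(bits):
        """Simulate an n-to-2^n decoder: one output line ON, the rest OFF."""
        n = len(bits)                      # number of input lines
        selected = int(bits, 2)            # the address encoded by the inputs
        return [1 if i == selected else 0 for i in range(2 ** n)]

    print(decode("101"))   # [0, 0, 0, 0, 0, 1, 0, 0] -- only line 5 is ON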
The fetch/store controller:
We need to know whether an operation is a fetch or a store, and the controller determines this for us. The controller is like a traffic officer – it directs the movement of data between the MDR and memory. It receives an F signal (from the processor) when the operation is a fetch and an S signal when it is a store. As a result, there are no 'traffic jams' and memory operates smoothly.
Cache Memory:
Even though computers run at very high speeds, bottlenecks still occur: the processor may have to sit and wait for information to be brought from memory. To solve this problem, the size of RAM could be increased, or another form of memory could be used – and so cache memory was born. Cache memory is much smaller than RAM (often 512 KB to a few megabytes) but is about 10 times faster.
The principle on which the cache is built is the Principle of Locality: the idea that if something is used now, there is a very good chance it will be used again in the very near future. On this basis, information (instructions and data) that is currently in use is treated as information likely to be needed soon, and so is stored in cache while the original copy remains in RAM.
Note that not all data is kept in cache; if the cache is full, the least used data (often the oldest data) is removed to make room for the new data. The process used to get information is as follows:
1. Look first in cache memory to see whether the information needed is there. If it is, the access time will be much faster.
2. If it is not in cache, get the information from RAM at the slower speed, as described above.
3. Copy the data just fetched into the cache.
This scheme can reduce access time considerably. For example, if RAM access time is 20 nsec and cache access time is 5 nsec (quite realistic figures today), and the cache has a hit rate of 75% (again a realistic figure), then the overall access time would be

average access time = (0.75 × 5) + (0.25 × (5 + 20)) = 10 nsec

which is much less than the 20 nsec required if all information were in RAM.
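The same calculation, written out in Python with the figures from the example:

    # Average access time with a cache, using the figures above.
    cache_time = 5       # nsec
    ram_time = 20        # nsec
    hit_rate = 0.75

    # A hit costs one cache access; a miss costs the failed cache access
    # plus the RAM access.
    average = hit_rate * cache_time + (1 - hit_rate) * (cache_time + ram_time)
    print(average)       # 10.0 nsec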
Input/output and Mass Storage:
Mass storage devices such as tapes and disks enable us to store and retrieve data. RAM loses all of its data when the computer's power is turned off – not so mass storage devices. These devices differ from manufacturer to manufacturer, but two common elements prevail: the i/o access method and the i/o controllers. I/o devices come in basically two forms – those that represent information in human-readable form and those that represent information in machine-readable form. The former consists of devices such as the keyboard, screen and printers. The latter are usually the mass storage devices such as disks and tapes, which are grouped into direct access storage devices (DASDs) and sequential access storage devices (SASDs).
A direct access storage device is one in which equal access time has been given up: there is still a unique address for each unit of data, but the time needed to access a particular unit depends upon its physical location and the current state of the device.
A disk stores information in units called sectors, each of which contains an address and a data block holding a fixed number of characters. A fixed number of these sectors are placed in concentric circles called tracks, and the surface of a disk contains many tracks. A read/write head can be moved in and out to position itself over any track on the surface.
The access time of any sector on a track is made up of three components:
1. Seek time: the time needed to position the head over the correct track.
2. Latency: the time for the beginning of the desired sector to rotate under the head.
3. Transfer time: the time for the entire sector to pass under the head and have its contents read, or data written to it.
If we assume the following we can calculate the seek time, latency and transfer time:
Rotation Speed = 7200 rev/min = 1 rev/8.33 msec. (millisecond)
Arm movement time = .02 msec to move to an adjacent track.
Number of tracks per surface = 1000 (0..999)
Number of sectors per track = 64
Number of characters per sector = 1024.
1. Seek time:
Best case = 0 msec (no arm movement necessary)
Worst case = 999 x .02 = 19.98 msec (must move from track 0 to 999).
Average case = 400 x .02 = 8.00 msec (assuming that on average the head must move about 400 tracks).
2. Latency:
Best case = 0 msec (sector is just about to come under head)
Worst case = 8.33 msec (just missed the sector so have to do one rev).
Average case = 4.17 msec (one-half rev)
3. Transfer time:
1/64 x 8.33 msec = .13 msec (the time for one sector to pass under the head)
                Seek time   Latency   Transfer    Total
Best case          0           0        .13        .13
Worst case       19.98        8.33      .13       28.44
Average case      8.00        4.17      .13       12.30
(all times in msec)
As can be seen, the average access time is about 12 milliseconds, and in some cases even less.
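These numbers can be reproduced with a short Python calculation using the parameters assumed above:

    # Disk access time from the parameters assumed above.
    rev_time = 60_000 / 7200        # 8.33 msec per revolution at 7200 rev/min
    track_move = 0.02               # msec to move the arm one track
    tracks = 1000
    sectors_per_track = 64

    transfer = rev_time / sectors_per_track          # ~0.13 msec per sector

    best = 0 + 0 + transfer                          # head already in place
    worst = (tracks - 1) * track_move + rev_time + transfer
    average = 400 * track_move + rev_time / 2 + transfer

    print(f"best {best:.2f}, worst {worst:.2f}, average {average:.2f} msec")
    # best 0.13, worst 28.44, average 12.30 msec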
The other type of mass storage uses the sequential access method. In this case the data has no addresses, so the head may have to search the entire medium sequentially (in the worst case) to find the required data. Most disks today are direct access devices, which makes them faster than sequential access devices. Sequential access is good when we want to copy an entire disk of data to another (backup) medium.
Overall, i/o devices are very slow compared to RAM: with a RAM access time of about 20 nsec and a disk access time of about 12 msec, the difference is a factor of about 600,000 (12 msec ÷ 20 nsec), i.e. RAM is roughly 600,000 times faster than disk.
If this were the entire picture, the processor would often have to sit and wait for long periods until an access completed – wasting time. The i/o controller is used to remedy this. This component handles the details of i/o and helps compensate for the speed difference. It has a small amount of memory called an i/o buffer, plus enough control and logic circuitry to handle device-level details such as moving the read/write head, feeding paper and refreshing a screen display. It can also transmit to the processor a special hardware signal called an interrupt signal.
For example, suppose a line of data is to be read from memory and displayed on the screen. The data is read and placed in the i/o buffer at the very high speed associated with RAM access. The processor then instructs the i/o controller to output the data to the screen, which is much slower – but the processor does not sit and wait for this to complete. It is freed to do something else while the i/o controller outputs the data to the screen. When the i/o controller is finished, it sends an interrupt signal to the processor, telling it that the task is complete.
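One way to picture this overlap is with the thread-based Python sketch below. It is only a simulation of the idea (the function and variable names are invented for illustration): the 'processor' keeps computing while the 'i/o controller' slowly drains its buffer, then raises an interrupt.

    import threading
    import time

    def io_controller(buffer, interrupt):
        # The controller outputs the buffered data at slow device speed...
        for char in buffer:
            time.sleep(0.01)          # pretend each character is slow to draw
        interrupt.set()               # ...then raises an interrupt: "done"

    interrupt = threading.Event()
    buffer = list("Hello, world")     # filled from RAM at high speed
    threading.Thread(target=io_controller, args=(buffer, interrupt)).start()

    # Meanwhile the processor is free to do other work...
    useful_work = sum(range(1_000_000))

    interrupt.wait()                  # handle the interrupt when it arrives
    print("I/O complete; processor did other work in the meantime.")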
Arithmetic/Logic Unit:
This part of the computer performs arithmetic and logic operations. In all modern computers the ALU and the control unit are fully integrated into a single component called the processor.
The ALU is made up of three parts:
1. The registers.
2. The interconnections between components
3. The ALU circuitry.
Registers are storage cells that hold the operands and/or results of arithmetic operations. They have the following characteristics:
1. They do not have addresses, but are accessed by a special register designator such as A, X or R0.
2. They can be accessed much faster than RAM, partly because there are so few of them (a typical computer may have 16 to 32 registers) compared with the millions of addresses in RAM.
3. They are not used for general-purpose storage, but for specific purposes such as holding the operands of an upcoming arithmetic operation.
Why do we need registers?
The main reason is that the data they contain can be accessed faster than data in RAM or even in cache; less time is spent locating the data. An example of the use of registers is as follows. Suppose we want to evaluate the expression

(a + b) × (c – d)

By storing the result of (a + b) in a register rather than in memory, that partial result can be accessed much faster when it is needed. The result would be kept in a result register.
The operation proceeds as follows: the operands are copied from memory into registers – the left operand's register is connected to the left BUS and the right operand's register to the right BUS. After the operation is carried out, the result is placed in a register connected to the result BUS.
The operation to be carried out is determined by the operator, for example +, =, <, >, etc. The circuits that carry out these operations are built into the ALU, using something like a multiplexor. A multiplexor accepts many inputs (2^n data lines plus n selector lines) and produces one output, as shown in Figures 5.11, 5.12 and 5.13, pp. 209-211.
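In software terms the idea looks like this: the ALU produces every possible result, and a multiplexor-like selection keeps only the one named by the selector code. A minimal Python sketch (the selector codes are made up for illustration):

    # ALU sketch: compute all results, let a "multiplexor" pick one by selector.
    def alu(left, right, selector):
        results = {
            0b00: left + right,          # ADD
            0b01: left - right,          # SUB
            0b10: int(left == right),    # EQ compare
            0b11: int(left > right),     # GT compare
        }
        return results[selector]         # the multiplexor: many inputs, 1 output

    print(alu(7, 3, 0b00))   # 10
    print(alu(7, 3, 0b11))   # 1  (7 > 3 is true)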
There are several different registers available in the ALU, as the handout shows. Note that registers have different names and sizes according to the architecture of the computer, and that the instruction syntax may differ as well. In the IBM architecture, for example, the registers range from R0 to R15 and are used for data and/or instructions. Some of the IBM architecture instructions are:
ADD … addition.
SUB … subtraction.
MUL … multiplication.
DIV … division.
MOD … modulo.
MOV … move data into a register or memory.
CMP … compare the 1st value to the 2nd value.
JMP … unconditional jump.
JE … jump if equal.
JNE … jump if not equal.
JG … jump if greater than.
JGE … jump if greater than or equal.
JL … jump if less than.
JLE … jump if less than or equal.
INC … increment register by 1.
DEC … decrement register by 1.
LOAD … move data from memory to a register.
STORE … store data from a register into memory.
The Control Unit:
The Von Neumann architecture's most important characteristic is the stored program concept – a sequence of machine language instructions stored as binary values in memory. It is the task of the control unit to:
1. Fetch from memory the next instruction to be executed.
2. Decode it – that is, determine what is to be done.
3. Execute it by issuing the appropriate commands to the ALU, memory and i/o controllers.
Machine Language Instructions:
These instructions are expressed in binary code in the following format:

Operation Code | Address Field 1 | Address Field 2 | …

The operation code is a unique unsigned integer code assigned to each machine language operation recognized by the hardware. If this code contains n bits, then 2^n distinct operations are available.
An example: if the operation code is decimal 5, the first address field is 45 and the second address field is 70, then with a 16-bit address size and an 8-bit op code size the fields are:

00000101          (op code, 8 bits)
0000000000101101  (address 1, 16 bits)
0000000001000110  (address 2, 16 bits)

Putting them together, the complete instruction is:

0000010100000000001011010000000001000110
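The packing of the three fields can be checked with a few lines of Python (field sizes as in the example: an 8-bit op code and two 16-bit addresses):

    # Encode: 8-bit op code = 5, address 1 = 45, address 2 = 70.
    op, addr1, addr2 = 5, 45, 70
    instruction = f"{op:08b}{addr1:016b}{addr2:016b}"
    print(instruction)
    # 0000010100000000001011010000000001000110  (40 bits in all)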
The address fields hold the memory addresses of the values on which the operation will work. If the computer has a maximum capacity of 128 MB of RAM (2^27 cells), i.e. at most a 27-bit addressing scheme, then each address field must be 27 bits wide (to accommodate the largest possible address). The number of address fields varies from 0 to 3 depending upon the operation; for example, the addition of two operands may require two address fields, or three (the last one for storing the result).
The set of all instructions that can be executed by a computer is called its instruction set. The instruction set differs from manufacturer to manufacturer, notably between IBM-compatible computers and the Macintosh; that is one of the main reasons why programs written for IBM compatibles will not run on a Mac. One current trend in computer manufacturing is to make the instruction set as small as possible – RISC (reduced instruction set computers, 30-50 instructions) versus CISC (complex instruction set computers, 200-400 instructions). With RISC, more instructions may be needed to complete a task, but this is compensated for by the processor getting faster and faster; in fact, a RISC machine today can run a program 1.5 to 2 times faster than a CISC machine runs the same program.
Machine instructions can be grouped into 4 major categories:
1. Data Transfer.
2. Arithmetic.
3. Compare.
4. Branch.
Data Transfer:
These operations move information between or within the different components of the computer. The possible transfers are:

memory cell → memory cell
ALU register → memory cell
memory cell → ALU register
ALU register → ALU register
Examples of data transfer instructions:
LOAD x → load register R with the contents of memory location x.
STORE x → store the value found in register R into memory location x.
MOVE x, y → copy the contents of memory cell x into memory cell y.
Arithmetic:
These instructions perform the arithmetic operations +, -, ×, / and the logical operations AND, OR and NOT. They may operate on registers and/or memory values.
Examples of arithmetic instructions:
ADD x, y, z → add the value in x to the value in y and store the result in z (a 3-address instruction).
ADD x, y → add the value in x to the value in y and leave the result in y (a 2-address instruction).
ADD x → add the value in x to the value in register R and leave the result in R (a 1-address instruction).
Compare:
These instructions use the six relational operators: <, <=, =, >, >= and <>. When two values are compared, the result is either TRUE or FALSE, translated into 1 or 0, and a condition code inside the processor is set to reflect the result.
An example of a compare instruction:
COMPARE x, y → compare the value in x with the value in y and set the condition codes:
Con(x) > Con(y) → GT = 1
Con(x) < Con(y) → LT = 1
Con(x) = Con(y) → EQ = 1
Branch:
A branch is generally a jump operation – an instruction meaning jump to an indicated point in the program. The Von Neumann architecture is built to execute instructions sequentially by default; when certain instructions must be skipped, a jump instruction is used.
Examples of branch instructions:
JUMP x → an unconditional jump: always continue with the instruction in memory cell x.
JUMPGT x → if GT = 1, continue with the instruction in memory cell x; otherwise keep going sequentially with the next instruction.
HALT → stop program execution.
The Control Unit Registers and Circuits:
The task of the control unit is to fetch and execute instructions. To do this it uses the Program Counter (PC), the Instruction Register (IR) and an Instruction Decoder circuit.
The PC holds the address of the next instruction to be executed.
The control unit sends the contents of the PC to the MAR; the fetch operation is then carried out and the PC is incremented by 1.
The IR holds the instruction just fetched from memory.
The operation code is decoded (by the instruction decoder circuit) to determine what the instruction is.
The full operation is then carried out, depending on what is required.
This process is repeated until the program has fully executed, as the sketch below illustrates.
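The cycle can be captured in a few lines of Python. This is a sketch of a toy machine, not any real architecture: instructions are stored as (mnemonic, address) pairs, R is the single ALU register from the earlier examples, and PC and IR behave as described above.

    # A toy Von Neumann machine: fetch, decode, execute until HALT.
    memory = {0: ("LOAD", 100), 1: ("ADD", 101), 2: ("STORE", 102),
              3: ("HALT", None), 100: 2, 101: 3, 102: 0}
    PC, R = 0, 0                       # program counter and ALU register

    while True:
        IR = memory[PC]                # fetch: IR gets the instruction at PC
        PC += 1                        # increment the PC
        op, addr = IR                  # decode the operation code
        if op == "LOAD":               # execute:
            R = memory[addr]           #   register <- memory cell
        elif op == "ADD":
            R = R + memory[addr]       #   register <- register + memory cell
        elif op == "STORE":
            memory[addr] = R           #   memory cell <- register
        elif op == "HALT":
            break                      #   stop program execution

    print(memory[102])                 # 5: the program computed 2 + 3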
Parallel Computing:
The Von Neumann architecture carries out instructions sequentially. Owing to the massive amounts of data and the size of programs, this can lead to the Von Neumann bottleneck (we are unable to process data fast enough). To avoid such bottlenecks, other architectures are considered; one of the most popular in today's world is the parallel architecture, and hence parallel computing. The idea behind parallel computing is to do more than one operation at the same time – and to do so, we need the hardware to support it.
One solution is to incorporate multiple processors in the same machine, i.e. build the computer with dual or quad (or more) processors; when a program executes, it is divided into sub-parts, with each part handled by one of the processors simultaneously. So if the program as a whole took 1 minute to execute on a single processor, on a quad-processor machine it could conceivably take 1/4 of a minute (15 seconds).
Another solution is to have a network of single-processor (or even multiple-processor) machines all connected together, broadcasting the instruction to all of the machines while distributing the data among them (SIMD – single instruction stream/multiple data stream). In this way, all of the machines carry out the same single instruction, each on its own data set.
A very good application for this type of architecture is vector manipulation – adding two vectors, scaling a vector, etc. For example, if we had a vector of 4 numbers and wanted to scale it by a factor of 3, we would give each of the 4 ALUs one data item, and all 4 ALUs would carry out the instruction to multiply their data by 3. The operations happen simultaneously, speeding up the work. This type of parallel computing is known as distributed processing; a small sketch follows.
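The vector-scaling example maps directly onto data parallelism. The Python sketch below uses a process pool to apply the same instruction ('multiply by 3') to every element at once, in the spirit of SIMD (a simulation of the idea; real SIMD hardware works at the instruction level):

    from multiprocessing import Pool

    def scale_by_3(x):
        return x * 3            # the single instruction every "ALU" performs

    if __name__ == "__main__":
        vector = [10, 20, 30, 40]            # one data item per processor
        with Pool(processes=4) as pool:
            result = pool.map(scale_by_3, vector)
        print(result)                        # [30, 60, 90, 120]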
Another method is to distribute both multiple instructions and multiple data streams to the computers (MIMD – multiple instruction stream/multiple data stream). Here the program is divided into separate processes, and each individual process is allocated to an individual machine. Each of these processes may be made up of many instructions and may require multiple data items. This type of distribution is also known as cluster computing. Note: one variation of the MIMD architecture is grid computing, in which the processors do not have to be located in the same building but can be anywhere that network communication is possible. Computers from all parts of the country can thus be connected to solve a problem – for example, SETI (the Search for Extraterrestrial Intelligence) at Berkeley, where individuals lend their computers' idle time while they are at work.