Higher Computing - Shawlands Academy

Advanced Higher Computing
Computer Architecture
Chapter 2
The internal architecture of the microprocessor
The memory data register (MDR), sometimes
called the memory buffer register (MBR)
Data transferred to the processor from anywhere else in
the system arrives in the MDR.
Data which is to be sent out from the processor is sent
out from the MDR along the data bus.
The MDR forms a tiny buffer between the internal bus
and the data bus, so it is also known as the memory
buffer register.
The memory address register (MAR)
Addresses never arrive in the MAR from outside the
processor. The address bus is a one-way bus, unlike the
data bus which can carry data in either direction.
The instruction register (IR)
The data that is being fetched from memory will be a
machine code instruction. This arrives in the MDR like
any other item of data. It is then transferred to the
IR, where it is held while being decoded by the control unit.
At any instant, the IR holds the machine code instruction
which is currently being decoded and executed.
A machine code program consists of a series of machine
code instructions, held in main memory. These are
fetched one by one from memory.
The program counter (PC)
How does the processor know where to find the next
instruction to be processed?
The program counter holds the address of the next
instruction.
In addition, there will be many general purpose
registers (GP registers), which, as their name implies,
can be used to store any item of data at any time as
required by the current program running in the processor.
Pupil task
Complete questions 10-14 on pages 30 & 31
To execute a machine code
program it must first be
loaded, together with any
data that it needs, into main
memory (RAM). Once loaded,
it is accessible to the CPU
which fetches one instruction
at a time, decodes and
executes it.
Fetch, decode and execute
are repeated until a program
instruction to HALT is
encountered. This is known
as the fetch-execute cycle.
1. Fetch. The instruction is fetched from the memory
location whose address is contained in the Program
Counter and placed in the Instruction Register. The
instruction will consist of an operation code and possibly
some operands. The operation code determines which
operation is carried out. The term opcode is usually used
as a shorthand for operation code.
2. Decode. The pattern of the opcode is interpreted by
the Control Unit and the appropriate actions are taken by
the electronic circuitry of the CU. These actions may
include the fetching of operands for the instruction from
memory or from the general purpose registers.
3. Increment. The Program Counter is incremented. The
size of the increment will depend upon the length of the
instruction in the IR. For example, if this instruction was
a 3 byte instruction then the PC would be incremented by
3.
4. Execute. The instruction in the Instruction Register is
executed. This may lead to operands being sent to the
ALU for arithmetic processing and the return of the result
to one of the general purpose registers. When a HALT
instruction is received then execution of this program
ceases.
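The increment step can be sketched in a few lines of Python. This is only an illustration: the opcode names and their lengths in bytes are made up for the example and do not belong to any real processor.

```python
# Hypothetical instruction lengths in bytes, keyed by opcode.
# These values are assumptions for illustration only.
INSTRUCTION_LENGTH = {"NOP": 1, "ADD": 2, "LOAD": 3, "HALT": 1}

def next_pc(pc, opcode):
    """Return the PC value after an instruction with this opcode is fetched."""
    return pc + INSTRUCTION_LENGTH[opcode]

pc = 100
pc = next_pc(pc, "LOAD")   # LOAD is assumed to be a 3 byte instruction
print(pc)
```

Here the PC moves on by 3 because the instruction just fetched is assumed to be 3 bytes long, which matches the example given in step 3.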
The fetch phase
1. The contents of the PC are copied into the MAR;
2. The contents of memory at the location designated
by the MAR are copied into the MDR;
3. The PC is incremented;
4. The contents of the MDR are copied into the IR.
The execute phase
The execute phase consists of the following steps:
1. Decode the instruction in the IR;
2. Execute the instruction in the IR.
For convenience we can write this series of steps as a
pseudocode representation:
loop forever
    PC → MAR
    [MAR] → MDR
    PC + 1 → PC
    MDR → IR
    Decode IR
    Execute IR
end loop
Note that → means "is copied to" and that [MAR] means the
contents of the location pointed to by the MAR.
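The pseudocode above can be turned into a small runnable sketch. The machine below is a made-up toy, not a real instruction set: it assumes every instruction fits in one memory location (so the PC goes up by 1), has a single general purpose register called acc, and understands only four invented opcodes.

```python
def run(memory):
    """Simulate the fetch-execute cycle on a toy machine (illustrative only)."""
    pc = 0        # program counter
    acc = 0       # one general purpose register (an assumption)
    trace = []    # every instruction that passed through the IR
    while True:
        mar = pc              # copy the PC into the MAR
        mdr = memory[mar]     # copy the contents of that location into the MDR
        pc = pc + 1           # increment the PC
        ir = mdr              # copy the MDR into the IR
        opcode, operand = ir  # decode the instruction in the IR
        trace.append(ir)
        # execute the instruction in the IR
        if opcode == "HALT":
            break
        elif opcode == "LOAD":
            acc = memory[operand]
        elif opcode == "ADD":
            acc = acc + memory[operand]
        elif opcode == "STORE":
            memory[operand] = acc
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return acc, trace

# Program and data share main memory, as described earlier.
program = [
    ("LOAD", 4),     # address 0
    ("ADD", 5),      # address 1
    ("STORE", 6),    # address 2
    ("HALT", None),  # address 3
    2,               # address 4: data
    3,               # address 5: data
    0,               # address 6: result goes here
]
acc, trace = run(program)
print(acc, program[6])
```

The program adds the values held at addresses 4 and 5 and stores the sum at address 6. The loop stops only when the HALT instruction reaches the IR.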
Pupil task
Web animation on Scholar for:-
Use of Registers in an Instruction Fetch
&
Sequencing the steps in an instruction fetch
Improving performance
Computer and microprocessor designers are driven by the
need to improve computer performance to meet the ever
increasing demands of computer users.
Early microprocessors had clock speeds measured in kHz
(thousands of cycles per second) while modern
processors such as the Pentium III are now achieving
speeds of over 1 GHz (thousand million cycles per
second). Obviously clock speed is an important factor in
determining performance. The table below shows
the clock speed versus the performance of Intel
processors as measured in Million Instructions per
Second (MIPS). MIPS is now an outdated way to measure
performance but it is the only measure applicable over
the whole range.
Intel Processor   Clock Speed   MIPS
8086              8 MHz         0.8
80286             12.5 MHz      2.7
80386DX           20 MHz        6.0
80486DX           25 MHz        20
Pentium           60 MHz        100
Pentium Pro       200 MHz       440
This table shows that performance, as measured by
MIPS, has gone up at a higher rate than the clock rate.
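One way to check this claim is to work out how many MIPS each processor delivers per MHz of clock speed. The short sketch below uses only the figures from the table; the function and variable names are chosen for this example.

```python
# (processor, clock speed in MHz, MIPS), copied from the table above
processors = [
    ("8086", 8, 0.8),
    ("80286", 12.5, 2.7),
    ("80386DX", 20, 6.0),
    ("80486DX", 25, 20),
    ("Pentium", 60, 100),
    ("Pentium Pro", 200, 440),
]

def mips_per_mhz(clock_mhz, mips):
    """Performance delivered for each MHz of clock speed."""
    return mips / clock_mhz

for name, clock, mips in processors:
    print(f"{name:12} {mips_per_mhz(clock, mips):.2f} MIPS per MHz")

# Growth from the first row of the table to the last
clock_growth = processors[-1][1] / processors[0][1]
mips_growth = processors[-1][2] / processors[0][2]
print(f"clock speed x{clock_growth:.0f}, MIPS x{mips_growth:.0f}")
```

From the 8086 to the Pentium Pro the clock speed rose 25 times, while MIPS rose about 550 times, so something other than clock speed must account for most of the gain.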
Pupil task (30mins)
Complete the table below. Then predict what clock speeds
you would expect to be available in the next 5 years.
Intel Processor     Clock Speed   MIPS
Pentium 2 (1997)
Pentium 3 (1999)
Pentium 4 (2000)
Itanium (2001)
Pentium M (2003)
Go to www.intel.com/pressroom/kits/quckreffam.htm
or
www.intel.com and enter quckreffam into the search box
Increasing data bus width
Increasing the clock speed will increase the number
of data fetches that can be made per second.
Increasing the data bus width will increase the amount
of data that can be fetched each time
With a data bus only 4 bits wide, it took 2 fetches to
fetch a byte from memory to the processor. The Intel
8008 processor (1972) used an 8 bit data bus. Clearly
the internal registers (particularly the MDR) had to
match this.
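The link between bus width and the number of fetches is simple arithmetic. The sketch below is a simplified model which assumes that each fetch moves exactly one bus-width of data; the function name is chosen for this example.

```python
def fetches_needed(data_bits, bus_width_bits):
    """Number of fetches needed to move data_bits over a bus of the given width."""
    # Round up: a partly filled final fetch still counts as a fetch.
    return (data_bits + bus_width_bits - 1) // bus_width_bits

print(fetches_needed(8, 4))    # one byte over a 4 bit bus
print(fetches_needed(8, 8))    # one byte over an 8 bit bus
print(fetches_needed(64, 16))  # 64 bits over a 16 bit bus
```

A byte needs 2 fetches on a 4 bit bus but only 1 on an 8 bit bus, which is the improvement described above.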
The Intel 8086 was then developed, which used a 16 bit data bus
and set of internal registers. This gave huge
improvements in performance, and allowed the
development of the first PCs.
In 1985, Intel decided to increase the data bus width and
internal registers of its processor again, so the 80386 was
produced with a 32 bit data bus.
32 bits was the norm for the next 10 years, until the
first 64 bit Pentium chip was introduced in 1995.
All PC designs since then have made use of 64 bit technology.
NOTE: A similar development has taken place in the
Motorola chips which are used in Apple computers, from
the early 68000 16-bit architecture through to the current
G5 64-bit architecture
The width of the address bus has no direct effect on
performance.
The width of the address bus determines the maximum
memory address.
Address bus widths have also increased steadily over the
last few decades from 16-bit to 32-bit, and now 64-bit.
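An address bus with n lines can carry 2 to the power n different addresses. The sketch below works this out for the widths mentioned above; it assumes one byte is stored at each address, which is common but not universal.

```python
def addressable_locations(address_bus_bits):
    """Number of distinct addresses an address bus of this width can carry."""
    return 2 ** address_bus_bits

for width in (16, 32, 64):
    locations = addressable_locations(width)
    # Assuming one byte per address, locations is also the size in bytes.
    print(f"{width}-bit address bus: {locations} locations")
```

A 16-bit address bus can reach 65,536 locations (64 Kbytes under the one-byte assumption), and a 32-bit address bus can reach 4,294,967,296 locations (4 Gbytes).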
The earliest computers had a single system bus
connecting the processor with the main memory and
peripheral interfaces. This system bus operated at the
same speed as the processor.
The data bus width has been stepped up from 8 to 16,
32 and now 64 bits wide.
The number of different components within a system has
also increased. A modern processor is likely to be
connected to a range of peripherals as well as main
memory.
These peripherals operate at lower speeds than the
processor and main memory. As a result, designs have
developed with multiple buses within the system:
• a very fast "frontside" system bus for main memory,
• a slower bus for communication with peripheral devices.
The PCI and PCI-X buses are connected to the main
system bus by a bus bridge, which controls the traffic on
and off the bus
The PCI and PCI-X buses are known as multipoint buses
(this means they can have branches to any number of
devices).
The previous diagram shows how components interact
The separation of (relatively slow) peripheral traffic on
to the PCI bus means that fast data transfers between
main memory and the processor are not slowed down.
The PCI-X bus, as well as being faster and wider than
the original PCI bus, also has a number of special
features to maximise performance.
These include the prioritisation of data from
different devices, which particularly improves
performance of streaming audio and video.
Cache Memory
The speeds quoted for data access to main memory
sound quite impressive. However, current processors are
able to process data even faster than that!
One solution to this problem would be to increase the
number of registers on the microprocessor itself, so
that all data required would be instantaneously available
to the processor.
However, this solution is impractical, as it would lead to
overly complex and large microprocessor chips.
Cache memory uses the faster but
more expensive static RAM chips
rather than the less expensive, but
slower, dynamic RAM chips which
are used for most of the main
memory
Cache memory is connected to the processor by the
"backside" bus
Normally whole blocks of data are transferred from main
memory into cache, while single words are transferred
along the backside bus from the cache to the processor.
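The difference between block transfers and word transfers can be sketched with a toy cache. This is a deliberately simplified model: the block size of 4 words is an assumption, and the cache never fills up or replaces old blocks as a real cache must.

```python
BLOCK_SIZE = 4  # words per block (assumed for illustration)

class Cache:
    """Toy cache: fetches whole blocks from main memory, returns single words."""

    def __init__(self, main_memory):
        self.main_memory = main_memory
        self.blocks = {}   # block number -> list of words held in cache
        self.hits = 0
        self.misses = 0

    def read(self, address):
        block_no, offset = divmod(address, BLOCK_SIZE)
        if block_no in self.blocks:
            self.hits += 1
        else:
            self.misses += 1
            start = block_no * BLOCK_SIZE
            # A whole block is copied from main memory into the cache.
            self.blocks[block_no] = self.main_memory[start:start + BLOCK_SIZE]
        # A single word is passed from the cache to the processor.
        return self.blocks[block_no][offset]

main_memory = list(range(100, 116))          # 16 words of example data
cache = Cache(main_memory)
words = [cache.read(a) for a in range(8)]    # processor reads addresses 0 to 7
print(cache.misses, cache.hits)
```

Reading 8 consecutive words causes only 2 block transfers from main memory. The other 6 reads are served straight from the cache.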
L1 and L2 cache
Most modern chips also have level 1 (L1) cache. This is
similar to L2 cache, but the cache is actually on the same
chip as the processor. This means that it is even faster to
access than L2 cache.
Pentium processors have two L1 caches on the processor.
One of these is for caching data, while the other is used for
caching instructions (machine code).
In the Pentium 4 processor, each of these is 8 Kbytes.
Similarly, the PowerPC G4 processor has two 32 Kbyte L1
caches.
As we have seen, memory access is one of the major
bottlenecks limiting performance of computer systems.
Many techniques have been devised to overcome this,
including use of SRAM, widening the data bus, using
separate buses for memory and peripherals,
and the use of L1 and L2 cache.
Another technique which can be applied is called
interleaving.
Memory interleaving
The idea behind interleaving is that memory can be split up
into 2 or 4 independent RAM chips. A memory read or write
will normally take 3 or 4 clock cycles to perform.
Data is actually being transferred along the data bus during
only 1 of these clock cycles.
The processor has to insert "wait" states into its program to
allow for this. If memory interleaving has been implemented,
the processor can use a "wait" state to initiate the next
memory access, so saving time.
In effect, the processor can access the 4 memory chips almost
simultaneously. This increases throughput significantly. To
make best use of this, successive data items must be stored in
different memory chips.
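A rough sketch of the idea is given below. It is an idealised model, not a description of any real memory controller: it assumes 4 memory chips, 4 clock cycles per access, and that successive addresses always fall in different chips so that a new access can be started on every clock cycle.

```python
def bank_for(address, num_banks=4):
    """Which memory chip holds this address when successive items are spread out."""
    return address % num_banks

def cycles_without_interleaving(num_accesses, cycles_per_access=4):
    # Each access must finish before the next one can begin.
    return num_accesses * cycles_per_access

def cycles_with_interleaving(num_accesses, cycles_per_access=4):
    # Idealised: the first access takes the full time, then one more
    # access completes on every following clock cycle.
    return cycles_per_access + (num_accesses - 1)

print([bank_for(a) for a in range(8)])
print(cycles_without_interleaving(8), cycles_with_interleaving(8))
```

Under these assumptions, 8 accesses take 32 clock cycles without interleaving and 11 with it. The saving disappears if successive data items are stored in the same chip.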
Memory interleaving is tricky to implement for memory
fetches, as the processor has to deal with the data that
arrives, which may require further processing steps.
It is, therefore, more often used for memory writes,
where the processor simply sends the data off to
memory, and does not have to "worry" about what
happens next.
For a similar reason, memory interleaving is used to
speed up access to video RAM.
This is less problematic than main memory, as all the
"data" is simply data, whereas in main memory, the
"data" may be instructions!
Direct Memory Access (DMA)
Direct memory access (DMA) is a technique which is used when
data is being transferred to or from a peripheral
device.
There are two methods commonly used to
transfer data without DMA –
• programmed I/O,
• interrupt-driven I/O
The inefficiencies of programmed and interrupt driven I/O
are not too serious under most circumstances, but
become a serious issue when large blocks of data are to
be transferred between main memory and a slow
peripheral.
DMA is a technique which overcomes this.
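The contrast can be sketched by counting how often the processor has to get involved in a transfer. The model below is only illustrative: the function names are invented, and "steps" are a crude stand-in for processor time, not real clock cycles.

```python
def programmed_io(device_data, memory, start):
    """Processor moves every byte itself; returns the number of processor steps."""
    cpu_steps = 0
    for i, byte in enumerate(device_data):
        memory[start + i] = byte   # the processor copies each byte
        cpu_steps += 1
    return cpu_steps

def dma_transfer(device_data, memory, start):
    """DMA controller moves the block; returns the number of processor steps."""
    cpu_steps = 1   # processor gives the DMA controller the address and length
    # The DMA controller copies the whole block without the processor.
    memory[start:start + len(device_data)] = device_data
    cpu_steps += 1  # processor handles one interrupt when the transfer ends
    return cpu_steps

block = list(range(512))       # example block of 512 bytes
memory_a = [0] * 1024
memory_b = [0] * 1024
steps_pio = programmed_io(block, memory_a, 0)
steps_dma = dma_transfer(block, memory_b, 0)
print(steps_pio, steps_dma)
```

Both methods leave the same data in memory, but in this model the processor is involved 512 times with programmed I/O and only twice with DMA, leaving it free for other work while the block is moved.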
Diagram of DMA: a diagram showing the DMA controller (DMAC)
can be required for exam papers.
Pupil task
Complete questions 18 – 25 on pages 42 & 43