Chapter 4: A Simple Computer
• We explore the organization of a computer which is
– the structure and function of the components in the
computer
• Structure: how the components in the computer are
connected together, how they communicate
• Function: what they do
• Specifically, we will explore the
– CPU
– Memory
– I/O
– Bus
– A primitive instruction set (MARIE)
• MARIE is the book’s simple computer
• We will examine it to understand what an instruction set
is, before we begin our examination of the instruction set
for Intel
The CPU
• The central processing unit (or processor)
– Is the brain of the computer
– It does all the processing
• What does this mean?
– The CPU is in charge of executing the current program
• Each program is stored in memory along with data
• The CPU is in charge of retrieving the next instruction from
memory (fetch), decoding and executing it
• Execution usually requires the use of ALU and temporary
storage in registers
• Some instructions cause data movement (memory accesses,
input, output) and some instructions change what the next
instruction is (branches)
– We divide the CPU into two areas
• Datapath – registers and ALU (the execution unit)
• Control unit – circuits in charge of performing the fetch-execute
cycle
Two Kinds of Registers
User registers
• These store data and
addresses
– These are manipulated by your
program instructions
• Example: Add R1, R2, R3
– R1 ← R2 + R3
• Computers will have between
– One and hundreds of registers
– Possibly divided into data and
address registers
• Registers are usually the size
of the computer’s word size
– 32 or 64 bits today, previously it
had been 8 or 16 bits
Control registers
• Registers that store
information used by the
control unit to perform the
fetch-execute cycle
– PC – the memory location of
the next instruction
– IR – the current instruction
being executed
– Status flags – information
about the results of the last
instruction executed (was
there an overflow, was the
result positive, zero or
negative?)
ALU
• Consists of circuits to perform arithmetic and logic
operations
– Adder
– Multiplier
– Shifter
– Comparator
– Operations in the ALU set status flags (carry, overflow, positive,
zero, negative)
• The ALU may also contain temporary registers that hold results
before they are moved back to a register or to memory
Control Unit
• In charge of managing the fetch-execute cycle
– It sends out control signals to all other devices
– A control signal indicates that the device should
activate or perform its function
– For instance:
• Instruction fetching requires
– sending the PC value to main memory
– signaling memory to read
– when the datum comes back from memory, moving it to the IR
– incrementing the PC to point to the next instruction
• These operations are controlled by the control unit
• The control unit then decodes the instruction and signals the
proper ALU circuit(s) to execute it
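As a rough illustration, the fetch steps above can be written as a tiny Python sketch (the register and memory names here are illustrative, not taken from any real machine):

    # Hypothetical machine state: a small memory and four registers
    mem = [0x1003, 0x3004, 0x2005, 0x7000, 0x0023, 0xFFE9]
    pc, mar, mbr, ir = 0, 0, 0, 0

    def fetch():
        global pc, mar, mbr, ir
        mar = pc           # send the PC value to main memory (via the MAR)
        mbr = mem[mar]     # signal memory to read; the datum comes back into the MBR
        ir = mbr           # move the fetched word into the IR
        pc = pc + 1        # increment the PC to point to the next instruction

    fetch()
    print(hex(ir), pc)     # -> 0x1003 1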
The System Clock
• In order to regulate when the CU issues its control
signals, computers use a system clock
– At each clock pulse, the control unit goes on to the next
task
• Register values are loaded or stored at the beginning of a clock
pulse
• ALU circuits activate at the beginning of a clock pulse
The System Clock
• Clock performance is based on the number of pulses
per second, or its Gigahertz rating
– This is a misleading spec
– The number of clock pulses (cycles) that it takes to execute
one instruction differs from one computer to the next
• Assume computer A takes 10 clock cycles per instruction but has
a 1 Gigahertz clock speed
• Assume computer B can execute 10 instructions in 11 cycles
using a pipeline, but has a 250 Megahertz clock speed
• Which one is faster? B even though its clock is slower!
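A quick back-of-the-envelope check of that claim, using instruction throughput = clock rate / cycles per instruction and the numbers assumed above:

    # Computer A: 1 GHz clock, 10 cycles per instruction
    a_ips = 1_000_000_000 / 10           # = 100 million instructions per second
    # Computer B: 250 MHz clock, 10 instructions every 11 cycles (pipelined)
    b_ips = 250_000_000 * 10 / 11        # ~= 227 million instructions per second
    print(a_ips, b_ips, b_ips > a_ips)   # B wins despite the slower clock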
Comparing Clocks
• It is difficult to compare CPU performance just by
comparing clock speed
– You must also consider how many clock cycles it takes
to execute 1 instruction
– How fast memory is
– How fast the bus is
• In addition, there are different clocks in the
computer; the control unit and the whole CPU are
governed by the system clock
– There is usually a bus clock as well to regulate the usage
of the slower buses
The Bus
 A bus is a collection of wires that allow current to flow over
them
 The current is the information being passed between
components
 There are 3 parts to a bus
– data bus
• for data and program instructions
– control bus
• control signals from the CU to the devices, and feedback lines
for acknowledging that they are ready or for interrupting the CPU
– address bus
• the address of the memory location or I/O device that is to
perform the given operation
The Bus
• Additionally, computers may have multiple
buses
– Local bus
• connects registers, ALU and CU together
– System bus
• connects the CPU to main memory
– Expansion or I/O bus
• connects the system bus to I/O devices
More on Buses
• Buses connect two types of devices
– Masters
• Devices that can initiate requests
– CPU
– some I/O devices
– Slaves
• Devices that only respond to requests from
masters
– Memory
– some I/O devices
More on Buses
• Some buses are dedicated
– The bus directly connects two
devices (point-to-point bus)
• Most buses connect multiple
components
– multipoint
[Figure: examples of a point-to-point bus, a multipoint network bus, and a multipoint expansion bus]
The System Bus
• Main memory connects to this bus through pins
• The I/O subsystem connects to this bus through the expansion
bus
• The bus carries three types of information
– The address from the CPU of the intended item to be accessed
– The control information (read versus write, or status information like
“are you available?”)
– The data, either being sent to the device, or from the device to CPU
The Expansion Bus
• The expansion bus
– is the collection of expansion slots and what gets
plugged into them
– Here we see interface cards (or expansion cards), each
with the logic to interface between the CPU and the I/O
device (e.g., printer, MODEM, disk drive)
Who gets to use the bus?
• In a point-to-point bus this is not a problem
• On the expansion bus, where multiple I/O devices may want to communicate
with each other, the CPU, or memory at the same time, we need a
form of bus arbitration
– Daisy chain arbitration
• Each device has a bus request line on the control bus
• When a device wants to use the bus, it places its request and the highest
priority device is selected (this is an unfair approach)
– Centralized parallel arbitration
• The bus itself contains an arbiter (a processor) that decides
• The arbiter might become a bottleneck, and this is also slightly more
expensive
– Distributed arbitration
• Devices themselves determine who gets to use the bus, usually
based on a priority scheme, possibly unfair
– Distributed arbitration using collision detection
• It’s a free-for-all, but if a device detects that another device is using the
bus, this device waits a short amount of time before trying again
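A rough Python sketch of priority-based arbitration (the device list, priorities, and arbitrate function are invented for illustration; real arbiters are hardware):

    # Among all devices requesting the bus, grant it to the one with the
    # highest priority (lowest number here) -- hence the unfairness.
    devices = [            # (priority, name, wants_bus)
        (0, "CPU", False),
        (1, "disk", True),
        (2, "printer", True),
    ]

    def arbitrate(devices):
        requesting = [d for d in devices if d[2]]
        if not requesting:
            return None
        return min(requesting)[1]    # highest-priority requester wins

    print(arbitrate(devices))        # -> 'disk'; 'printer' must wait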
I/O Subsystem
• There are many different types of I/O devices, collectively
known as the I/O Subsystem
• Since I/O devices can vary in their speed and usage,
the CPU does not directly control these devices
• Instead, I/O modules, or interfaces, take the CPU
commands and pass them on to their I/O devices
How to communicate with the right I/O device?
• To communicate with the right I/O device, the CPU
addresses the device using one of two forms
1. Memory-mapped I/O
2. Isolated I/O
Memory-mapped I/O
 each interface has its own registers, which are addressed as if they
were part of main memory; some memory addresses are therefore not used
for memory at all but instead refer to registers in the I/O interfaces
 each of these registers is given its own address
 these addresses overlap those of memory, so a request issued to
one of these memory addresses is actually a request of I/O, not memory,
and memory ignores the request. In such a system, the addresses are
the earliest (say the first 5000 addresses).
Isolated I/O
– In isolated I/O, the 5000 or so addresses are
separate from memory, so that we need an extra
control line to indicate if the address is a memory
address or an I/O address. In memory-mapped I/O,
the early addresses are shared so that, if one of
these addresses is sent out, memory ignores it
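A small sketch of the two decoding schemes, reusing the slide's figure of roughly 5000 reserved addresses (the constant and function names are assumptions for illustration):

    IO_LIMIT = 5000   # first 5000 addresses reserved for I/O registers (example value)

    def memory_mapped_target(addr):
        # One shared address space: low addresses go to I/O, memory ignores them
        return "io" if addr < IO_LIMIT else "memory"

    def isolated_target(addr, io_line):
        # Separate spaces: an extra control line says which space the address is in
        return "io" if io_line else "memory"

    print(memory_mapped_target(100), memory_mapped_target(60000))   # io memory
    print(isolated_target(100, io_line=False))                      # memory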
Memory Organization
Memory is organized into byte or word-sized blocks
– Each block has a unique address
• This can be envisioned as an array of cells
– The CPU accesses memory by sending an address of
the intended access and a control command to read or
write
– The memory module then responds to the request
appropriately
• A decoder is used to decode the binary address into a
specific memory location
Dividing Memory Across Chips
• Main memory usually consists of more than one RAM chip
• Each memory chip can store a certain amount of information
– However, architects decide how memory is spread across these
chips
– For instance, do we want to have an entire byte on a single chip,
or spread a byte across 2 or more chips?
– Here, a word (16 bits) is stored in two chips in the same row
Interleaving Memory
• Main memory usually consists of more than one RAM chip
– Hence if you buy memory to upgrade, you buy a memory module
• Access is more efficient when memory is organized into banks of chips with the addresses
interleaved across the chips
• High-order interleaving
– the high order address bits specify the memory bank/module
– The advantage of high-order interleave is that two different devices, working on two different
areas of memory, can perform their memory accesses simultaneously
– e.g., one device accesses address 5 and another accesses 31
• Low-order interleaving
– the low order bits of the address specify which memory bank contains the address of
interest
– Consecutive memory locations are on consecutive chips
– The advantage of low-order interleave is that several consecutive memory accesses can be
performed simultaneously
– For instance, fetching 4 consecutive instructions at one time
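A sketch of how the bank is chosen under each scheme, assuming 4 banks and the tiny 32-word memory implied by the 5-and-31 example above:

    BANKS = 4                        # number of banks (assumed)
    WORDS = 32                       # tiny 32-word memory for the example
    BANK_SIZE = WORDS // BANKS       # 8 words per bank

    def high_order_bank(addr):
        # top address bits pick the bank, so each bank is a contiguous block
        return addr // BANK_SIZE

    def low_order_bank(addr):
        # bottom address bits pick the bank, so consecutive addresses alternate banks
        return addr % BANKS

    print(high_order_bank(5), high_order_bank(31))    # 0 3: different banks, so both
                                                      # accesses can proceed at once
    print([low_order_bank(a) for a in range(4, 8)])   # [0, 1, 2, 3]: 4 consecutive
                                                      # words fetched in parallel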
Interrupts
• CPU performs the fetch-execute cycle on your
program repeatedly without pause, until the
program terminates
• What happens if an I/O device needs attention?
• What happens if your program tries to do an illegal operation?
• What happens if you want to run 2 or more programs in a
multitasking mode?
– You cannot do this without interrupts
• An interrupt
– Is the interruption of the CPU so that it can switch its
attention from your program to something else (an I/O
device, the operating system)
The Interrupt Process
• At the end of each fetch-execute cycle, the CPU
checks to see if an interrupt has arisen
– Devices send interrupts to the CPU over the control bus
• If the instruction causes an interrupt, the Interrupt Flag (in the
status flags) is set
– If an interrupt has arisen, the interrupt is handled as follows
• The CPU saves what it was doing (PC and other important
registers are saved to the run-time stack in memory)
• The CPU figures out who raised the interrupt and executes an
interrupt handler to handle that type of interrupt
– interrupt handler is a set of code, stored in memory
• Once the interrupt has been handled, the CPU restores the
interrupted program by retrieving the values from the runtime stack
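A sketch of where the interrupt check sits in the cycle; the handler table, flag, and saved_context list are hypothetical stand-ins for the real hardware and run-time stack:

    interrupt_flag, pending_device = False, None
    handlers = {"keyboard": lambda: print("service the keyboard")}   # hypothetical
    saved_context = []                       # stands in for the run-time stack

    def fetch_execute_once():
        global interrupt_flag, pending_device
        # ... fetch, decode, and execute the current instruction here ...
        if interrupt_flag:                   # checked at the END of every cycle
            saved_context.append(("PC", "registers"))   # save what we were doing
            handlers[pending_device]()                  # run the matching handler
            saved_context.pop()                         # restore and resume
            interrupt_flag, pending_device = False, None

    interrupt_flag, pending_device = True, "keyboard"
    fetch_execute_once()                     # -> service the keyboard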
A Simple Computer
• We now put all of these elements together into a
reduced computer (MARIE)
• Machine Architecture that is Really Intuitive & Easy
• MARIE is very simple and not very realistic, so we will go
beyond MARIE as well
• We will explore MARIE’s
– CPU (registers, ALU, structure)
– Instruction set (the instructions, their format – how you
specify the instruction, addressing modes used, data types
available)
– Interrupts, I/O
– Some simple programs in MARIE
MARIE’s Architecture
• Data stored in binary, two’s complement
• Stored program – program data and instructions are stored in the
same memory
• 16-bit word size with word addressing (you can only get
words from memory, not bytes)
• 4K of main memory using 12 bit addresses, 16-bit data
MARIE’s Architecture
• 16-bit instructions (4 bits for the op code, 12 bits for the
address of the datum in memory)
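A small sketch of packing and unpacking that 16-bit format (4-bit op code in the high bits, 12-bit address in the low bits):

    def encode(opcode, address):
        return (opcode << 12) | (address & 0xFFF)

    def decode(word):
        return word >> 12, word & 0xFFF     # (opcode, address)

    word = encode(0x1, 0x003)               # LOAD 003, as in the later example
    print(hex(word), decode(word))          # 0x1003 (1, 3)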
MARIE’s Architecture
• Registers:
– AC
• this is the only data register
(16 bits)
– PC (12 bits)
– IR (16 bits)
– Status flags
– MAR
• stores the address to be sent
to memory, 12 bits
– MBR
• stores the datum to be sent to
memory or retrieved from
memory, 16 bits
– 8-bit input and 8-bit output
registers
MARIE CPU
• The structure of our CPU with the registers shown
• MAR sends to memory, the MBR stores the data being sent to memory or
retrieved from memory
• InREG and OutREG receive data from and send data to I/O respectively
MARIE’s interconnection
• The registers are interconnected, and
connected with main memory through a
common data bus.
• Each device on the bus is identified by a
unique number that is set on the control lines
whenever that device is required to carry out
an operation.
• Separate connections are also provided
between the accumulator and the memory
buffer register, and the ALU and the
accumulator and memory buffer register.
• This permits data transfer between these
devices without use of the main data bus.
MARIE’s Fetch-Execute Cycle
PC stores the location in memory of the next Instruction
1) fetch instruction by sending the address to memory (PC to MAR to memory)
2) memory sends back instruction over data bus, to MBR, move it to IR, increment PC
3) Decode the instruction (look at the op code, place the 12-bit data address in MAR if needed)
4) If operand (Memory value) required, fetch it from memory
An operand is the part of a computer instruction that specifies what data is to be manipulated or operated on. A computer
instruction describes an operation such as add or multiply X, while the operand specifies which X to operate on, or gives the
value of X itself
5) Execute instruction
6) If necessary, process interrupt
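A compact Python sketch of this cycle for a few MARIE instructions; Load = 1 appears later in the slides, while Add = 3, Store = 2 and Halt = 7 are the textbook's usual op codes and are assumed here:

    # Memory image: Load 104, Add 105, Store 106, Halt, then the data 0023 and FFE9
    M = {0x100: 0x1104, 0x101: 0x3105, 0x102: 0x2106, 0x103: 0x7000,
         0x104: 0x0023, 0x105: 0xFFE9}
    PC, AC = 0x100, 0

    while True:
        MAR = PC                                 # 1) send address of next instruction
        MBR = M[MAR]; IR = MBR                   # 2) instruction arrives, goes to IR
        PC = (PC + 1) & 0xFFF
        opcode, X = IR >> 12, IR & 0xFFF         # 3) decode
        if   opcode == 0x1: AC = M[X]                     # Load X
        elif opcode == 0x3: AC = (AC + M[X]) & 0xFFFF     # Add X
        elif opcode == 0x2: M[X] = AC                     # Store X
        elif opcode == 0x7: break                         # Halt
    print(hex(M[0x106]))                         # -> 0xc (0023 + FFE9 = 000C)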
MARIE’s ISA
• A computer’s instruction set architecture specifies
the format of its instructions and the primitive
operations that the machine can perform.
• The ISA is an interface between a computer’s
hardware and its software.
• Some ISAs include hundreds of different
instructions for processing data and controlling
program execution.
• The MARIE ISA consists of only 13 instructions.
MARIE’s Instructions
• This is the format
of a MARIE instruction:
• The fundamental MARIE instructions are:
MARIE has a major flaw: all data must be stored in
memory and the memory addresses must be known at compile time
What about using pointers?
We have to add instructions that have indirect access to memory to
allow for pointers, so we add AddI and JumpI
We also add Clear to clear the accumulator
We also add JnS to permit procedure calls (jump but also
save the PC so we can return when done with the procedure)
MARIE’s Instructions
• This is a bit pattern for a LOAD instruction as it
would appear in the IR
• We see that the
– opcode is 1
– address from which to load the data is 3
MARIE’s micro-operations
• Each of our instructions actually consists of a sequence
of smaller instructions called microoperations.
• The exact sequence of microoperations that are
carried out by an instruction can be specified using
register transfer language (RTL).
• In the MARIE RTL, we use the notation
– M[X] to indicate the actual data value stored in memory
location X
– ← to indicate the transfer of bytes to a register or memory
location.
Example of microoperations
• The RTL for the LOAD instruction is
MAR ← X                  (Cycle 1)
MBR ← M[MAR], AC ← MBR   (Cycle 2)
• The RTL for the ADD instruction is
MAR ← X                  (Cycle 1)
MBR ← M[MAR]             (Cycle 2)
AC ← AC + MBR            (Cycle 3)
Example: Add 2 Numbers
This code will add the two numbers stored at memory location 104 and 105
Load 104 loads the AC with the value at 104 (0023)
Add 105 adds to the AC the value at 105 (FFE9)
Store 106 takes the value in the AC (000C) and moves it to location 106
Halt then stops the program
Look at contents of registers as program executes
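A quick check of the arithmetic in 16-bit two's complement (0023h = 35, FFE9h = -23):

    def to_signed16(x):
        return x - 0x10000 if x & 0x8000 else x

    print(to_signed16(0x0023), to_signed16(0xFFE9))   # 35 -23
    print(hex((0x0023 + 0xFFE9) & 0xFFFF))            # 0xc -> 000C = 12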
4.6 Extending Our Instruction Set
• So far, all of the MARIE instructions that we have
discussed use a direct addressing mode.
– This means that the address of the operand is explicitly
stated in the instruction.
• It is often useful to employ indirect addressing,
where the address of the address of the operand is
given in the instruction.
– If you have ever used pointers in a program, you are
already familiar with indirect addressing.
ADDI X instruction
• Add indirect specifies the address of the address of the operand
– Use the value at location X as the actual address of the data operand
to add to AC
• The following RTL tells us what is happening at the register level
MAR ← X
MBR ← M[MAR]
MAR ← MBR
MBR ← M[MAR]
AC ← AC + MBR
Example
AC = 0020
ADDI 500
Memory: location 500 contains 600, and location 600 contains 1001
Result: AC = 0020 + M[M[500]] = 0020 + 1001 = 1021
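A sketch of the indirect access, plugging in the numbers from the example (all values hex):

    M = {0x500: 0x600, 0x600: 0x1001}   # memory contents from the example
    AC = 0x0020

    # ADDI 500: use M[500] as the address of the operand
    pointer = M[0x500]                  # 0x600
    AC = (AC + M[pointer]) & 0xFFFF
    print(hex(AC))                      # -> 0x1021, matching the result above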
JnS X
• Another helpful programming tool is the use of subroutines.
• The jump-and-store instruction, JNS, gives us limited
subroutine functionality.
• Store the PC at address X, and jump to address X+1
• The details of the JNS instruction are given by the following
RTL
MBR ← PC
MAR ← X
M[MAR] ← MBR
MBR ← X
AC ← 1
AC ← AC + MBR
PC ← AC
Does JNS permit recursive calls?
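A sketch of the net effect of JnS X on the machine state; the addresses are made up, and the closing comment addresses the recursion question:

    M = {}
    PC = 0x10A        # already incremented: address of the instruction after the JnS
    X = 0x200         # subroutine entry area

    # JnS X: save the return address at X, then jump to X + 1
    M[X] = PC                           # M[X] <- PC
    PC = X + 1                          # PC  <- X + 1
    print(hex(M[X]), hex(PC))           # 0x10a 0x201

    # The return is a JumpI X (jump to the address stored at X). Every call to the
    # same subroutine stores its return address in the SAME word M[X], so a
    # recursive call would overwrite it -- plain JnS does not support recursion.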
SKIPCOND
• Skips the next instruction according to the value of the AC
• The RTL for this instruction is the most complex in our
instruction set
If IR[11 - 10] = 00 then
If AC < 0 then PC ← PC + 1
else If IR[11 - 10] = 01 then
If AC = 0 then PC ← PC + 1
else If IR[11 - 10] = 10 then
If AC > 0 then PC ← PC + 1
Bit pattern for SKIPCOND instruction
• This is a bit pattern for a SKIPCOND instruction as it
would appear in the IR
• We see that the opcode is 8 and bits 11 and 10 are 10,
meaning that the next instruction will be skipped if the
value in the AC is greater than zero.
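A sketch of the skip logic, treating the AC as a signed value and bits 11-10 of the IR as the condition field:

    def skipcond(ir, ac, pc):
        cond = (ir >> 10) & 0b11        # bits 11-10 of the instruction
        skip = ((cond == 0b00 and ac < 0) or
                (cond == 0b01 and ac == 0) or
                (cond == 0b10 and ac > 0))
        return pc + 1 if skip else pc

    # Skipcond 800 (condition field 10): skip only if AC > 0
    print(hex(skipcond(0x8800, ac=5,  pc=0x10F)))   # 0x110 -- next instruction skipped
    print(hex(skipcond(0x8800, ac=-1, pc=0x10F)))   # 0x10f -- not skipped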
Assemblers and Assembly Language
• Compare the machine code to the assembly code
– You will find the assembly code much easier to decipher
• Mnemonics instead of op codes
• Variable names instead of memory locations
• Labels (for branches) instead of memory locations
– Assembly is an intermediate language between the
instruction set (machine language) and the high-level
language
– The assembler is a program that takes an assembly
language program and assembles it into machine
language, much like the compiler compiles a high-level
language program
Discussion on Assemblers
• Mnemonic instructions, such as LOAD 104, are easy for
humans to write and understand.
• They are impossible for computers to understand.
• Assemblers translate instructions that are comprehensible to
humans into the machine language that is comprehensible to
computers
– In assembly language, there is a one-to-one correspondence
between a mnemonic instruction and its machine code
– With compilers, this is not usually the case.
• A= B+C
• A=add(B,C)
• A=B + C
Discussion on Assemblers
• Assemblers create an object program file from
mnemonic source code in two passes
– First pass
• the assembler assembles as much of the program as it can,
while it builds a symbol table that contains memory references
for all symbols in the program.
– Second pass
• the instructions are completed using the values from the
symbol table.
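A toy two-pass sketch over a MARIE-style source (the source lines and addresses are invented; a real assembler does much more than label handling):

    source = [            # (label, mnemonic, operand)
        ("",  "Load", "X"),
        ("",  "Add",  "Y"),
        ("",  "Halt", ""),
        ("X", "Dec",  "35"),
        ("Y", "Dec",  "-23"),
    ]

    # Pass 1: assign addresses and build the symbol table
    symbols, addr = {}, 0x100
    for label, op, arg in source:
        if label:
            symbols[label] = addr
        addr += 1

    # Pass 2: complete the instructions using the values from the symbol table
    listing, addr = [], 0x100
    for label, op, arg in source:
        listing.append((hex(addr), op, symbols.get(arg, arg)))
        addr += 1

    print(symbols)    # {'X': 259, 'Y': 260}, i.e. X at 103h and Y at 104h
    print(listing)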
Example program
• Note that we have included
– two directives HEX and DEC that specify
the radix of the constants.
• First pass, we have
– a symbol table
– partial instructions.
• Second pass
– the assembly is complete.
– (35)10 = 0023h
– (-23)10 = FFE9h
4.6 Extending Our Instruction Set
100  |       LOAD Addr
101  |       STORE Next
102  |       LOAD Num
103  |       SUBT One
104  |       STORE Ctr
105  |       CLEAR
106  | Loop  LOAD Sum
107  |       ADDI Next
108  |       STORE Sum
109  |       LOAD Next
10A  |       ADD One
10B  |       STORE Next
10C  |       LOAD Ctr
10D  |       SUBT One
10E  |       STORE Ctr
10F  |       SKIPCOND 000
110  |       JUMP Loop
111  |       HALT
112  | Addr  HEX 118
113  | Next  HEX 0
114  | Num   DEC 5
115  | Sum   DEC 0
116  | Ctr   HEX 0
117  | One   DEC 1
118  |       DEC 10
119  |       DEC 15
11A  |       DEC 2
11B  |       DEC 25
11C  |       DEC 30
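For reference, the loop above walks a pointer (Next) through Num values starting at the address stored in Addr (118 hex) and accumulates them into Sum; a rough Python equivalent of that logic (a sketch, not MARIE itself):

    M = {0x118: 10, 0x119: 15, 0x11A: 2, 0x11B: 25, 0x11C: 30}   # the DEC data at 118-11C
    nxt, total, ctr = 0x118, 0, 5 - 1    # Next <- Addr, Sum <- 0, Ctr <- Num - 1
    while True:
        total += M[nxt]                  # Load Sum / AddI Next / Store Sum
        nxt += 1                         # Load Next / Add One / Store Next
        ctr -= 1                         # Load Ctr / Subt One / Store Ctr
        if ctr < 0:                      # Skipcond 000 skips the Jump once Ctr < 0,
            break                        # so control falls through to Halt
    print(total)                         # -> 82 (10 + 15 + 2 + 25 + 30)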
Control unit
• The circuits that cause the fetch-execute cycle to be performed are implemented either in a
hardwired form or by microprogramming
– Hardwired
• each operation including the execution of every machine
instruction uses a decoder to translate the op code into control
sequences (such as move PC to MAR, signal memory read, move
MBR to IR, increment PC)
• the control logic is implemented directly with digital logic components.
– Microprogrammed
• a small program is placed into read-only memory in the
microcontroller
Hardwired control unit
• For example, a hardwired
control unit for our simple
system would need a 4-to-16
decoder to decode the
opcode of an instruction.
• Input is the instruction in the IR, the status flags and
the system clock; output is the control signals to the
registers and other devices
Microprogrammed Control Unit
• The control store is a ROM that
stores all of the microprograms
– One microprogram per fetch-execute
stage, and one per instruction in the
instruction set
• Receive an instruction in the IR
• Start the microprogram, generating
the address, fetching the microinstruction
from the ROM and moving it to the
microinstruction buffer to be
decoded and executed
• This process is much more time
consuming than the hardwired
unit, but is easier to implement
and more flexible
4.8 Real World Architectures
• MARIE shares many features with modern
architectures but it is not an accurate depiction
of them.
• In the following slides, we briefly examine two
machine architectures.
– Intel architecture, which is a CISC machine
– MIPS, which is a RISC machine.
• CISC- complex instruction set computer.
• RISC- reduced instruction set computer.
We delve into the “RISC versus CISC” argument in Chapter 9.
Intel architecture
• The classic Intel architecture, the 8086, was born in 1979.
• It is a CISC architecture.
• It was adopted by IBM for its famed PC, which was
released in 1981.
• The 8086 operated on 16-bit data words and supported
20-bit memory addresses.
• The 8086 had
– four 16-bit general-purpose registers that could be
accessed by the half-word.
– flags register
– an instruction register
– no built-in floating-point processing.
What was the largest memory that the 8086 could address?
Intel
• In 1985, Intel introduced the 32-bit 80386.
• It also had no built-in floating-point unit.
• The 80486, introduced in 1989, was an 80386 that had built-in
floating-point processing and cache memory.
• The 80386 and 80486 offered downward compatibility with the 8086
and 8088.
• Software written for the smaller word systems was directed to use
the lower 16 bits of the 32-bit registers.
• Currently, Intel’s most advanced 32-bit microprocessor is the P4.
• It can run as fast as 3.06 GHz. This clock rate is over 350 times faster
than that of the 8086.
• Speed enhancing features include multilevel cache and instruction
pipelining.
• Intel, along with many others, is marrying many of the ideas of RISC
architectures with microprocessors that are largely CISC.
MIPS architecture
• The MIPS family of CPUs has been one of the most successful in its class.
• In 1986 the first MIPS CPU was announced.
• It had a 32-bit word size and could address 4GB of memory.
• Over the years, MIPS processors have been used in general purpose computers
as well as in games.
• The MIPS architecture now offers 32- and 64-bit versions.
• MIPS was one of the first RISC microprocessors.
• The original MIPS architecture had only 55 different instructions, as compared
with the 8086 which had over 100.
• MIPS was designed with performance in mind: It is a load/store architecture,
meaning that only the load and store instructions can access memory.
• The large number of registers in the MIPS architecture keeps bus traffic to a
minimum.
CISC
• In the 1970s, memory was expensive and small, so people
designed computers that would pack as many actions as possible into a
single instruction; this saved memory space but added complexity.
• The BCD adder that we made is CISC-like since it does both the add and the
correction in one instruction.
– Microprogrammed control unit
– Large number of instructions (200-500)
– Instructions can do more than 1 thing (that is, an instruction could
carry out 2 or more actions)
– Many addressing modes
– Instructions vary in length and format
– This was the typical form of architecture until the mid-1980s; RISC has
become more popular since then, although many architectures remain
CISC
RISC
• Hardwired control unit
• Instruction lengths fixed (usually 32 bits long)
• Instruction set limited (80-100 instructions)
• Instructions rely mostly on registers; memory is
accessed only on loads and stores
• Few addressing modes
• Easy to pipeline for great speedup in
performance
RISC
Pure RISC machines have the following features
1. All RISC instruction codes have the same number of bits
(typically 32 bits)
CISC instructions can vary and have no fixed length.
Because RISC instructions have a fixed length, they are much easier to
pipeline, which makes the machine faster
2. The RISC instruction set includes only very simple operations
that can ideally be executed in a small number of clock cycles.
These instructions can then be moved through a pipeline quickly
and not hold the pipeline up
In the BCD adder, we would make the add and the correction two separate
instructions to make it a RISC machine and have it easily pipelined
RISC
3. RISC instructions for accessing memory include only a
single-operand load instruction and a single-operand store
instruction.
Because of this, RISC machines are referred to as "load and store"
machines
Most CISC machines have a single instruction that loads the operand
from memory and then adds it to a register – note that this does two
things with a single instruction, just like our BCD adder
Two advantages of a simple load and store machine:
– machine code only takes a few bits
– machine code can be quickly decoded, which makes pipelines easier to
design
RISC
4. The RISC instruction set is designed so that compilers can
efficiently translate high-level language constructs to the
instruction codes for the machine, and the result is easily
portable from one machine to another.
Compiler writers took little advantage of powerful CISC instructions
because they were very machine specific and therefore not very portable;
this led to compilers that acted as if they were targeting a RISC machine
even though they were on a CISC machine, because that made the result portable
A problem with pure RISC is that it takes many small instructions to do
anything, which uses a lot of memory; this excessive use of memory is called
"code bloat"
A pure RISC machine also requires greater memory bandwidth because it is
constantly fetching instructions. Modern microprocessors are between RISC and CISC (a
combination of both), in order to reduce code bloat.