Computer Organization and Architecture The CPU Structure

advertisement
Computer
Organization
and Architecture
The CPU Structure
CPU Structure

CPU must:





Fetch instructions
Interpret instructions
Fetch data
Process data
Write data
External View of CPU
Internal View of CPU
Registers





CPU must have some working space
(temporary storage)
Called registers
Number and function vary between processor
designs
One of the major design decisions
Top level of memory hierarchy
User Visible Registers




General Purpose
Data
Address
Condition Codes
General Purpose Registers (1)




May be true general purpose
May be restricted
May be used for data or addressing
Data


Accumulator
Addressing

Segment
General Purpose Registers (2)

Make them general purpose



Increase flexibility and programmer options
Increase instruction size & complexity
Make them specialized


Smaller (faster) instructions
Less flexibility
How Many GP Registers?




Between 8 - 32
Fewer = more memory references
More does not reduce memory references
and takes up processor real estate
See also RISC
How big?



Large enough to hold the largest address
Data register should be able to hold the value
of most date types
Some machines allow two contiguous
register to be used as one for holding doublelength values
Condition Code Registers

Sets of individual bits


Can be read (implicitly) by programs


e.g. result of last operation was zero
e.g. Jump if zero
Can not (usually) be set by programs
Control & Status Registers

Program Counter
Instruction Decoding Register
Memory Address Register
Memory Buffer Register

Revision: what do these all do?



Program Status Word









A set of bits
Includes Condition Codes
Sign of last result
Zero
Carry
Equal
Overflow
Interrupt enable/disable
Supervisor
Supervisor Mode





Intel ring zero
Kernel mode
Allows privileged instructions to execute
Used by operating system
Not available to user programs
Example Register
Organizations
Indirect Cycle



May require memory access to fetch
operands
Indirect addressing requires more memory
accesses
Can be thought of as additional instruction
subcycle
Instruction Cycle with Indirect
Instruction Cycle State
Diagram
Data Flow (Instruction Fetch)



Depends on CPU design
In general:
Fetch






PC contains address of next instruction
Address moved to MAR
Address placed on address bus
Control unit requests memory read
Result placed on data bus, copied to MBR, then to
IR
Meanwhile PC incremented by 1
Data Flow (Data Fetch)


IR is examined
If indirect addressing, indirect cycle is
performed



Right most N bits of MBR transferred to MAR
Control unit requests memory read
Result (address of operand) moved to MBR
Data Flow (Fetch Diagram)
Data Flow (Indirect Diagram)
Data Flow (Execute)



May take many forms
Depends on instruction being executed
May include




Memory read/write
Input/Output
Register transfers
ALU operations
Data Flow (Interrupt)








Simple
Predictable
Current PC saved to allow resumption after interrupt
Contents of PC copied to MBR
Special memory location (e.g. stack pointer) loaded to
MAR
MBR written to memory
PC loaded with address of interrupt handling routine
Next instruction (first of interrupt handler) can be fetched
Data Flow (Interrupt Diagram)
Prefetch




Fetch accessing main memory
Execution usually does not access main
memory
Can fetch next instruction during execution of
current instruction
Called instruction prefetch
Acceleration by Pipelining
The Instruction Cycle
Acceleration by Pipelining
Theoretical Performance

An ideal pipeline divides a task into k
independent sequential subtasks



For n iterations of task, the execution times:



Each subtask requires 1 time unit to complete
The task itself requires k time units to complete
With no pipelining: nk time units
With pipelining: k + (n-1) time units
Speedup of a k-stage pipeline is

S = nk / [k+(n-1)] ==> k (for large n)
Acceleration by Pipelining
Pipeline Hazards
Structural Hazards
Data Hazards
Data Hazards
Data Hazards
Control Hazards
Control Hazards
Control Hazards
Control Hazards
Control Hazards


With conditional branch we have a penalty
even if the branch has not been taken. This
is because we have to wait until the branch
condition is available.
Branch instructions represent a major
problem in assuring an optimal flow through
the pipeline.
Brief Summary



Structural hazards are due to resource
conflicts.
Data hazards are produced by data
dependencies between instructions.
Control hazards are produced as
consequence of branch instructions.
Branches




Branch instructions can dramatically affect pipeline
performance. Control operations are very frequent
in current programs.
20% - 35% of the instructions executed are
branches (conditional and unconditional).
65% of the branches actually take the branch.
Conditional branches are much more frequent than
unconditional (more than two times). More than
50% of conditional branches are taken.
Dealing with Branches




Multiple Streams
Loop buffer
Delayed branching
Branch prediction



static prediction
dynamic prediction
branch history table
Multiple Streams





Have two pipelines
Prefetch each branch into a separate pipeline
Use appropriate pipeline
Leads to bus & register contention
Multiple branches lead to further pipelines being
needed
Loop Buffer







Very fast memory
Contain the n most recently fetched instruction
Maintained by fetch stage of pipeline
Check buffer before fetching from memory
Very good for small loops or jumps
c.f. cache
Used by CRAY-1
Delayed Branching




The idea with delayed branching is to let the CPU
do some useful work during some of the cycles
which are shown to be stalled
With delayed branching the CPU always executes
the instruction that immediately follows after the
branch and only then alters (if necessary) the
sequence of execution. The instruction after the
branch is said to be in the branch delay slot
Leads to bus & register contention
Multiple branches lead to further pipelines being
needed
Delayed Branching
Delayed Branching
Comparison
Comparison
Delayed Branching
Branch Prediction

Correct branch prediction is very important
and can produce substantial performance
improvements.



static prediction
dynamic prediction
To take full advantage of branch prediction,
we can have the instructions not only fetched
but also begin execution. This is known as
speculative execution
Speculative Execution

Instructions are executed before the processor
is certain that they are in the correct execution
path. If it turns out that the prediction was
correct, execution goes on without introducing
any branch penalty. If, however, the prediction is
not fulfilled, the instruction(s) started in advance and all their associated data must be
purged and the state previous to their execution
restored.
Static Branch Prediction



Static prediction techniques do not take into
consideration execution history.
Predict never taken (Motorola 68020): assumes
that the branch is not taken.
Predict always taken: assumes that the branch
is taken.
Dynamic Branch Prediction


Improve the accuracy of prediction by recording
the history of conditional branches.
One-bit prediction scheme


is used in order to record if the last execution resulted
in a branch taken or not. The system predicts the
same behavior as for the last time.
Two-bit prediction scheme

with a two-bit scheme predictions can be made
depending on the last two instances of execution.
One-Bit Prediction Scheme
Two-Bit Prediction Scheme
Branch History Table



History info. can be used not only to predict the
outcome of a conditional branch but also to avoid
recalculation of the target address. Together with
bits used for prediction, the target address can be
stored for later use in a branch history table.
Using D. B. P with history tables up to 90% of
predictions can be correct.
Pentium,PowerPC620 use speculative execution
with D.B.P based on a branch history table.
Branch History Table
Download