Group5Chap12

MEMBERS:
ZHE GENG
JORGE MONTENEGRO
CARLOS GARRIDO
WALLI BUTT
ADRIAN SUAREZ
DIEGO ARIAS
CONTENTS
Processor Organization
Register Organization
 User-visible registers
 Control and Status register
 Example Microprocessor Register Organizations
Instruction Cycle
 The Indirect Cycle
 Data Flow
Instruction Pipeline
 Strategy
 Branch Prediction
CPU MUST DO THE FOLLOWING THINGS:
• Fetch instruction: read the instruction from memory
• Interpret instruction: decode the instruction
• Fetch data: read data from memory or an I/O module
• Process data: perform an arithmetic or logical operation
• Write data: write data to memory or an I/O module
CPU WITH SYSTEM BUS
CPU INTERNAL STRUCTURE
REGISTERS
• CPU must have some working space (temporary storage)
• Called registers
• Number and function vary between processor designs
• One of the major design decisions
• Top level of memory arrangement
USER VISIBLE REGISTERS
• General Purpose
• Data
• Address
• Condition Codes
HOW MANY GP REGISTERS?
• Between 8 - 32
• Fewer = more memory references
• More does not noticeably reduce memory references and takes up processor real estate
• See also RISC:
• One cycle execution time
• Pipelining
• Large number of registers
HOW BIG?
• Large enough to hold full address
• Large enough to hold full word
• Often possible to combine two data registers to hold a double-length value
• C programming examples of wider integer types:
• long int a;
• long long int a;
CONDITION CODE REGISTERS
ADVANTAGES:
• Because condition codes are set by normal arithmetic and data movement
instructions, they should reduce the number of COMPARE and TEST
instructions needed.
• Conditional instructions, such as BRANCH, are simplified relative to composite
instructions, such as TEST AND BRANCH.
• Condition codes facilitate multi-way branches. For example, a TEST instruction
can be followed by two branches, one on less than or equal to zero and one on
greater than zero.
CONDITION CODE REGISTERS
DISADVANTAGES:
• Condition codes add complexity, both to the hardware and software. Condition
code bits are often modified in different ways by different instructions.
• Condition codes are irregular; they are typically not part of the main data path, so
they require extra hardware connections.
• Often condition code machines must add special non-condition-code instructions
for special situations anyway.
• In a pipelined implementation, condition codes require special synchronization to
avoid conflicts.
CONTROL AND STATUS REGISTERS
• Program Counter (PC): Contains the address of an instruction to be fetched.
• Instruction Register (IR): Contains the instruction most recently
fetched.
• Memory Address Register (MAR): Contains the address of a location in
memory.
• Memory Buffer Register (MBR): Contains a word of data to be written to
memory or the word most recently read.
PROGRAM STATUS WORD (PSW)
• Sign: Contains the sign bit of the result of the last arithmetic operation.
• Zero: Set when the result is 0.
• Carry: Set if an operation resulted in a carry (addition) into or borrow
(subtraction) out of a high-order bit.
• Equal: Set if a logical compare result is equality.
• Overflow: Used to indicate arithmetic overflow.
• Interrupt enable/disable: Used to enable or disable interrupts.
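As a rough illustration of how these flags relate to an arithmetic result, the following Python sketch computes the sign, zero, carry, and overflow flags for an 8-bit addition. The 8-bit width and the helper name add8_flags are assumptions chosen for the example; real processors differ in word width and in which instructions update which flags.

```python
def add8_flags(a, b):
    """Add two 8-bit values and return (result, flags). Hypothetical helper."""
    a &= 0xFF
    b &= 0xFF
    raw = a + b                 # may be up to 9 bits wide
    result = raw & 0xFF         # value that fits in the 8-bit register
    flags = {
        "sign": bool(result & 0x80),      # copy of the result's top bit
        "zero": result == 0,              # result is all zeros
        "carry": raw > 0xFF,              # carry out of the top bit
        # signed overflow: operands share a sign, result has the other sign
        "overflow": bool(~(a ^ b) & (a ^ result) & 0x80),
    }
    return result, flags


r1, f1 = add8_flags(0x7F, 0x01)   # 127 + 1 in two's complement
r2, f2 = add8_flags(0xFF, 0x01)   # -1 + 1 signed, or 255 + 1 unsigned
print(hex(r1), f1)
print(hex(r2), f2)
```

The first case sets overflow without carry; the second sets carry and zero without overflow, which is why the two flags are kept separately.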
SUPERVISOR MODE
Supervisor: Indicates whether the processor is executing in supervisor mode
or user mode
• Privileged instructions
• Address space
• Memory management
Protection domain or Protection Ring
• Ring
• Kernel
MICROPROCESSOR REGISTER ORGANIZATION
SECTION 12.3
INSTRUCTION CYCLE
• An instruction cycle (sometimes called fetch-and-execute cycle,
fetch-decode-execute cycle, or FDX) is the basic operation cycle of a
computer.
• It is the process by which a computer retrieves
a program instruction from its memory,
determines what actions the instruction
requires, and carries out those actions.
• This cycle is repeated continuously by the
central processing unit (CPU), from bootup to
when the computer is shut down.
The circuits used in the CPU during the cycle are:
• Program Counter (PC)
• Memory Address Register (MAR)
• Memory Data Register (MDR), also called the Memory Buffer Register (MBR)
• Instruction Register (IR)
• Control Unit (CU)
• Arithmetic Logic Unit (ALU)
There are typically four stages of an instruction cycle
that the CPU carries out:
1) Fetch the instruction from memory.
2) "Decode" the instruction.
3) "Read the effective address" from memory if the
instruction has an indirect address.
4) "Execute" the instruction.
The instruction cycle is the time in which a single instruction is
fetched from memory, decoded, and executed.
THE FOUR SUB-CYCLES:
• Fetch: Reads the next instruction from memory into the processor.
• Indirect: Fetching operands may require memory access; with indirect
addressing, additional memory accesses are needed.
• Interrupt: Saves the current process state and services the interrupt.
• Execute: Interprets the opcode and performs the indicated operation.
There are six fundamental phases of the instruction cycle:
1.) fetch instruction (aka pre-fetch)
2.) decode instruction
3.) evaluate address (address generation)
4.) fetch operands (read memory data)
5.) execute (ALU access)
6.) store result (writeback memory data)
DECODE EVALUATE AND FETCH
• Decoding the instruction?
• The decoder interprets what?
• What is being fetched from memory?
• What decision is made next?
• Based on the decision, what are the options?
• What if the decision is a direct memory operation?
• What if the decision is an indirect memory operation?
A NOTE ON ADDRESSING MODES
INSTRUCTION CYCLE WITH AND WITHOUT INDIRECT CYCLE…
SAMPLE QUESTION:
Given that the instruction cycle is the time in which a single
instruction is fetched from memory, decoded, and executed:
A microprocessor provides an instruction capable of
moving a string of bytes from one area of memory to
another. The fetching and initial decoding of the
instruction takes 10 clock cycles. Thereafter, it takes
15 clock cycles to transfer each byte. The
microprocessor is clocked at a rate of 10 GHz.
Determine the length of the instruction cycle for the
case of a string of 64 bytes.
ANSWER:
The length of a clock cycle is 0.1 ns. The length of the
instruction cycle for this
case is [10 + (15 × 64)] × 0.1 ns = 970 × 0.1 ns = 97 ns.
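The arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation in the sample question; the function name and parameter names are made up for the example. The setup cost is 10 cycles, each byte costs 15 cycles, and one cycle at 10 GHz lasts 0.1 ns.

```python
def instruction_cycle_ns(setup_cycles, cycles_per_byte, n_bytes, clock_hz):
    """Length of one string-move instruction cycle, in nanoseconds."""
    total_cycles = setup_cycles + cycles_per_byte * n_bytes
    return total_cycles * 1e9 / clock_hz   # cycles * (ns per cycle)


total_cycles = 10 + 15 * 64                     # 970 clock cycles
length_ns = instruction_cycle_ns(10, 15, 64, 10e9)
print(total_cycles, "cycles,", length_ns, "ns")  # 970 cycles, 97.0 ns
```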
ANOTHER EXAMPLE: TOTAL NUMBER OF CYCLES REQUIRED
To execute the SAL instruction:
add A, B, C
1.) Fetch instruction (add) from memory address PC.
2.) Increment PC to address of next instruction.
3.) Decode the instruction and operands.
4.) Load the operands B and C from memory.
5.) Execute the add operation.
6.) Store the result into memory location A.
Execution Time
Suppose each memory access (fetch, load, store) requires 10 clock cycles and that
the PC update, instruction decode, and execution each require 1 clock cycle. The
total number of cycles to execute the add instruction is:
10+1+1+10+10+1+10 = 43 cycles/instruction.
A CPU running at 100 MHz (100,000,000 cycles/sec) can execute add instructions at
a rate of 100,000,000/43 = 2,325,581 instructions/sec, or ~2.3 MIPS (million
instructions/sec).
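The same tally can be written out step by step. The sketch below simply encodes the cost assumptions stated above (10 cycles per memory access, 1 cycle for each internal step); the step labels and variable names are illustrative.

```python
MEMORY_ACCESS = 10   # cycles for a fetch, load, or store
INTERNAL_STEP = 1    # cycles for PC update, decode, or execute

# One entry per step of "add A, B, C"; loading B and C are separate accesses.
steps = [
    ("fetch instruction", MEMORY_ACCESS),
    ("increment PC", INTERNAL_STEP),
    ("decode", INTERNAL_STEP),
    ("load B", MEMORY_ACCESS),
    ("load C", MEMORY_ACCESS),
    ("execute add", INTERNAL_STEP),
    ("store A", MEMORY_ACCESS),
]

cycles_per_instruction = sum(cost for _, cost in steps)
clock_hz = 100_000_000                       # 100 MHz
instructions_per_sec = clock_hz // cycles_per_instruction

print(cycles_per_instruction)                # 43
print(instructions_per_sec)                  # 2325581
```

Memory accesses account for 40 of the 43 cycles, which is why reducing memory references matters so much for instruction throughput.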
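DATA FLOW: FETCH CYCLE
[Figure: data flow during the fetch cycle; labels include IR and MBR]
DATA FLOW: INDIRECT CYCLE
[Figure: data flow during the indirect cycle; labels include Memory and MBR]
DATA FLOW: INTERRUPT CYCLE
[Figure: data flow during the interrupt cycle; labels include PC and Control Unit]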
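The fetch and indirect cycles can be sketched as a sequence of register transfers. This is a minimal Python model, assuming a word-addressed memory held in a dict and a made-up instruction format of (opcode, address, indirect flag). The register names PC, MAR, MBR, and IR come from the slides; everything else is hypothetical.

```python
def fetch_cycle(cpu, memory):
    cpu["MAR"] = cpu["PC"]            # address of the next instruction
    cpu["MBR"] = memory[cpu["MAR"]]   # memory read places the word in MBR
    cpu["PC"] += 1                    # point at the following instruction
    cpu["IR"] = cpu["MBR"]            # instruction moves to IR for decoding


def indirect_cycle(cpu, memory):
    opcode, address, _ = cpu["IR"]
    cpu["MAR"] = address              # address field names a pointer in memory
    cpu["MBR"] = memory[cpu["MAR"]]   # read the operand's actual address
    cpu["IR"] = (opcode, cpu["MBR"], False)  # now a direct reference


memory = {
    0: ("LOAD", 10, True),   # LOAD indirect through location 10
    10: 20,                  # location 10 holds the operand address
    20: 99,                  # the operand itself
}
cpu = {"PC": 0, "MAR": 0, "MBR": 0, "IR": None}

fetch_cycle(cpu, memory)
if cpu["IR"][2]:             # indirect flag set, so run the indirect cycle
    indirect_cycle(cpu, memory)
operand = memory[cpu["IR"][1]]
print(cpu, operand)
```

The indirect cycle costs one extra memory read before the operand itself can be fetched, which is the additional memory access the sub-cycle list refers to.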
DATA FLOW: EXECUTE CYCLE
The execute cycle:
Takes many forms
 the form depends on which of the various machine instructions is in the
IR.
This cycle may involve
 transferring data among registers
 read or write from memory
 I/O
 invocation of the ALU
INSTRUCTION PIPELINING
By separating an instruction cycle into stages, multiple instructions at
different stages can be worked on at the same time.
For example, Stage 2 of the current instruction can be overlapped with
Stage 1 of the next instruction.
A TWO-STAGE PIPELINE
An instruction cycle can be divided into two stages:
• Fetch: get an op-code from main memory and put it in a register
• Execute: decode an op-code and execute the instruction
The execute stage of the current instruction would overlap with the fetch
stage of the next instruction.
Assuming that fetch and execute use the same number of clock cycles, this
would double the speed (in reality, execute takes longer).
A SIX-STAGE PIPELINE
• Fetch instruction (FI): get op-code from memory and put it in a register
• Decode instruction (DI): decode op-code and determine addressing mode
• Calculate operand (CO): get effective address of source operands
• Fetch operands (FO): get operands from memory and put them in registers
• Execute instruction (EI): execute instruction and write result to a register
• Write operand (WO): store the result in memory
This pipeline is more typical of modern computers, especially RISC
computers (e.g. MIPS, SPARC, and DLX).
Each stage occupies about the same number of clock cycles.
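A back-of-the-envelope timing model makes the benefit concrete. The sketch below assumes every stage takes exactly one cycle and that no hazards or branches disturb the flow, so a k-stage pipeline finishes n instructions in k + (n - 1) cycles instead of n × k. The function names are illustrative, not taken from the slides.

```python
def unpipelined_cycles(n_instructions, k_stages):
    """Each instruction runs all k stages before the next one starts."""
    return n_instructions * k_stages


def pipelined_cycles(n_instructions, k_stages):
    """First instruction takes k cycles; each later one completes 1 cycle after."""
    return k_stages + (n_instructions - 1)


def speedup(n_instructions, k_stages):
    return (unpipelined_cycles(n_instructions, k_stages)
            / pipelined_cycles(n_instructions, k_stages))


for k in (2, 6):
    for n in (9, 1000):
        print(f"k={k} n={n}: {pipelined_cycles(n, k)} cycles, "
              f"speedup {speedup(n, k):.2f}")
```

Under these assumptions the speedup approaches the number of stages k as n grows, which matches the "double the speed" figure quoted for the two-stage case. Hazards and unequal stage times reduce it in practice.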
WHY NOT A 100-STAGE PIPELINE?
If 1 instruction per cycle can be achieved with a 5-stage pipeline, adding
more stages would only increase the number of registers without
increasing speed (it might actually make the computer less efficient).
Overlapping of instructions requires additional logic to account for
dependencies between instructions (e.g. a memory read after a memory
write to the same location).
PIPELINE HAZARDS
• Resource hazards: when two instructions in the pipeline need to use the
same resource
• Data hazards: when two instructions must be executed in sequence (e.g.
a memory write followed by a memory read of the same location)
• Branch hazards: when a conditional branch occurs and the pipeline
fetches the wrong instructions
RESOURCE HAZARDS
A resource hazard occurs when two stages in the pipeline need to use the
same resource at the same time. For example, two stages need to read
from main memory (assuming that the data hasn’t been cached) when
an operand fetch is overlapped with an instruction fetch.
In this case, the two stages must be executed in series rather than in
parallel, and a delay must be introduced in the pipeline.
DATA HAZARDS
A data hazard occurs whenever data is fetched from a location before it
contains the correct value.
The “correct” value is whatever value it would contain if the instructions
were executed in sequence.
Whenever the fetch stage must access data that hasn’t yet been written, the
pipeline must be delayed at the fetch stage.
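DATA HAZARDS
ADD EAX, EBX    ; EAX = EAX + EBX
SUB ECX, EAX    ; ECX = ECX - EAX (reads the EAX written by ADD)
I3              ; next instruction in sequence
I4              ; next instruction in sequence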
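The dependency between the ADD and the SUB is usually called a read-after-write (RAW) hazard. A minimal Python sketch of how it could be detected follows; the dict-based instruction description and the function name are assumptions made for the example, not how any particular processor represents instructions.

```python
def has_raw_hazard(earlier, later):
    """True if `later` reads a register that `earlier` writes."""
    return bool(earlier["writes"] & later["reads"])


# ADD EAX, EBX reads EAX and EBX, and writes EAX.
add = {"text": "ADD EAX, EBX", "reads": {"EAX", "EBX"}, "writes": {"EAX"}}
# SUB ECX, EAX reads ECX and EAX, and writes ECX.
sub = {"text": "SUB ECX, EAX", "reads": {"ECX", "EAX"}, "writes": {"ECX"}}

add_then_sub = has_raw_hazard(add, sub)   # SUB needs the EAX produced by ADD
sub_then_add = has_raw_hazard(sub, add)   # ADD does not read ECX
print(add_then_sub, sub_then_add)         # True False
```

When the check is true, the later instruction has to wait until the earlier result is written, which is the delay described above.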
BRANCH HAZARDS
PIPELINE IMPLEMENTATION
A pipeline is implemented as a series of sequential circuits, with each stage taking its
input from the output of the previous stage
DEALING WITH BRANCHES
The most difficult part in designing an instruction pipeline is
assuring a steady flow of instructions to the initial stages of the
pipeline. Several approaches have been taken for dealing with
conditional branches.
• Multiple streams
• Prefetch branch target
• Loop buffer
• Branch prediction
• Delayed branch.
MULTIPLE STREAMS
A simple pipeline suffers a penalty on a branch instruction because it
must choose one of two instructions to fetch next and may make the
wrong choice. One way of dealing with this is to allow the pipeline to
fetch both instructions, making use of two streams. Drawbacks:
• With multiple pipelines there are contention delays for access to
registers and memory.
• Additional branch instructions may enter the pipeline before the
original branch decision is resolved.
PREFETCH BRANCH TARGET
When a conditional branch is recognized, the target of the
branch is prefetched in addition to the instruction
following the branch.
The target is saved until the branch is executed. If the branch is
taken, the target has already been prefetched.
LOOP BUFFER
• A loop buffer is a small, high-speed memory maintained by the
instruction fetch stage of the pipeline; it contains the
most recently fetched instructions, in sequence.
• Instructions fetched in sequence will be available without
the usual memory access time.
• If a branch occurs to a target just a few locations ahead of
the address of the branch instruction, the target will already
be in the buffer.
• If the loop buffer is large enough to contain all the
instructions in the loop, then the instructions will only
have to be fetched once.
LOOP BUFFER DIAGRAM
BRANCH PREDICTION
There are several techniques to predict whether or not a
branch will be taken.
• Predict never taken
• Predict always taken
• Predict by opcode
• Taken/ not taken switch
• Branch history table
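As one concrete example of a dynamic scheme, the sketch below models a two-bit saturating counter, a common form of taken/not-taken switch. It is an illustration under stated assumptions and may differ in detail from the state diagram on the slides; the class name, the starting state, and the loop pattern are all made up for the example.

```python
class TwoBitPredictor:
    """States 0-1 predict not taken, states 2-3 predict taken."""

    def __init__(self, state=3):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes, predictor):
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# A loop-closing branch: taken 9 times, then falls through, run 3 times.
loop_outcomes = ([True] * 9 + [False]) * 3
misses = count_mispredictions(loop_outcomes, TwoBitPredictor())
print(misses, "mispredictions out of", len(loop_outcomes))
```

Because a single not-taken outcome only moves the counter from 3 to 2, the predictor still guesses "taken" when the loop is entered again, so it misses only on each loop exit.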
BRANCH PREDICTION FLOWCHART
BRANCH PREDICTION STATE DIAGRAM
DELAYED BRANCH
It’s possible to improve the performance of a pipeline
by rearranging instructions within a program, so
that branch instructions occur later than actually
desired. A delayed branch does not take effect until after
the execution of the following instruction.
INTEL 80486 PIPELINING
The Intel 80486 implements a five-stage pipeline.
• Fetch
• Decode stage 1
• Decode stage 2
• Execute
• Write back
80486 INSTRUCTION PIPELINE EXAMPLES
QUESTIONS
1. What’s the function of the internal processing bus?
2. What’s the similarity between the internal structure of the computer as a
whole and the internal structure of the CPU?
3. What is an instruction cycle?
4. What are the four sub-cycles of an instruction cycle?
5. Is the fetch or execute cycle the same for all CPUs?
6. What is the sequence of an interrupt cycle?
7. How does pipelining increase processor speed?
8. What are some pipeline hazards?
9. Which computers use a 5-stage pipeline?
10. What are the five ways to deal with conditional branches?
11. What happens in the fetch cycle inside an Intel 80486?
REFERENCES
Computer Organization and Architecture, Designing for Performance, 8/E, Stallings, William
Embedded System Design: A Unified Hardware/Software Approach, Vahid, Frank, and Givargis, Tony
Wikipedia, “Instruction Cycle” http://en.wikipedia.org/wiki/Instruction_cycle
CIS-77 Introduction to Computer Systems http://www.c-jump.com/CIS77/CPU/InstrCycle/lecture.html