Uploaded by anmol rawat

unit 3

advertisement
www.getmyuni.com
Central Processing Unit and Pipeline
Subject: Computer Organization And
Architecture
1
www.getmyuni.com
CENTRAL PROCESSING UNIT
• Introduction
• General Register Organization
• Stack Organization
• Instruction Formats
• Addressing Modes
• Data Transfer and Manipulation
• Program Control
• Reduced Instruction Set Computer (RISC)
2
MAJOR COMPONENTS OF CPU
www.getmyuni.com
Storage Components:
Registers
Flip-flops
Execution (Processing) Components:
Arithmetic Logic Unit (ALU):
Arithmetic calculations, Logical computations, Shifts/Rotates
Transfer Components:
Bus
Control Components:
Control Unit
Register
File
ALU
Control Unit
3
GENERAL REGISTER ORGANIZATION
www.getmyuni.com
Input
Clock
R1
R2
R3
R4
R5
R6
R7
Load
(7 lines)
SELA
{
3x8
decoder
MUX
MUX
A bus
SELD
OPR
} SELB
B bus
ALU
Output
4
OPERATION OF CONTROL UNIT
www.getmyuni.com
The control unit directs the information flow through ALU by:
- Selecting various Components in the system
- Selecting the Function of ALU
Example: R1 <- R2 + R3
[1] MUX A selector (SELA): BUS A  R2
[2] MUX B selector (SELB): BUS B  R3
[3] ALU operation selector (OPR): ALU to ADD
[4] Decoder destination selector (SELD): R1  Out Bus
3
SELA
3
SELB
Control Word
Encoding of register selection fields
3
SELD
Binary
Code
000
001
010
011
100
101
110
111
5
OPR
SELA
Input
R1
R2
R3
R4
R5
R6
R7
SELB
Input
R1
R2
R3
R4
R5
R6
R7
SELD
None
R1
R2
R3
R4
R5
R6
R7
5
Control
ALU CONTROL
www.getmyuni.com
Encoding of ALU operations
OPR
Select
00000
00001
00010
00101
00110
01000
01010
01100
01110
10000
11000
Operation
Transfer A
Increment A
ADD A + B
Subtract A - B
Decrement A
AND A and B
OR A and B
XOR A and B
Complement A
Shift right A
Shift left A
Symbol
TSFA
INCA
ADD
SUB
DECA
AND
OR
XOR
COMA
SHRA
SHLA
Examples of ALU Microoperations
Symbolic Designation
Microoperation
SELA
SELB
SELD
OPR
Control Word
R1 R2 - R3
R2
R3 R1 SUB
010 011 001 00101
R4 R4 R5
R4
R5 R4
OR
100 101 100 01010
R6 R6 + 1
R6
R6 INCA 110 000 110 00001
R7 R1
R1
R7 TSFA 001 000 111 00000
Output R2
R2
None TSFA 010 000 000 00000
Output Input Input
None TSFA 000 000 000 00000
R4 shl R4
R4
R4 SHLA 100 000 100 11000
R5 0
R5
R5
R5 XOR 101 101 101 01100
6
REGISTER STACK ORGANIZATION
www.getmyuni.com
Stack
- Very useful feature for nested subroutines, nested loops control
- Also efficient for arithmetic expression evaluation
- Storage which can be accessed in LIFO
- Pointer: SP
- Only PUSH and POP operations are applicable
stack
Register Stack
63
Flags
FULL
Address
EMPTY
Stack pointer
SP
Push, Pop operations
C
B
A
4
3
2
1
0
DR
/* Initially, SP = 0, EMPTY = 1, FULL = 0 */
PUSH
POP
SP SP + 1
DR M[SP]
M[SP] DR
SP SP - 1
If (SP = 0) then (FULL 1)
If (SP = 0) then (EMPTY 1)
EMPTY 0
FULL 0
7
MEMORY STACK ORGANIZATION
www.getmyuni.com
1000
Memory with Program, Data,
and Stack Segments
PC
Program
(instructions)
AR
Data
(operands)
SP
- A portion of memory is used as a stack with a
processor register as a stack pointer
- PUSH:
- POP:
3000
stack
3997
3998
3999
4000
4001
DR
SP  SP - 1
M[SP]  DR
DR  M[SP]
SP  SP + 1
- Most computers do not provide hardware to check
stack overflow (full stack) or underflow(empty stack)
8
REVERSE POLISH NOTATION
www.getmyuni.com
Arithmetic Expressions: A + B
A+B
+AB
AB+
Infix notation
Prefix or Polish notation
Postfix or reverse Polish notation
- The reverse Polish notation is very suitable for stack
manipulation
Evaluation of Arithmetic Expressions
Any arithmetic expression can be expressed in parenthesis-free
Polish notation, including reverse Polish notation
(3 * 4) + (5 * 6) 
34*56*+
3
4
3
12
5
12
6
5
12
3
4
*
5
6
30
12
42
*
+
9
Instruction Format
INSTRUCTION FORMAT
www.getmyuni.com
Instruction Fields
OP-code field - specifies the operation to be performed
Address field - designates memory address(s) or a processor register(s)
Mode field - specifies the way the operand or the
effective address is determined
The number of address fields in the instruction format
depends on the internal organization of CPU
- The three most common CPU organizations:
Single accumulator organization:
ADD
X
/* AC  AC + M[X] */
General register organization:
ADD
R1, R2, R3
/* R1  R2 + R3 */
ADD
R1, R2
/* R1  R1 + R2 */
MOV
R1, R2
/* R1  R2 */
ADD
R1, X
/* R1  R1 + M[X] */
Stack organization:
PUSH
X
/* TOS  M[X] */
ADD
10
THREE, and TWO-ADDRESS INSTRUCTIONS
www.getmyuni.com
Three-Address Instructions:
Program to evaluate X = (A + B) * (C + D) :
ADD
R1, A, B /* R1 M[A] + M[B] */
ADD
R2, C, D /* R2 M[C] + M[D] */
MUL X, R1, R2
/* M[X] R1 * R2
*/
- Results in short programs
- Instruction becomes long (many bits)
Two-Address Instructions:
Program to evaluate X = (A + B) * (C + D) :
MOV
ADD
MOV
ADD
MUL
MOV
R1, A
R1, B
R2, C
R2, D
R1, R2
X, R1
/* R1 M[A]
*/
/* R1 R1 + M[B] */
/* R2 M[C]
*/
/* R2 R2 + M[D] */
/* R1 R1 * R2 */
/* M[X] R1
*/
11
ONE, and ZERO-ADDRESS INSTRUCTIONS
www.getmyuni.com
One-Address Instructions:
- Use an implied AC register for all data manipulation
- Program to evaluate X = (A + B) * (C + D) :
LOAD A
/* AC M[A]
*/
ADD
B
/* AC AC + M[B]
*/
STORE T
/* M[T] AC */
LOAD C
/* AC M[C]
*/
ADD
D
/* AC AC + M[D]
*/
MUL T
/* AC AC * M[T]
*/
STORE X
/* M[X] AC
*/
Zero-Address Instructions:
- Can be found in a stack-organized computer
- Program to evaluate X = (A + B) * (C + D) :
PUSH
A
/* TOS A
*/
PUSH
B
/* TOS B
*/
ADD
/* TOS (A + B)
*/
PUSH
C
/* TOS C
*/
PUSH
D
/* TOS D
*/
ADD
/* TOS (C + D)
*/
MUL
/* TOS (C + D) * (A + B) */
POP
X
/* M[X] TOS
*/
12
www.getmyuni.com
ADDRESSING MODES
Addressing Modes:
* Specifies a rule for interpreting or modifying the
address field of the instruction (before the operand
is actually referenced)
* Variety of addressing modes
- to give programming flexibility to the user
- to use the bits in the address field of the
instruction efficiently
13
TYPES OF ADDRESSING MODES
www.getmyuni.com
Implied Mode
Address of the operands are specified implicitly
in the definition of the instruction
- No need to specify address in the instruction
- EA = AC, or EA = Stack[SP],
EA: Effective Address.
Immediate Mode
Instead of specifying the address of the operand,
operand itself is specified
- No need to specify address in the instruction
- However, operand itself needs to be specified
- Sometimes, require more bits than the address
- Fast to acquire an operand
Register Mode
Address specified in the instruction is the register address
- Designated operand need to be in a register
- Shorter address than the memory address
- Saving address field in the instruction
- Faster to acquire an operand than the memory addressing
- EA = IR(R) (IR(R): Register field of IR)
14
TYPES OF ADDRESSING MODES
www.getmyuni.com
Register Indirect Mode
Instruction specifies a register which contains
the memory address of the operand
- Saving instruction bits since register address
is shorter than the memory address
- Slower to acquire an operand than both the
register addressing or memory addressing
- EA = [IR(R)] ([x]: Content of x)
Auto-increment or Auto-decrement features:
Same as the Register Indirect, but:
- When the address in the register is used to access memory, the
value in the register is incremented or decremented by 1 (after or before
the execution of the instruction)
15
TYPES OF ADDRESSING MODES
www.getmyuni.com
Direct Address Mode
Instruction specifies the memory address which
can be used directly to the physical memory
- Faster than the other memory addressing modes
- Too many bits are needed to specify the address
for a large physical memory space
- EA = IR(address), (IR(address): address field of IR)
Indirect Addressing Mode
The address field of an instruction specifies the address of a memory
location that contains the address of the operand
- When the abbreviated address is used, large physical memory can be
addressed with a relatively small number of bits
- Slow to acquire an operand because of an additional memory
access
- EA = M[IR(address)]
16
TYPES OF ADDRESSING MODES
www.getmyuni.com
Relative Addressing Modes
The Address fields of an instruction specifies the part of the address
(abbreviated address) which can be used along with a
designated register to calculate the address of the operand
PC Relative Addressing Mode(R = PC)
- EA = PC + IR(address)
- Address field of the instruction is short
- Large physical memory can be accessed with a small number of
address bits
Indexed Addressing Mode
XR: Index Register:
- EA = XR + IR(address)
Base Register Addressing Mode
BAR: Base Address Register:
- EA = BAR + IR(address)
17
ADDRESSING MODES - EXAMPLES
www.getmyuni.com
Address
PC = 200
Memory
200
201
202
Load to AC Mode
Address = 500
Next instruction
399
400
450
700
500
800
600
900
702
325
800
300
R1 = 400
XR = 100
AC
Addressing
Effective
Content
Mode
Address
of AC
Direct address
500
/* AC  (500)
*/ 800
Immediate operand
/* AC  500
*/ 500
Indirect address
800
/* AC  ((500))
*/ 300
Relative address
702
/* AC  (PC+500) */ 325
Indexed address
600
/* AC  (XR+500) */ 900
Register
- /* AC  R1
*/ 400
Register indirect
400
/* AC  (R1)
*/ 700
Autoincrement
400
/* AC  (R1)+
*/ 700
Autodecrement
399
/* AC  -(R)
*/ 450
18
DATA TRANSFER INSTRUCTIONS
www.getmyuni.com
Typical Data Transfer Instructions
Name
Mnemonic
Load
LD
Store
ST
Move
MOV
Exchange
XCH
Input
IN
Output
OUT
Push
PUSH
Pop
POP
Data Transfer Instructions with Different Addressing Modes
Assembly
Register Transfer
Convention
Direct address
LD ADR
AC M[ADR]
Indirect address
LD @ADR
AC M[M[ADR]]
Relative address
LD $ADR
AC M[PC + ADR]
Immediate operand
LD #NBR
AC NBR
Index addressing
LD ADR(X)
AC M[ADR + XR]
Register
LD R1
AC R1
Register indirect
LD (R1)
AC M[R1]
Autoincrement
LD (R1)+
AC M[R1], R1 R1 + 1
Autodecrement
LD -(R1)
R1 R1 - 1, AC M[R1]
Mode
19
DATA MANIPULATION INSTRUCTIONS
www.getmyuni.com
Three Basic Types:
Arithmetic instructions
Logical and bit manipulation instructions
Shift instructions
Arithmetic Instructions
Name
Increment
Decrement
Add
Subtract
Multiply
Divide
Add with Carry
Subtract with Borrow
Negate(2’s Complement)
Mnemonic
INC
DEC
ADD
SUB
MUL
DIV
ADDC
SUBB
NEG
Logical and Bit Manipulation Instructions
Name
Mnemonic
Clear
CLR
Complement
COM
AND
AND
OR
OR
Exclusive-OR
XOR
Clear carry
CLRC
Set carry
SETC
Complement carry
COMC
Enable interrupt
EI
Disable interrupt
DI
Shift Instructions
Name
Mnemonic
Logical shift right
SHR
Logical shift left
SHL
Arithmetic shift right
SHRA
Arithmetic shift left
SHLA
Rotate right
ROR
Rotate left
ROL
Rotate right thru carry
RORC
Rotate left thru carry
ROLC
20
PROGRAM CONTROL INSTRUCTIONS
www.getmyuni.com
+1
In-Line Sequencing
(Next instruction is fetched from the
next adjacent location in the memory)
PC
Address from other source; Current Instruction, Stack, etc
Branch, Conditional Branch, Subroutine, etc
Program Control Instructions
Name
Branch
Jump
Skip
Call
Return
Compare(by - )
Test (by AND)
Status Flag Circuit
Mnemonic
BR
JMP
SKP
CALL
RTN
CMP
TST
A
B
8
* CMP and TST instructions do not retain their
results of operations(- and AND, respectively).
They only set or clear certain Flags.
8
c7
c8
V Z S C
8-bit ALU
F7 - F0
F7
8
Check for
zero output
F
21
CONDITIONAL BRANCH INSTRUCTIONS
www.getmyuni.com
Mnemonic Branch condition
Tested condition
BZ
Branch if zero
BNZ
Branch if not zero
BC
Branch if carry
BNC
Branch if no carry
BP
Branch if plus
BM
Branch if minus
BV
Branch if overflow
BNV
Branch if no overflow
Unsigned compare conditions (A - B)
BHI
Branch if higher
BHE
Branch if higher or equal
BLO
Branch if lower
BLOE
Branch if lower or equal
BE
Branch if equal
BNE
Branch if not equal
Z=1
Z=0
C=1
C=0
S=0
S=1
V=1
V=0
A>B
AB
A<B
AB
A=B
AB
Signed compare conditions (A - B)
BGT
Branch if greater than
BGE
Branch if greater or equal
BLT
Branch if less than
BLE
Branch if less or equal
BE
Branch if equal
BNE
Branch if not equal
A>B
AB
A<B
AB
A=B
AB
22
SUBROUTINE CALL AND RETURN
www.getmyuni.com
SUBROUTINE CALL
Call subroutine
Jump to subroutine
Branch to subroutine
Branch and save return address
Two Most Important Operations are Implied;
* Branch to the beginning of the Subroutine
- Same as the Branch or Conditional Branch
* Save the Return Address to get the address
of the location in the Calling Program upon
exit from the Subroutine
- Locations for storing Return Address:
• Fixed Location in the subroutine(Memory)
• Fixed Location in memory
• In a processor Register
• In a memory stack
- most efficient way
CALL
SP  SP - 1
M[SP]  PC
PC  EA
RTN
PC  M[SP]
SP  SP + 1
23
PROGRAM INTERRUPT
Types
of Interrupts:
www.getmyuni.com
External interrupts
External Interrupts initiated from the outside of CPU and Memory
- I/O Device -> Data transfer request or Data transfer complete
- Timing Device -> Timeout
- Power Failure
Internal interrupts (traps)
Internal Interrupts are caused by the currently running program
- Register, Stack Overflow
- Divide by zero
- OP-code Violation
- Protection Violation
Software Interrupts
Both External and Internal Interrupts are initiated by the computer Hardware.
Software Interrupts are initiated by texecuting an instruction.
- Supervisor Call -> Switching from a user mode to the supervisor mode
-> Allows to execute a certain class of operations
which are not allowed in the user mode
24
INTERRUPT PROCEDURE
www.getmyuni.com
Interrupt Procedure and Subroutine Call
- The interrupt is usually initiated by an internal or
an external signal rather than from the execution of
an instruction (except for the software interrupt)
- The address of the interrupt service program is
determined by the hardware rather than from the
address field of an instruction
- An interrupt procedure usually stores all the
information necessary to define the state of CPU
rather than storing only the PC.
The state of the CPU is determined from;
Content of the PC
Content of all processor registers
Content of status bits
Many ways of saving the CPU state depending on the CPU architectures
25
RISC: REDUCED INSTRUCTION SET COMPUTERS
www.getmyuni.com
Historical Background
IBM System/360, 1964
- The real beginning of modern computer architecture
- Distinction between Architecture and Implementation
- Architecture: The abstract structure of a computer
seen by an assembly-language programmer
High-Level
Language
Compiler
Instruction
Set
Architecture
-program
Hardware
Implementation
Continuing growth in semiconductor memory and microprogramming
-> A much richer and complicated instruction sets
=> CISC(Complex Instruction Set Computer)
- Arguments advanced at that time
Richer instruction sets would simplify compilers
Richer instruction sets would alleviate the software crisis
- move as much functions to the hardware as possible
- close Semantic Gap between machine language
and the high-level language
Richer instruction sets would improve the architecture quality
26
www.getmyuni.com
COMPLEX INSTRUCTION SET COMPUTERS: CISC
High Performance General Purpose Instructions
Characteristics of CISC:
1.
2.
3.
4.
5.
A large number of instructions (from 100-250 usually)
Some instructions that performs a certain tasks are not used frequently.
Many addressing modes are used (5 to 20)
Variable length instruction format.
Instructions that manipulate operands in memory.
27
PHYLOSOPHY OF RISC
www.getmyuni.com
Reduce the semantic gap between machine instruction and microinstruction
1-Cycle instruction
Most of the instructions complete their execution
in 1 CPU clock cycle - like a microoperation
* Functions of the instruction (contrast to CISC)
- Very simple functions
- Very simple instruction format
- Similar to microinstructions
=> No need for microprogrammed control
* Register-Register Instructions
- Avoid memory reference instructions except
Load and Store instructions
- Most of the operands can be found in the
registers instead of main memory
=> Shorter instructions
=> Uniform instruction cycle
=> Requirement of large number of registers
* Employ instruction pipeline
28
CHARACTERISTICS OF RISC
www.getmyuni.com
Common RISC Characteristics
- Operations are register-to-register, with only LOAD and STORE
accessing memory
- The operations and addressing modes are reduced
Instruction formats are simple
29
CHARACTERISTICS OF RISC
www.getmyuni.com
RISC Characteristics
- Relatively few instructions
- Relatively few addressing modes
- Memory access limited to load and store instructions
- All operations done within the registers of the CPU
- Fixed-length, easily decoded instruction format
- Single-cycle instruction format
- Hardwired rather than microprogrammed control
More RISC Characteristics
-A relatively large numbers of registers in the processor unit.
-Efficient instruction pipeline
-Compiler support: provides efficient translation of high-level language
programs into machine language programs.
Advantages of RISC
- VLSI Realization
- Computing Speed
- Design Costs and Reliability
- High Level Language Support
30
www.getmyuni.com
Parallel processing
 A parallel processing system is able to perform concurrent
data processing to achieve faster execution time
 The system may have two or more ALUs and be able to
execute two or more instructions at the same time
 Goal is to increase the throughput – the amount of
processing that can be accomplished during a given interval
of time
www.getmyuni.com
Parallel processing classification
Single instruction stream, single data stream – SISD
Single instruction stream, multiple data stream – SIMD
Multiple instruction stream, single data stream – MISD
Multiple instruction stream, multiple data stream – MIMD
www.getmyuni.com
Single instruction stream, single data stream – SISD
• Single control unit, single computer, and a memory unit
• Instructions are executed sequentially. Parallel processing
may be achieved by means of multiple functional units or by
pipeline processing
www.getmyuni.com
Single instruction stream, multiple data stream – SIMD
• Represents an organization that includes many processing
units under the supervision of a common control unit.
• Includes multiple processing units with a single control unit.
All processors receive the same instruction, but operate on
different data.
www.getmyuni.com
Multiple instruction stream, single data stream – MISD
• Theoretical only
• processors receive different instructions, but operate on
same data.
www.getmyuni.com
Multiple instruction stream, multiple data
stream – MIMD
• A computer system capable of processing several programs
at the same time.
• Most multiprocessor and multicomputer systems can be
classified in this category
www.getmyuni.com
Pipelining: Laundry Example
Small laundry has one washer,
one dryer and one operator, it
takes 90 minutes to finish one
load:
Washer takes 30 minutes
Dryer takes 40 minutes
“operator folding” takes 20
minutes
A
B
C
D
Sequential Laundry
www.getmyuni.com
6 PM
7
8
9
10
11
Midnight
Time
30 40 20 30 40 20 30 40 20 30 40 20
T
a
s
k
O
r
d
e
r
A
B
C
90 min
D
 This operator scheduled his loads to be delivered to the laundry every 90
minutes which is the time required to finish one load. In other words he will
not start a new task unless he is already done with the previous task
 The process is sequential. Sequential laundry takes 6 hours for 4 loads
Efficiently scheduled laundry: Pipelined Laundry
Operator start work ASAP
www.getmyuni.com
6 PM
7
8
9
10
11
Midnight
Time
30 40
T
a
s
k
40
40
40 20
40 40 40
A
B
O
r
d
e
r
C
D
 Another operator asks for the delivery of loads to the laundry every 40 minutes!?.
 Pipelined laundry takes 3.5 hours for 4 loads
www.getmyuni.com
Pipelining Facts
6 PM
7
8
9
Time
T
a
s
k
O
r
d
e
r
30 40
A
B
C
D
The washer
waits for the
dryer for 10
minutes
40
40
40 20
Multiple tasks operating
simultaneously
 Pipelining doesn’t help
latency of single task, it
helps throughput of
entire workload
Pipeline rate limited by
slowest pipeline stage
Potential speedup =
Number of pipe stages
Unbalanced lengths of
pipe stages reduces
speedup
Time to “fill” pipeline
and time to “drain” it
reduces speedup
www.getmyuni.com
Pipelining
•Decomposes a sequential process into segments.
•Divide the processor into segment processors each one is
dedicated to a particular segment.
•Each segment is executed in a dedicated segment-processor
operates concurrently with all other segments.
•Information flows through these multiple hardware
segments.
www.getmyuni.com
Pipelining
• Instruction execution is divided into k segments or stages
• Instruction exits pipe stage k-1 and proceeds into pipe
stage k
• All pipe stages take the same amount of time; called one
processor cycle
• Length of the processor cycle is determined by the
slowest pipe stage
k segments
www.getmyuni.com
Pipelining
• Suppose we want to perform the combined multiply and add
operations with a stream of numbers:
• Ai * Bi + Ci for i =1,2,3,…,7
www.getmyuni.com
Pipelining
• The suboperations performed in each segment of the pipeline are as
follows:
• R1  Ai, R2  Bi
• R3  R1 * R2 R4  Ci
• R5  R3 + R4
www.getmyuni.com
www.getmyuni.com
www.getmyuni.com
Pipeline Performance
 n:instructions
 k: stages in pipeline
 : clockcycle
 Tk: total time
n is equivalent to number of loads in
the laundry example
k is the stages (washing, drying and
folding.
Clock cycle is the slowest task time
Tk  (k  (n  1))
T1
nk
Speedup

Tk k  (n  1)
n
k
www.getmyuni.com
SPEEDUP
 • Consider a k-segment pipeline operating on n data sets.
(In the above example, k = 3 and n = 4.)
 > It takes k clock cycles to fill the pipeline and get the first
result from the output of the pipeline.
 After that the remaining (n - 1) results will come out at
each clock cycle.
 > It therefore takes (k + n - 1) clock cycles to complete the
task.
www.getmyuni.com
SPEEDUP
• If we execute the same task sequentially in a single processing unit, it
takes (k * n) clock cycles.
• • The speedup gained by using the pipeline is:
• S = k * n / (k + n - 1 )
www.getmyuni.com
SPEEDUP
S = k * n / (k + n - 1 )
For n >> k (such as 1 million data sets on a 3-stage
pipeline),
S ~ k
So we can gain the speedup which is equal to the
number of functional units for a large data sets.
This is because the multiple functional units can
work in parallel except for the filling and cleaningup cycles.
www.getmyuni.com
Example: 6 tasks, divided into 4 segments
1
2
3
4
5
6
7
8
T1
T2
T3
T4
T5
T6
T1
T2
T3
T4
T5
T6
T1
T2
T3
T4
T5
T6
T1
T2
T3
T4
T5
9
T6
www.getmyuni.com
Reference
Reference Book
• Computer Organization & Architecture 7e By Stallings
• Computer System Architecture By Mano
Download