Chapter One Introduction to Pipelined Processors

Principle of Designing Pipeline
Processors
(Design Problems of Pipeline
Processors)
Data Buffering and Busing
Structures
Speeding up of pipeline segments
• The processing speeds of pipeline segments are
usually unequal.
• Consider the example given below:
[Figure: three-segment pipeline S1 → S2 → S3 with segment delays T1, T2 and T3]
• If T1 = T3 = T and T2 = 3T, S2 becomes the
bottleneck and we need to remove it
• How?
• One method is to subdivide the bottleneck
– Two divisions possible are:
• First Method:
[Figure: S1 (T) and S3 (T) unchanged, with S2 subdivided into two sub-segments of delay 2T and T]
• Second Method:
[Figure: S1 (T) and S3 (T) unchanged, with S2 subdivided into three sub-segments of delay T each]
• If the bottleneck is not sub-divisible, we can
duplicate S2 in parallel
[Figure: S1 (T) feeding three parallel copies of S2 (3T each), which feed S3 (T)]
• Control and synchronization are more complex
in parallel segments
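The effect of each remedy on the bottleneck can be checked with a small calculation. The sketch below is illustrative only: the function name and the `copies` parameter are hypothetical, T is taken as one time unit, and latch and control overheads are ignored.

```python
T = 1  # one time unit; the slides leave T symbolic

def initiation_interval(delays, copies=None):
    """Time between successive results of a linear pipeline.

    Each segment can accept a new task every delay/copies time units,
    so the slowest such segment sets the pace (assumption: no latch or
    control overhead).
    """
    if copies is None:
        copies = [1] * len(delays)
    return max(d / c for d, c in zip(delays, copies))

original = [T, 3 * T, T]            # S2 is the bottleneck
first_method = [T, 2 * T, T, T]     # S2 split into 2T and T
second_method = [T, T, T, T, T]     # S2 split into three segments of T

print(initiation_interval(original))                    # bottleneck 3T
print(initiation_interval(first_method))                # bottleneck 2T
print(initiation_interval(second_method))               # bottleneck T
print(initiation_interval(original, copies=[1, 3, 1]))  # three parallel S2 units
```

Under these assumptions, the second subdivision and the three-way duplication both bring the interval between results down to T, while the first subdivision only reaches 2T.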
Data Buffering
• Instruction and data buffering provides a
continuous flow to pipeline units
• Example: 4X TI ASC
• This system uses a memory buffer unit
(MBU) which
– Supplies the arithmetic unit with a continuous stream
of operands
– Stores results in memory
• The MBU has three double buffers X, Y and Z
(one octet per buffer)
– X,Y for input and Z for output
• This provides pipeline processing at a high rate
and alleviates the bandwidth mismatch
between memory and the arithmetic pipeline
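The double-buffer idea can be sketched as two halves that swap roles: memory fills one half while the pipeline drains the other. This is only a toy model, not the TI ASC design; the class and method names are hypothetical, and the half size of eight words is an assumption based on reading "octet" as eight words.

```python
class DoubleBuffer:
    """Toy double buffer: one half is filled while the other is drained."""

    def __init__(self, size=8):          # size of 8 is an assumption
        self.size = size
        self.filling = []                # half being loaded from memory
        self.draining = []               # half being read by the pipeline

    def fill(self, words):
        """Load as many words as fit into the filling half; return the rest."""
        room = self.size - len(self.filling)
        self.filling.extend(words[:room])
        return words[room:]

    def swap(self):
        """Exchange the halves once the draining half has been emptied."""
        if self.draining:
            raise RuntimeError("draining half still holds operands")
        self.filling, self.draining = self.draining, self.filling

    def next_operand(self):
        """Hand the next operand to the arithmetic pipeline."""
        return self.draining.pop(0)

x = DoubleBuffer()
stream = x.fill(list(range(16)))         # first eight words are buffered
x.swap()
stream = x.fill(stream)                  # next eight load while the first drain
first = [x.next_operand() for _ in range(8)]
x.swap()
second = [x.next_operand() for _ in range(8)]
```

In this model the pipeline never waits on memory as long as a half can be refilled in the time it takes to drain the other one.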
Busing Structures
• Problem: Ideally, subfunctions in a pipeline should
be independent; otherwise the pipeline must be
halted until the dependency is removed.
• Solution: An efficient internal busing structure.
• Example: TI ASC
• In the TI ASC, once an instruction dependency is
recognized, update capability is incorporated
by transferring the contents of the Z buffer to the X or Y
buffer.
Internal Data Forwarding and
Register Tagging
• Internal Forwarding: replacing unnecessary
memory accesses with register-to-register transfers.
• Register Tagging: the use of tagged
registers for exploiting concurrent activities
among multiple ALUs.
Internal Forwarding
• Memory access is slower than register-to-register operations.
• Performance can be enhanced by eliminating
unnecessary memory accesses
• This concept can be explored in 3 directions:
1. Store – Load Forwarding
2. Load – Load Forwarding
3. Store – Store Forwarding
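Store – Load Forwarding
• A store M ← R1 followed by a load R2 ← M: the load is replaced by the register transfer R2 ← R1
Load – Load Forwarding
• Two loads R1 ← M and R2 ← M from the same location: the second load is replaced by R2 ← R1
Store – Store Forwarding
• Two stores M ← R1 and M ← R2 to the same location: the first store is overwritten and can be eliminated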
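The three forwarding cases can be sketched as a peephole pass over adjacent instruction pairs. The tuple encoding and the function name `forward` are hypothetical, chosen only for illustration; the sketch assumes the two accesses are adjacent, refer to the same memory location M, and that nothing else touches M in between.

```python
# Encoding (an assumption for this sketch):
#   ("store", M, R)  means  M <- R
#   ("load",  R, M)  means  R <- M
#   ("move",  Rd, Rs) means Rd <- Rs

def forward(program):
    """Replace redundant memory accesses in adjacent pairs."""
    out = []
    for op in program:
        prev = out[-1] if out else None
        if prev and prev[0] == "store" and op[0] == "load" and prev[1] == op[2]:
            # Store-load: M <- R1; R2 <- M  becomes  M <- R1; R2 <- R1
            out.append(("move", op[1], prev[2]))
        elif prev and prev[0] == "load" and op[0] == "load" and prev[2] == op[2]:
            # Load-load: R1 <- M; R2 <- M  becomes  R1 <- M; R2 <- R1
            out.append(("move", op[1], prev[1]))
        elif prev and prev[0] == "store" and op[0] == "store" and prev[1] == op[1]:
            # Store-store: M <- R1; M <- R2  becomes  M <- R2
            out[-1] = op
        else:
            out.append(op)
    return out

store_load = forward([("store", "M", "R1"), ("load", "R2", "M")])
load_load = forward([("load", "R1", "M"), ("load", "R2", "M")])
store_store = forward([("store", "M", "R1"), ("store", "M", "R2")])
```

In each case one memory access disappears: two become one for store-load and load-load, and two stores become a single store for store-store.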
Register Tagging
Example : IBM Model 91 :
Floating Point Execution Unit
• The floating point execution unit consists of :
– Data registers
– Transfer paths
– Floating Point Adder Unit
– Multiply-Divide Unit
– Reservation stations
– Common Data Bus
• There are 3 reservation stations for adder
named A1, A2 and A3 and 2 for multipliers
named M1 and M2.
• Each station has the source & sink registers
and their tag & control fields
• The stations hold operands for next execution.
• 3 store data buffers (SDBs) and 4 floating point
registers (FLRs) are tagged
• Busy bits in the FLRs indicate the dependence of
instructions in subsequent execution
• The Common Data Bus (CDB) is used to transfer
operands
• There are 11 units that supply information to the
CDB: 6 floating point buffers (FLBs), 3 adders and 2 multiply/divide units
• Tags for these units are:
Unit    Tag     Unit    Tag
FLB1    0001    ADD1    1010
FLB2    0010    ADD2    1011
FLB3    0011    ADD3    1100
FLB4    0100    M1      1000
FLB5    0101    M2      1001
FLB6    0110
• Internal forwarding can be achieved with a
tagging scheme on the CDB.
• Example:
• Let F refer to an FLR and FLBi stand for the ith FLB,
and let their contents be (F) and (FLBi)
• Consider the instruction sequence
ADD F,FLB1    F ← (F) + (FLB1)
MPY F,FLB2    F ← (F) × (FLB2)
• During addition:
– Busy bit of F is set to 1
– Contents of F and FLB1 are sent to adder A1
– Tag of F is set to 1010 (tag of adder)
[Figure: IBM Model 91 floating point execution unit during the ADD. Blocks shown: storage bus, instruction unit, floating point buffers (FLB) 1 to 6, floating point operand stack (FLOS), decoder, control, reservation stations with tag, sink, source and CTRL fields feeding the adder and the multiplier, common data bus, and store data buffers (SDB) 1 to 3. F has busy bit = 1 and tag = 1010; the adder station (tag 1010) holds sink F and source FLB1 (tag 0001).]
• Meanwhile, the decode of MPY reveals that F is
busy, so
– The sink tag of M1 is set to 1010 (tag of adder), so that M1 waits for the adder's result
– F changes its tag to 1000 (tag of multiplier)
– The content of FLB2 is sent to M1
[Figure: the same floating point execution unit after MPY is decoded. F has busy bit = 1 and tag = 1000; the multiplier station (tag 1000) holds sink F and source FLB2 (tag 0010).]
• When the addition is done, the CDB finds that the
result should be sent to M1
• The multiplication is carried out once both operands
are available
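The ADD/MPY sequence can be traced with a small model of tagged registers and reservation stations. This is a simplified sketch, not the Model 91 logic: the class and function names are hypothetical, only the sink operand can wait on a tag, and the operand values 2.0, 3.0 and 4.0 are made up for the trace. The tags 1010 and 1000 are the adder and multiplier tags from the slides.

```python
ADD1, M1 = 0b1010, 0b1000      # tags of adder station A1 and multiplier station M1

class Register:
    def __init__(self, value):
        self.value, self.busy, self.tag = value, False, None

class Station:
    def __init__(self, tag, op):
        self.tag, self.op = tag, op
        self.sink = self.source = None   # operand values once known
        self.sink_tag = None             # tag of the unit that will supply the sink

def issue(station, reg, source_value):
    """Send `reg <- reg op source` to a reservation station."""
    if reg.busy:
        station.sink_tag = reg.tag       # wait for the unit that owns reg
    else:
        station.sink = reg.value
    station.source = source_value
    reg.busy, reg.tag = True, station.tag

def broadcast(tag, value, stations, regs):
    """Put (tag, value) on the common data bus; matching tags capture it."""
    for s in stations:
        if s.sink_tag == tag:
            s.sink, s.sink_tag = value, None
    for r in regs:
        if r.busy and r.tag == tag:
            r.value, r.busy, r.tag = value, False, None

F = Register(2.0)
flb1, flb2 = 3.0, 4.0                    # illustrative buffer contents
a1 = Station(ADD1, lambda x, y: x + y)
m1 = Station(M1, lambda x, y: x * y)

issue(a1, F, flb1)                       # ADD F,FLB1: F busy, tag 1010
issue(m1, F, flb2)                       # MPY F,FLB2: M1 waits on 1010, F tag 1000
tag_after_mpy = F.tag
broadcast(a1.tag, a1.op(a1.sink, a1.source), [a1, m1], [F])  # sum goes to M1, not F
forwarded = m1.sink
broadcast(m1.tag, m1.op(m1.sink, m1.source), [a1, m1], [F])  # product goes to F
```

In this trace the sum never passes through F: it is forwarded from the adder straight to M1, and F receives only the final product.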
Hazard Detection and Resolution
• Hazards are caused by resource usage
conflicts among various instructions
• They are triggered by inter-instruction
dependencies
Terminologies:
• Resource Objects: set of working registers,
memory locations and special flags
• Data Objects: contents of resource objects
• Each instruction can be considered as a
mapping from a set of data objects to a set of
data objects.
• Domain D(I): set of resource objects whose
data objects may affect the execution of
instruction I.
• Range R(I): set of resource objects whose data
objects may be modified by the execution of
instruction I
• An instruction reads from its domain and writes
to its range
• Consider the execution of instructions I and J, where
J appears immediately after I.
• There are 3 types of data-dependent hazards:
1. RAW (Read After Write)
2. WAW (Write After Write)
3. WAR (Write After Read)
RAW (Read After Write)
• The necessary condition for this hazard is
R(I) ∩ D(J) ≠ ∅
• Example:
I1 : LOAD r1,a
I2 : ADD r2,r1
• I2 cannot be correctly executed until r1 is
loaded
• Thus I2 is RAW dependent on I1
WAW(Write After Write)
• The necessary condition is
R(I) ∩ R(J) ≠ ∅
• Example
I1 : MUL r1, r2
I2 : ADD r1,r4
• Here I1 and I2 write to the same destination and
hence they are said to be WAW dependent.
WAR(Write After Read)
• The necessary condition is
D(I) ∩ R(J) ≠ ∅
• Example:
I1 : MUL r1,r2
I2 : ADD r2,r3
• Here I2 has r2 as its destination while I1 uses it as a
source, and hence they are WAR dependent
• Hazards can be detected in the fetch stage by
comparing the domain and range of the incoming
instruction with those of the instructions already in the pipe.
• Once a hazard is detected, there are two ways to handle it:
1. Generate a warning signal to prevent the hazard
2. Allow the incoming instruction into the pipe and
distribute detection to all pipeline stages.
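The three necessary conditions translate directly into set intersections on domains and ranges. The sketch below is illustrative: the function name is hypothetical, and the domain and range sets written for the example instructions assume a two-operand format in which the first operand is both a source and the destination.

```python
def possible_hazards(domain_i, range_i, domain_j, range_j):
    """Hazards that may arise when J follows I (necessary conditions only)."""
    found = set()
    if range_i & domain_j:     # R(I) ∩ D(J) ≠ ∅
        found.add("RAW")
    if range_i & range_j:      # R(I) ∩ R(J) ≠ ∅
        found.add("WAW")
    if domain_i & range_j:     # D(I) ∩ R(J) ≠ ∅
        found.add("WAR")
    return found

# I1: LOAD r1,a   then   I2: ADD r2,r1
raw_case = possible_hazards({"a"}, {"r1"}, {"r2", "r1"}, {"r2"})

# I1: MUL r1,r2   then   I2: ADD r1,r4
waw_case = possible_hazards({"r1", "r2"}, {"r1"}, {"r1", "r4"}, {"r1"})

# I1: MUL r1,r2   then   I2: ADD r2,r3
war_case = possible_hazards({"r1", "r2"}, {"r1"}, {"r2", "r3"}, {"r2"})
```

Because the conditions are necessary rather than sufficient, a non-empty intersection only flags a possible hazard. Under the two-operand assumption the second pair also satisfies the RAW and WAR conditions, since r1 is read as well as written.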