Pipeline Design Problems

Job Sequencing and Collision
Prevention for the Design of
Static Pipeline
Job Sequencing and Collision
Prevention
• Consider the reservation table of function A given below, with an initiation made at t = 0
    0   1   2   3   4   5
Sa  A                   A
Sb      A       A
Sc          A       A
Job Sequencing and Collision
Prevention
• Consider the next initiation made at t = 1
    0   1   2   3   4   5   6   7
Sa  A1  A2              A1  A2
Sb      A1  A2  A1  A2
Sc          A1  A2  A1  A2
• The second initiation fits in the reservation table without any collision
Job Sequencing and Collision
Prevention
• Now consider the case when the first initiation is made at t = 0 and the second at t = 2.
    0     1     2     3     4     5     6     7
Sa  A1          A2                A1          A2
Sb        A1          A1A2        A2
Sc              A1          A1A2        A2
• Here the markings A1 and A2 fall in the same stage in the same time unit (Sb at t = 3 and Sc at t = 4). This is called a collision, and it must be avoided
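The collision check described above can be sketched in a few lines. This is a minimal illustration, not part of the original slides: it assumes the reservation table marks Sa at t = 0 and 5, Sb at t = 1 and 3, and Sc at t = 2 and 4 (an assumption consistent with the forbidden latency set {0, 2, 5} used in this deck), and the names `RESERVATION_TABLE` and `find_collisions` are hypothetical.

```python
from collections import defaultdict

# Assumed reservation table: stage -> time units (columns) marked for function A.
RESERVATION_TABLE = {"Sa": [0, 5], "Sb": [1, 3], "Sc": [2, 4]}

def find_collisions(table, start_times):
    """Return {(stage, time): [job numbers]} for every slot claimed by 2+ jobs."""
    usage = defaultdict(list)
    for job, start in enumerate(start_times, start=1):
        for stage, columns in table.items():
            for col in columns:
                # Job `job` occupies `stage` at absolute time start + col.
                usage[(stage, start + col)].append(job)
    return {slot: jobs for slot, jobs in usage.items() if len(jobs) > 1}

print(find_collisions(RESERVATION_TABLE, [0, 1]))  # second initiation at t = 1
print(find_collisions(RESERVATION_TABLE, [0, 2]))  # second initiation at t = 2
```

Under the assumed table, the first call reports no shared slots, while the second reports that jobs 1 and 2 both claim Sb at t = 3 and Sc at t = 4.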
Terminologies
• Latency: Time difference between two
initiations in units of clock period
• Forbidden Latency: Latencies resulting in
collision
• Forbidden Latency Set: Set of all forbidden
latencies
General Method of finding Latency
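Considering initiations at every time unit (A1 at t = 0, A2 at t = 1, ..., A6 at t = 5):
    0     1     2     3     4     5     6     7     8     9     10
Sa  A1    A2    A3    A4    A5    A1A6  A2    A3    A4    A5    A6
Sb        A1    A2    A1A3  A2A4  A3A5  A4A6  A5    A6
Sc              A1    A2    A1A3  A2A4  A3A5  A4A6  A5    A6
• A1 collides with A3 (initiated 2 time units later) in Sb and Sc, and with A6 (initiated 5 time units later) in Sa
• Forbidden Latencies are 2 and 5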
Shortcut Method of finding Latency
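• For each stage (row) of the reservation table, take the differences between the time units of every pair of markings in that row; the union over all rows is the forbidden latency set
• Forbidden Latency Set = {0,5} U {0,2} U {0,2} = {0, 2, 5}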
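The shortcut method can be sketched as a pairwise-difference computation over each row. This is a minimal sketch, assuming a reservation table with Sa marked at t = 0 and 5, Sb at t = 1 and 3, and Sc at t = 2 and 4; the table layout and the function name `forbidden_latencies` are illustrative assumptions, not taken from the slides.

```python
from itertools import combinations

# Assumed reservation table: stage -> marked time units.
RESERVATION_TABLE = {"Sa": [0, 5], "Sb": [1, 3], "Sc": [2, 4]}

def forbidden_latencies(table):
    """Union, over all stages, of the distances between marks in the same row."""
    forbidden = {0}  # latency 0 (two initiations at once) always collides
    for columns in table.values():
        for a, b in combinations(columns, 2):
            forbidden.add(abs(a - b))
    return forbidden

print(sorted(forbidden_latencies(RESERVATION_TABLE)))
```

For the assumed table this yields the same set as the slide, {0, 2, 5}.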
Terminologies
• Initiation Sequence : Sequence of time units
at which initiation can be made without
causing collision
• Example : { 0,1,3,4 ….}
• Latency Sequence : Sequence of latencies
between successive initiations
• Example : { 1,2,1….}
• For a given reservation table (RT), the number of valid initiation sequences and latency sequences is infinite
Terminologies
• Initiation Rate :
– The average number of initiations done per unit
time
– It is a positive fraction and maximum value of IR is 1
• Average Latency : The average of latency of a
given latency sequence
IR = 1/AL
Terminologies
• Latency Cycle:
• Among the infinitely many possible latency sequences, the periodic ones are significant.
E.g. { 2, 3, 4, 2, 3, 4,… }
• The subsequence that repeats itself is called
latency cycle.
E.g. {2, 3, 4}
Terminologies
• Period of cycle: The sum of latencies in a
latency cycle (2+3+4=9)
• Average Latency: The average taken over its
latency cycle (AL=9/3=3)
• To design a pipeline, we need a control strategy that maximizes the throughput (no. of results per unit time)
• Since IR = 1/AL, maximizing throughput is equivalent to minimizing AL
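As a small worked illustration of the definitions above, the period, average latency and initiation rate of a latency cycle can be computed directly. The function name `cycle_metrics` is a hypothetical helper introduced here; exact fractions are used so that IR = 1/AL holds without rounding.

```python
from fractions import Fraction

def cycle_metrics(latency_cycle):
    """Return (period, average latency, initiation rate) of a latency cycle."""
    period = sum(latency_cycle)                          # sum of the latencies
    average_latency = Fraction(period, len(latency_cycle))
    initiation_rate = 1 / average_latency                # IR = 1 / AL
    return period, average_latency, initiation_rate

period, al, ir = cycle_metrics([2, 3, 4])
print(period, al, ir)
```

For the cycle {2, 3, 4} from the slide this gives a period of 9, AL = 3 and IR = 1/3.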
Terminologies
• Control Strategy
– Initiate the pipeline as specified by a latency sequence.
– A latency sequence that is aperiodic in nature is impractical to implement in a controller
• Thus the design problem is to arrive at a latency cycle having minimal average latency.
Terminologies
• Stage Utilization Factor (SUF):
• The SUF of a particular stage is the fraction of time units for which the stage is in use while following a latency sequence.
• Example: Consider 5 initiations of function A, made at t = 0, 1, 4, 7 and 8, as below
    0   1   2   3   4   5   6   7   8   9   10  11  12  13
Sa  A1  A2          A3  A1  A2  A4  A5  A3          A4  A5
Sb      A1  A2  A1  A2  A3      A3  A4  A5  A4  A5
Sc          A1  A2  A1  A2  A3      A3  A4  A5  A4  A5
Terminologies
• The SUF of stage Sa is the number of markings present along Sa divided by the time interval over which the markings are counted.
• Each stage carries 10 markings over 14 time units (t = 0 to 13), so
SUF(Sa) = SUF(Sb) = SUF(Sc) = 10/14
Terminologies
• Let SU(i) be the stage utilization factor of stage i
• Let N(i) be the no. of markings against stage i in the reservation table
• Suppose we initiate the pipeline with initiation rate (IR); then SU(i) is given by
SU(i) = (No. of initiations made over a given period x N(i)) / (Duration of period)
• For the example with 5 initiations over 14 time units:
SU(a) = (5 x 2) / 14
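The formula can be checked numerically with a short sketch. It assumes the same reservation table as elsewhere in this deck (Sa marked at t = 0 and 5, Sb at t = 1 and 3, Sc at t = 2 and 4); the names `RESERVATION_TABLE` and `stage_utilization` are hypothetical and introduced only for illustration.

```python
from fractions import Fraction

# Assumed reservation table: stage -> marked time units, so N(i) = len(columns).
RESERVATION_TABLE = {"Sa": [0, 5], "Sb": [1, 3], "Sc": [2, 4]}

def stage_utilization(table, initiations, duration):
    """SU(i) = (initiations in the period x N(i)) / duration, for each stage."""
    return {
        stage: Fraction(initiations * len(columns), duration)
        for stage, columns in table.items()
    }

su = stage_utilization(RESERVATION_TABLE, initiations=5, duration=14)
for stage, value in su.items():
    print(stage, value, round(float(value), 3))
```

`Fraction` reduces 10/14 to 5/7, so every stage in this example is busy roughly 71% of the time.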
Terminologies
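• Minimum Average Latency (MAL)
• Over a long period, (No. of initiations) / (Duration of period) is the initiation rate, thus SU(i) = IR x N(i)
• SU(i) ≤ 1 ⇒ IR x N(i) ≤ 1
⇒ N(i) ≤ 1/IR ⇒ N(i) ≤ AL
• This holds for every stage, therefore
MAL ≥ max N(i), with the maximum taken over stages i = 1, ..., k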
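The bound can be evaluated from the reservation table alone, as in the sketch below. The table contents and the helper names `mal_lower_bound` and `average_latency` are assumptions made for illustration; the bound is only a lower limit and says nothing about whether a cycle achieving it exists.

```python
from fractions import Fraction

# Assumed reservation table: stage -> marked time units.
RESERVATION_TABLE = {"Sa": [0, 5], "Sb": [1, 3], "Sc": [2, 4]}

def mal_lower_bound(table):
    """Lower bound on MAL: the largest number of marks in any single row."""
    return max(len(columns) for columns in table.values())

def average_latency(cycle):
    """Average latency of a latency cycle, as an exact fraction."""
    return Fraction(sum(cycle), len(cycle))

bound = mal_lower_bound(RESERVATION_TABLE)
print(bound, average_latency([1, 3, 3]) >= bound)
```

With two marks per stage the bound is 2, and a cycle such as (1, 3, 3) with AL = 7/3 respects it.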
State Diagram
• Suppose the pipeline is initially empty and an initiation is made at t = 0.
• Now we need to check whether an initiation is possible at t = i for i > 0.
• bi is used to note the possibility of an initiation at t = i
• bi = 1 ⇒ initiation not possible
• bi = 0 ⇒ initiation possible
i    0  1  2  3  4  5
bi   1  0  1  0  0  1
State Diagram
• The above binary representation (binary vector) is called the collision vector (CV)
• The collision vector obtained after the first initiation is called the initial collision vector (ICV)
ICVA = (101001)
• The state diagram is a graphical representation of the states (CVs) that a pipeline can reach and the transitions between them
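A minimal sketch of how the ICV follows from the forbidden latency set: bit i is 1 exactly when latency i is forbidden, with bit 0 written leftmost as in the slides. The function name `collision_vector` and the string representation are illustrative choices, not the author's notation.

```python
def collision_vector(forbidden, length):
    """Build the collision vector as a bit string; bit 0 is the leftmost bit."""
    return "".join("1" if i in forbidden else "0" for i in range(length))

# Forbidden latency set {0, 2, 5} from the slides; the vector covers t = 0..5.
icv = collision_vector({0, 2, 5}, 6)
print(icv)
```

This reproduces the ICV 101001 given above.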
State Diagram
• States (CVs) are denoted by nodes
• The node representing CVt-1 is connected to the node for CVt by a directed arc from CVt-1 to CVt, and similarly to the node for CVt*, with a * on the arc
Procedure to draw state diagram
1. Start with the ICV
2. For each unprocessed state, say CVt-1, do as follows:
a) Find CVt from CVt-1 by the following steps
1. Left shift CVt-1 by 1 bit
2. Drop the leftmost bit
3. Append the bit 0 at the right-hand end
b) If the 0th (leftmost) bit of CVt is 0, an initiation is possible at time t; obtain CVt* by logically ORing CVt with the ICV.
c) If the state CVt does not already exist, make a new node for it. Join CVt-1 to CVt with an arc.
d) If CVt* exists, repeat step (c) for CVt*, but mark the arc with a *.
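The procedure above can be sketched as a breadth-first traversal over bit strings. This is an illustrative sketch rather than the author's implementation: collision vectors are represented as strings with bit 0 leftmost, and the helper names `shift_left`, `bitwise_or` and `build_state_diagram` are hypothetical.

```python
from collections import deque

def shift_left(cv):
    """Drop the leftmost bit and append 0 at the right-hand end."""
    return cv[1:] + "0"

def bitwise_or(a, b):
    """Bitwise OR of two equal-length bit strings."""
    return "".join("1" if x == "1" or y == "1" else "0" for x, y in zip(a, b))

def build_state_diagram(icv):
    """Return {state: (next state, starred state or None)} for every reachable CV."""
    diagram = {}
    queue = deque([icv])
    while queue:
        state = queue.popleft()
        if state in diagram:
            continue  # already processed
        shifted = shift_left(state)
        # A starred successor exists only if an initiation is allowed now.
        starred = bitwise_or(shifted, icv) if shifted[0] == "0" else None
        diagram[state] = (shifted, starred)
        queue.append(shifted)
        if starred is not None:
            queue.append(starred)
    return diagram

diagram = build_state_diagram("101001")
for state, (plain, starred) in diagram.items():
    print(state, "->", plain, "| * ->", starred)
```

For ICV = 101001 the traversal reaches 14 states, including the all-zero state, whose unmarked arc loops back to itself.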
State Diagram
• Applying the procedure to ICV = 101001 gives the states below. "Left shift" is the state reached by an unmarked arc; "CV*" is the state reached by an arc marked *, which exists only when the leftmost bit of the shifted vector is 0.
State     Left shift   CV* (Left shift OR ICV)
101001    010010       111011
010010    100100       No CV*
111011    110110       No CV*
100100    001000       101001
110110    101100       No CV*
001000    010000       111001
101100    011000       111001
010000    100000       No CV*
111001    110010       No CV*
011000    110000       No CV*
110010    100100       No CV*
110000    100000       No CV*
100000    000000       101001
000000    000000       101001
State Diagram
• From the above diagram, closed loops can be identified as latency cycles.
• To find the latencies corresponding to a loop, start at a state reached by a * arc and count the arcs traversed up to and including the next * arc; this count is one latency. Repeat until the loop returns to the starting state.
• Latency cycles identified in the diagram: (3), (1,3,3), (4,3), (1,6), (1,7), (4), (6), (7)
State Diagram
• The state with all zeros has a self-loop, which corresponds to an empty pipeline. The pipeline can wait in this state for any number of time units, giving latency cycles of the form (1,8), (1,9), (1,10), etc.
• Simple Cycle: a latency cycle in which each state is encountered only once.
• Complex Cycle: a cycle that consists of more than one simple cycle.
• It is enough to look for simple cycles
State Diagram
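• In the above example, the cycle that offers the MAL is (1, 3, 3), with AL = 7/3 ≈ 2.33
• This is consistent with the lower bound
MAL ≥ max N(i) = 2, with the maximum taken over stages i = 1, ..., k
• A cycle arrived at by always choosing the smallest permissible latency between successive initiations is called a greedy cycle; (1, 3, 3) is such a cycle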
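A greedy cycle can be found by repeatedly taking the smallest permissible latency until a state repeats. The sketch below assumes collision vectors are bit strings with bit 0 leftmost; the helper names `shift`, `bitwise_or` and `greedy_cycle` are hypothetical. In general a greedy cycle is not guaranteed to achieve the MAL, although it does in this example.

```python
def shift(cv, latency):
    """Shift cv left by `latency` bits, dropping bits on the left and padding zeros."""
    return (cv + "0" * latency)[latency:latency + len(cv)]

def bitwise_or(a, b):
    """Bitwise OR of two equal-length bit strings."""
    return "".join("1" if x == "1" or y == "1" else "0" for x, y in zip(a, b))

def greedy_cycle(icv):
    """Follow the smallest permissible latency from each state until a state repeats."""
    d = len(icv)
    state, path, latencies = icv, [icv], []
    while True:
        # Smallest i >= 1 whose bit is 0; if none, wait d time units.
        latency = next((i for i in range(1, d) if state[i] == "0"), d)
        state = bitwise_or(shift(state, latency), icv)
        latencies.append(latency)
        if state in path:
            # latencies[k] leaves path[k], so the cycle starts at this index.
            return latencies[path.index(state):]
        path.append(state)

print(greedy_cycle("101001"))
```

For ICV = 101001 the walk visits 101001, 111011 and 111001 before returning to 101001, giving the latencies 1, 3 and 3.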
Modified State Diagram
• The state diagram becomes cumbersome for longer ICVs.
• In a modified state diagram, we represent only the states reached immediately after an initiation.
• The procedure is as follows:
1. Start with the ICV
2. For each unprocessed state CV, and for each bit position i of CV that is 0, do the following:
a. Shift CV left by i bits
b. Drop the i leftmost bits
c. Append i zeros at the right
d. Logically OR the result with the ICV
e. If step (d) results in a new state, form a new node for it. Join the node of CV to the resulting node by an arc with the marking i. Also join each node to the node of the ICV by an arc with the marking ≥ d, where d is the length of the ICV
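The modified procedure can be sketched as follows. This is a minimal illustration under stated assumptions: collision vectors are bit strings with bit 0 leftmost, the helper names are hypothetical, and the arc marked ≥ d is stored under the integer key d.

```python
def shift(cv, latency):
    """Shift cv left by `latency` bits, dropping bits on the left and padding zeros."""
    return (cv + "0" * latency)[latency:latency + len(cv)]

def bitwise_or(a, b):
    """Bitwise OR of two equal-length bit strings."""
    return "".join("1" if x == "1" or y == "1" else "0" for x, y in zip(a, b))

def modified_state_diagram(icv):
    """Return {state: {latency: next state}}; key d stands for any latency >= d."""
    d = len(icv)
    arcs = {}
    pending = [icv]
    while pending:
        state = pending.pop()
        if state in arcs:
            continue  # already processed
        arcs[state] = {}
        for i in range(1, d):
            if state[i] == "0":  # latency i is permitted from this state
                target = bitwise_or(shift(state, i), icv)
                arcs[state][i] = target
                pending.append(target)
        arcs[state][d] = icv  # waiting d or more units empties the pipeline
    return arcs

for state, out in modified_state_diagram("101001").items():
    print(state, out)
```

For ICV = 101001 only three states appear (101001, 111011 and 111001), compared with 14 in the full state diagram.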
Modified State Diagram
• Applying the procedure to ICV = 101001 gives three states: 101001, 111011 and 111001
From state   i   Shifted CV   OR with ICV (next state)
101001       1   010010       111011
101001       3   001000       101001
101001       4   010000       111001
111011       3   011000       111001
111001       3   001000       101001
111001       4   010000       111001
• Every state also has an arc with the marking ≥6 back to 101001
Dynamic Pipeline and
Reconfigurability
• Two methods to improve the throughput of
dynamic pipeline:
– Insertion of non-compute delays
– Use of Internal Buffers
End