Pipeline Design Problems: Job Sequencing and Collision Prevention for the Design of Static Pipeline

Job Sequencing and Collision Prevention
• Consider the reservation table given below, with a single initiation of function A at t = 0.

      0    1    2    3    4    5
Sa    A                        A
Sb         A         A
Sc              A         A

• Now consider a second initiation made at t = 1 (A1 is initiated at t = 0, A2 at t = 1).

      0    1    2    3    4    5    6    7
Sa    A1   A2                  A1   A2
Sb         A1   A2   A1   A2
Sc              A1   A2   A1   A2

• The second initiation fits in the reservation table without any overlap.
• Now consider the case when the first initiation is made at t = 0 and the second at t = 2.

      0     1     2     3     4     5     6     7
Sa    A1          A2                A1          A2
Sb          A1          A1A2        A2
Sc                A1          A1A2        A2

• Here the markings A1 and A2 fall in the same stage at the same time unit (Sb at t = 3 and Sc at t = 4). This is called a collision, and it must be avoided.

Terminologies
• Latency: time difference between two initiations, in units of the clock period
• Forbidden latency: a latency that results in a collision
• Forbidden latency set: set of all forbidden latencies

General Method of finding Latency
Considering an initiation at every time unit:

      0     1     2     3     4     5     6     7     8     9     10
Sa    A1    A2    A3    A4    A5    A1A6  A2    A3    A4    A5    A6
Sb          A1    A2    A1A3  A2A4  A3A5  A4A6  A5    A6
Sc                A1    A2    A1A3  A2A4  A3A5  A4A6  A5    A6

• Forbidden latencies are 2 and 5: A1 collides with A3 (initiated 2 units later) in Sb and Sc, and with A6 (initiated 5 units later) in Sa.

Shortcut Method of finding Latency
• For each stage, take the differences between the time units of the markings in that row; the union over all stages is the forbidden latency set.
• Forbidden Latency Set = {0, 5} U {0, 2} U {0, 2} = {0, 2, 5}

Terminologies
• Initiation sequence: sequence of time units at which initiations can be made without causing a collision
• Example (generic): {0, 1, 3, 4, …}
• Latency sequence: sequence of latencies between successive initiations
• Example (generic): {1, 2, 1, …}
• For a reservation table, the number of valid initiation sequences and latency sequences is infinite.
• Initiation rate (IR):
  – The average number of initiations made per unit time
  – It is a positive fraction, and the maximum value of IR is 1
• Average latency (AL): the average of the latencies in a given latency sequence. IR = 1/AL
• Latency cycle:
• Among the infinitely many possible latency sequences, the periodic ones are significant, e.g. {2, 3, 4, 2, 3, 4, …}
• The subsequence that repeats itself is called the latency cycle, e.g. {2, 3, 4}
• Period of cycle: the sum of the latencies in a latency cycle (2 + 3 + 4 = 9)
• Average latency: the average taken over the latency cycle (AL = 9/3 = 3)
• To design a pipeline, we need a control strategy that maximizes the throughput (number of results per unit time).
• Maximizing throughput is equivalent to minimizing AL.
• Control strategy:
  – Initiate the pipeline as specified by a latency sequence.
  – A control strategy based on an aperiodic latency sequence is impractical to design.
• Thus the design problem is to arrive at a latency cycle having minimal average latency.
• Stage Utilization Factor (SUF):
• The SUF of a particular stage is the fraction of time units in which the stage is used while following a latency sequence.
• Example: consider 5 initiations of function A (at t = 0, 1, 4, 7, 8) as below.

      0   1   2   3   4   5   6   7   8   9   10  11  12  13
Sa    A1  A2          A3  A1  A2  A4  A5  A3          A4  A5
Sb        A1  A2  A1  A2  A3      A3  A4  A5  A4  A5
Sc            A1  A2  A1  A2  A3      A3  A4  A5  A4  A5

• The SUF of stage Sa is the number of markings present along Sa divided by the time interval over which the markings are counted.
• SUF(Sa) = SUF(Sb) = SUF(Sc) = 10/14
• Let SU(i) be the stage utilization factor of stage i.
• Let N(i) be the number of markings against stage i in the reservation table.
• Suppose we initiate the pipeline with initiation rate IR; then SU(i) is given by

  $SU(i) = \frac{\text{No. of initiations made over a given period} \times N(i)}{\text{Duration of period}}$

  For the example above, $SU(a) = \frac{5 \times 2}{14}$.
• Minimum Average Latency (MAL)
• Thus SU(i) = IR × N(i)
• SU(i) ≤ 1 ⇒ IR × N(i) ≤ 1 ⇒ N(i) ≤ 1/IR ⇒ N(i) ≤ AL
• Therefore $MAL \ge \max_{1 \le i \le k} N(i)$, where the maximum is taken over the k stages.

State Diagram
• Suppose the pipeline is initially empty and an initiation is made at t = 0.
• We now need to check whether an initiation is possible at t = i, for i > 0.
• A bit bi is used to record whether an initiation is possible at t = i:
  – bi = 1: initiation not possible
  – bi = 0: initiation possible

  i     0  1  2  3  4  5
  bi    1  0  1  0  0  1

• The above binary representation (binary vector) is called the collision vector (CV).
• The collision vector obtained from the first initiation is called the initial collision vector (ICV): ICVA = (101001)
• The state diagram is a graphical representation of the states (CVs) that a pipeline can reach and the transitions between them.
• States (CVs) are denoted by nodes.
• The node representing CVt-1 is connected to the node for CVt by a directed arc from CVt-1 to CVt. The arc to CVt* is drawn the same way, but marked with a *.

Procedure to draw state diagram
1. Start with the ICV.
2. For each unprocessed state, say CVt-1, do as follows:
   a) Find CVt from CVt-1 by the following steps:
      1. Left shift CVt-1 by 1 bit
      2. Drop the leftmost bit
      3. Append the bit 0 at the right-hand end
   b) If the 0th (leftmost) bit of CVt is 0, obtain CV* by logically ORing CVt with the ICV.
   c) Create a node for CVt if that state does not already exist, and join CVt-1 to it with an arc.
   d) If CV* exists, repeat step (c) for CV*, but mark the arc with a *.
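The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names (`shift`, `bitwise_or`, `build_state_diagram`) and the representation of a collision vector as a string of '0'/'1' characters with b0 leftmost are assumptions made for the example.

```python
from collections import deque

def shift(cv: str) -> str:
    """Step (a): left shift by one bit, drop the leftmost bit, append 0."""
    return cv[1:] + "0"

def bitwise_or(a: str, b: str) -> str:
    """Bitwise OR of two equal-length bit strings."""
    return "".join("1" if x == "1" or y == "1" else "0" for x, y in zip(a, b))

def build_state_diagram(icv: str) -> dict:
    """Map each state to (shifted successor, starred successor or None)."""
    diagram = {}
    pending = deque([icv])              # step 1: start with the ICV
    while pending:                      # step 2: process each unprocessed state
        cv = pending.popleft()
        if cv in diagram:
            continue
        nxt = shift(cv)
        # Step (b): CV* exists only when the 0th (leftmost) bit of CVt is 0.
        star = bitwise_or(nxt, icv) if nxt[0] == "0" else None
        diagram[cv] = (nxt, star)       # steps (c) and (d): record both arcs
        pending.append(nxt)
        if star is not None:
            pending.append(star)
    return diagram

diagram = build_state_diagram("101001")
for state, (nxt, star) in diagram.items():
    print(state, "->", nxt, "| * ->", star)
```

For ICV = 101001 the sketch reaches 14 distinct states, including the all-zeros state whose shifted successor is itself.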
State Diagram
[Figure: step-by-step construction of the state diagram for ICV = 101001.] The resulting states and arcs are summarized below. For each state, the left-shifted successor is given, together with CV* = (shifted state) OR ICV whenever the leftmost bit of the shifted state is 0.

State     Left shift   CV* (arc marked *)
101001    010010       111011
010010    100100       none
111011    110110       none
100100    001000       101001
110110    101100       none
001000    010000       111001
101100    011000       111001
010000    100000       none
111001    110010       none
011000    110000       none
110010    100100       none
110000    100000       none
100000    000000       101001
000000    000000       101001

• From the above diagram, closed loops can be identified as latency cycles.
• To find the latencies corresponding to a loop, start just after any arc marked * and count the arcs traversed up to and including the next arc marked *. That count is one latency. Continue in the same way until the loop returns to the starting state.
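The arc-counting rule can be checked with a short Python sketch. It walks the state diagram for a given list of latencies: each latency p is taken as p - 1 unstarred arcs (plain shifts) followed by one starred arc (shift, then OR with the ICV). The helper names `step` and `follow` are hypothetical, and collision vectors are assumed to be bit strings with b0 leftmost.

```python
def step(cv: str, icv: str, initiate: bool) -> str:
    """Advance one clock period; OR with the ICV if a new initiation is made."""
    shifted = cv[1:] + "0"
    if not initiate:
        return shifted                       # unstarred arc
    if shifted[0] == "1":
        raise ValueError("forbidden latency: initiation would collide")
    return "".join("1" if a == "1" or b == "1" else "0"
                   for a, b in zip(shifted, icv))   # starred arc

def follow(start: str, icv: str, latencies) -> str:
    """Return the state reached from `start` after applying each latency."""
    cv = start
    for latency in latencies:
        for _ in range(latency - 1):
            cv = step(cv, icv, initiate=False)
        cv = step(cv, icv, initiate=True)
    return cv

ICV = "101001"
# A latency list is a latency cycle through a state if the walk returns to it.
print(follow(ICV, ICV, [1, 3, 3]) == ICV)        # loop through the ICV
print(follow("111001", ICV, [4]) == "111001")    # loop through 111001
```

Passing a forbidden latency, for example `follow(ICV, ICV, [2])`, raises `ValueError` in this sketch, since the leftmost bit of the shifted vector is 1 at that point.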
[Figure: the state diagram with each loop highlighted in turn.] Latency cycles identified from the diagram:
• (3)
• (1, 3, 3)
• (4, 3)
• (1, 6)
• (1, 7)
• (4)
• (6)
• (7)

State Diagram
• The all-zeros state has a self-loop, which corresponds to an empty pipeline. It is therefore possible to wait indefinitely before the next initiation, giving latency cycles of the form (1, 8), (1, 9), (1, 10), etc.
• Simple cycle: a latency cycle in which each state is encountered only once.
• Complex cycle: a latency cycle that consists of more than one simple cycle.
• It is enough to look for simple cycles.
• In the above example, the cycle that offers the MAL is (1, 3, 3), with AL = 7/3.
• This is consistent with the bound $MAL \ge \max_{1 \le i \le k} N(i) = 2$.
• Such a cycle, obtained by choosing the minimum permissible latency at each initiation, is called a greedy cycle.

Modified State Diagram
• The state diagram becomes cumbersome for longer ICVs.
• In the modified state diagram, we represent only the states obtained after an initiation.
• The procedure is as follows:
1. Start with the ICV.
2. For each unprocessed state CV, and for each bit i of CV that is 0, do the following:
   a. Shift CV left by i bits
   b. Drop the i leftmost bits
   c. Append i zeros at the right
   d. Logically OR with the ICV
   e. If step (d) results in a new state, form a new node for that state. Join the node of CV to the resulting state by an arc with the marking i. Also join each node to the node of the ICV with an arc having the marking ≥ d, where d is the length of the ICV.

[Figure: step-by-step construction of the modified state diagram for ICV = 101001, d = 6.] The arcs obtained are:

From state   i    Shifted CV   OR with ICV (next state)
101001       1    010010       111011
101001       3    001000       101001
101001       4    010000       111001
111011       3    011000       111001
111001       3    001000       101001
111001       4    010000       111001

Every state also has an arc marked ≥6 back to 101001.

Dynamic Pipeline and Reconfigurability
• Two methods to improve the throughput of a dynamic pipeline:
  – Insertion of non-compute delays
  – Use of internal buffers
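Returning to the modified state diagram procedure described earlier, the steps can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the slides' own implementation: the function names are hypothetical, collision vectors are bit strings with b0 leftmost, and the arcs marked ≥ d back to the ICV are left implicit rather than stored.

```python
from fractions import Fraction

def next_state(cv: str, icv: str, latency: int) -> str:
    """Steps (a) to (d): shift left by `latency` bits, zero-fill, OR with ICV."""
    shifted = cv[latency:] + "0" * latency
    return "".join("1" if a == "1" or b == "1" else "0"
                   for a, b in zip(shifted, icv))

def modified_state_diagram(icv: str) -> dict:
    """Map each state to {latency: next state} for latencies below len(icv)."""
    d = len(icv)
    diagram = {}
    pending = [icv]
    while pending:
        cv = pending.pop()
        if cv in diagram:
            continue
        # One arc per zero bit i (i >= 1); arcs marked >= d are implicit.
        arcs = {i: next_state(cv, icv, i) for i in range(1, d) if cv[i] == "0"}
        diagram[cv] = arcs
        pending.extend(arcs.values())
    return diagram

def greedy_cycle(icv: str) -> list:
    """Follow the smallest permissible latency from each state until a repeat."""
    diagram = modified_state_diagram(icv)
    d = len(icv)
    path, seen, cv = [], {}, icv
    while cv not in seen:
        seen[cv] = len(path)
        arcs = diagram[cv]
        latency = min(arcs) if arcs else d      # no zero bit: wait d periods
        path.append(latency)
        cv = arcs[latency] if arcs else icv
    return path[seen[cv]:]

cycle = greedy_cycle("101001")
print(cycle, Fraction(sum(cycle), len(cycle)))  # [1, 3, 3] 7/3
```

For ICV = 101001 the sketch yields the three states 101001, 111011 and 111001, and the greedy walk returns the cycle (1, 3, 3) with average latency 7/3.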