ELEC 7770 Advanced VLSI Design, Spring 2012
Retiming
Vishwani D. Agrawal
James J. Danaher Professor
ECE Department, Auburn University
Auburn, AL 36849
vagrawal@eng.auburn.edu
http://www.eng.auburn.edu/~vagrawal/COURSE/E7770_Spr12/course.html
Spring 2012, Feb 10 . . . ELEC 7770: Advanced VLSI Design (Agrawal)

Retiming
Retiming is a function-preserving transformation of a synchronous sequential circuit.
Flip-flops are moved according to specific rules.
Original references:
C. E. Leiserson, F. Rose and J. B. Saxe, “Optimizing Synchronous Circuits by Retiming,” Proc. 3rd Caltech Conf. on VLSI, 1983, pp. 87-116.
C. E. Leiserson and J. B. Saxe, “Retiming Synchronous Circuitry,” Algorithmica, vol. 6, pp. 5-35, 1991.

A Trivial Example: Reduced Hardware
[Figure: circuit diagram with flip-flops (FF).]

Example 2: Faster Clock
[Figure: circuit diagram with flip-flops (FF).]

Example 3: Reduced Flip-Flops
[Figure: circuit diagram with flip-flops (FF).]

Applications of Retiming
Performance optimization
Area optimization
Power optimization
Testability enhancement
FPGA optimization

Fundamental Operation of Retiming
A retiming move shifts all of the memory elements at the inputs of a combinational block to all of its outputs, or vice versa.
[Figure: combinational logic with FF on its input side ≡ combinational logic with FFs on its output side.]

A Correlator Circuit
[Figure: correlator circuit with host, PI and PO, signals labeled a1, a2, a3, a4, comparators (=), adders (+) and flip-flops.]
Adder delay = 7
Comparator delay = 3

Graph Model
[Figure: graph of the correlator with vertices a, b, c, d, e, f, g and host h; vertices are marked with delays 3 or 7 (0 for the host) and edges with weights 0 or 1.]
Vertex vi: combinational, delay = d(vi), assumed unchanged by retiming
d(host) = 0
Edge e(vi,vj) or eij: weight wij = number of flip-flops between vi and vj

Path Delay and Path Weight
A set of connected nodes specifies a path.
A path does not traverse the host node.
Path delay = ∑ d(vi) = combinational delay of path
Path weight = ∑ wij = clock delay of path
Retiming of a node i is denoted by an integer ri.
It represents the number of registers moved across the node; initially ri = 0.
Register moved from output to input: ri → ri + 1
Register moved from input to output: ri → ri – 1
After retiming, edge weight wij’ = wij + rj – ri

Example of Node Retiming
[Figure: six nodes, each with delay 3, shown before and after retiming.]
Before: r1 = 0, r2 = 0, r3 = 0, r4 = 0, r5 = 0, r6 = 0; ∑ d(vi) = 12, ∑ wij = 0
After: r1 = 0, r2 = -1, r3 = 0, r4 = 0, r5 = 1, r6 = 0; ∑ d(vi) = 12, ∑ wij = 2
For a path from node 2 to node 5, these values agree with the edge-weight rule: the path weight changes by r5 – r2 = 1 – (-1) = 2, while the path delay stays at 4 × 3 = 12.

Legal Retiming
Retiming is legal if the retimed circuit has no negative edge weights.
A legally retimed circuit is functionally equivalent to the original circuit (proof by Leiserson and Saxe, 1991).
Retiming is the most general method for changing the register count and position without knowing the functions of vertices.

Example
[Figure: circuit with signals a, b, c, d, x and one FF, and its graph with vertices host, c and x; edge weights 0, 1, 0, 0.]

Example: Illegal Retiming
[Figure: graph with vertices host, c and x, before and after retiming, and the resulting circuit with signals a, b, c, d, x and one FF.]
Retiming vector = {0, 0, 0} before, {0, 0, –1} after
Edge weight changes shown: 1 → 0, 0 → 1, 0 → –1
The negative edge weight makes this retiming illegal.
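The edge-weight rule wij’ = wij + rj – ri and the legality test can be sketched in a few lines of Python. The three-vertex graph below is an assumption made for illustration: it is loosely modeled on the host/c/x example, but its edge list is not taken from the slide figure, and the function names are hypothetical.

```python
def retime_edges(edges, r):
    """Apply wij' = wij + rj - ri to every edge (i, j)."""
    return {(i, j): w + r[j] - r[i] for (i, j), w in edges.items()}


def is_legal(edges):
    """A retiming is legal if no retimed edge weight is negative."""
    return all(w >= 0 for w in edges.values())


# Hypothetical graph (assumed edge list): (tail, head) -> flip-flop count.
edges = {
    ("host", "c"): 0,
    ("host", "x"): 0,
    ("c", "x"): 1,
    ("x", "host"): 0,
}

# Retiming vertex x by -1 pulls a register off an edge that has none.
illegal = retime_edges(edges, {"host": 0, "c": 0, "x": -1})
# Retiming vertex c by +1 moves the register from c's output to its input.
legal = retime_edges(edges, {"host": 0, "c": 1, "x": 0})

print(illegal, is_legal(illegal))
print(legal, is_legal(legal))
```

In this sketch the first vector drives the edge host → x to weight –1, so it is rejected, while the second only moves the single register from the edge c → x to the edge host → c.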
Example: Legal Retiming
[Figure: graph with vertices host, c and x, before and after retiming, and the resulting circuit with signals a, b, c, d, x and two FFs.]
Retiming vector = {0, 0, 0} before, {0, 1, 0} after
Edge weight changes shown: 0 → 1, 0 → 1, 1 → 0

Correlator Circuit
[Figure: correlator graph with vertices a, b, c, d (delay 3), e, f, g (delay 7) and host h; edge weights 0 or 1.]
ra = 0, rb = 0, rc = 0, rd = 0, re = 0, rf = 0, rg = 0, rh = 0
Critical path delay = 24
Initial retiming vector = {0,0,0,0,0,0,0,0}

Retimed Correlator Circuit
[Figure: retimed correlator graph; three edge weights change 0 → 1 and two change 1 → 0.]
ra = -1, rb = -1, rc = -2, rd = -2, re = -2, rf = -1, rg = 0, rh = 0
Critical path delay = 13
Retiming vector = {-1,-1,-2,-2,-2,-1,0,0}

Retiming Theorem
Given a network G(V, E, W) and a cycle time T, (r1, . . . ) is a feasible retiming if and only if:
1. ri – rj ≤ wij for all edges (vi,vj) ∈ E
2. ri – rj ≤ W(vi,vj) – 1 for all node-pairs vi, vj such that D(vi,vj) > T
where
W(vi,vj) is the minimum weight over all paths from vi to vj
D(vi,vj) is the maximum delay among all minimum-weight paths from vi to vj

Proof of Condition 1
We assume that the original network is legal, i.e., all edge weights are non-negative.
For an arbitrary edge (vi,vj) ∈ E: ri – rj ≤ wij, or wij + rj – ri ≥ 0, means that after retiming the new weight wij’ = wij + rj – ri will be non-negative.
Thus, condition 1 ensures the legality of retiming.
[Figure: edge (i, j) with ri flip-flops moved at node i, rj flip-flops moved at node j, and wij flip-flops on the edge.]
Edge (i,j): original flip-flops, wij; retimed flip-flops, wij’ = wij + rj – ri ≥ 0

Proof of Condition 2
Given: d(vi) < T, for all i.
Any retimed path whose combinational delay exceeds the clock period must have at least one flip-flop.
The above is the requirement for correct operation.
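The path form of the weight rule used in condition 2 follows from the edge rule by telescoping. The derivation below is a sketch; the intermediate vertex names u_0, ..., u_k are notation introduced here and do not appear on the slides.

```latex
% Path p from v_i to v_j: v_i = u_0 \to u_1 \to \dots \to u_k = v_j
\begin{align*}
W'(p) &= \sum_{m=0}^{k-1} w'_{u_m u_{m+1}}
       = \sum_{m=0}^{k-1} \bigl( w_{u_m u_{m+1}} + r_{u_{m+1}} - r_{u_m} \bigr) \\
      &= W(p) + r_{u_k} - r_{u_0}
       = W(p) + r_j - r_i .
\end{align*}
% Every path from v_i to v_j shifts by the same amount r_j - r_i,
% so the minimum-weight path W(v_i, v_j) is the binding one:
\[
W(v_i, v_j) + r_j - r_i \ge 1
\quad\Longleftrightarrow\quad
r_i - r_j \le W(v_i, v_j) - 1 .
\]
```

Only the retiming values at the two endpoints survive the sum, which is why condition 2 can be stated per node-pair rather than per path.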
[Figure: path (i, j) with ri flip-flops moved at node i, rj flip-flops moved at node j, and Wij flip-flops on the path.]
Path (i,j) with D(i,j) > T: original weight, Wij; retimed weight, Wij’ = Wij + rj – ri ≥ 1

Retiming Optimization Problem
Given the initial retiming graph G(V, E, d, w) of a synchronous system and a required clock period P, find a feasible retiming transformation such that the retimed graph G’ satisfies CP(G’) ≤ P.
Solution:
Algorithm 1: finds CP(G), the critical path delay of G
Algorithm 2: finds a feasible retiming G → G’

Algorithm 1: Critical Path Delay
1. Delete all edges (vi, vj) for which wij ≥ 1.
2. Create a level order for the vertices such that an edge (vi, vj) requires the order of vj to be higher than that of vi.
3. Traversing all nodes v in level order, compute ∆(v):
∆(v) = d(v), if v has no incoming edge
∆(v) = d(v) + max{∆(vi)}, taken over all incoming edges (vi, v)
4. CP(G) = max{∆(vj)}, taken over all vertices vj

Algorithm 1 Application
[Figure: correlator graph before and after deleting edges with weight ≥ 1, with vertices annotated by ∆ values 3, 10, 17 and 24.]
CP(G) = ∆ = 24

Algorithm 2: Retiming for Period = P
1. Initialize the retiming variable, r(v) = 0, for all v.
2. Repeat |V| – 1 times:
Derive the retimed graph.
Run Algorithm 1 to determine ∆(v) for all v.
For each v such that ∆(v) > P, set r(v) to r(v) + 1.
3. Derive the retimed graph and run Algorithm 1:
If CP(G) > P, then no feasible retiming exists.
Otherwise, CP(G) ≤ P and the retimed graph is the required result.

Algorithm 2 Application, P = 13
[Figure: correlator graphs for successive iterations of Algorithm 2, starting from CP(G) = ∆ = 24, with vertices annotated by ∆ values.]
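Algorithms 1 and 2 can be sketched together in Python and tried on the correlator. The edge list below is an assumption: it is reconstructed from the slide figures (comparators a to d with delay 3, adders e to g with delay 7, host h with delay 0), and the function names are hypothetical.

```python
from collections import defaultdict, deque

# Assumed reconstruction of the correlator graph from the slides.
DELAY = {"a": 3, "b": 3, "c": 3, "d": 3, "e": 7, "f": 7, "g": 7, "h": 0}
# (tail, head) -> number of flip-flops on the edge
EDGES = {
    ("h", "a"): 1, ("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1,
    ("d", "e"): 0, ("c", "e"): 0, ("b", "f"): 0, ("a", "g"): 0,
    ("e", "f"): 0, ("f", "g"): 0, ("g", "h"): 0,
}


def retimed_weights(edges, r):
    """Edge rule: wij' = wij + rj - ri."""
    return {(i, j): w + r[j] - r[i] for (i, j), w in edges.items()}


def arrival_times(delay, edges):
    """Algorithm 1: compute Delta(v) over the zero-weight edges only."""
    succ = defaultdict(list)
    indeg = {v: 0 for v in delay}
    for (i, j), w in edges.items():
        if w == 0:  # edges with wij >= 1 are deleted
            succ[i].append(j)
            indeg[j] += 1
    delta = dict(delay)  # Delta(v) = d(v) if v has no incoming edge
    ready = deque(v for v in delay if indeg[v] == 0)
    visited = 0
    while ready:  # level-order (topological) traversal
        v = ready.popleft()
        visited += 1
        for u in succ[v]:
            delta[u] = max(delta[u], delta[v] + delay[u])
            indeg[u] -= 1
            if indeg[u] == 0:
                ready.append(u)
    if visited != len(delay):
        raise ValueError("combinational cycle: no level order exists")
    return delta


def retime_for_period(delay, edges, period):
    """Algorithm 2: return (r, retimed edges), or None if infeasible."""
    r = {v: 0 for v in delay}
    for _ in range(len(delay) - 1):
        delta = arrival_times(delay, retimed_weights(edges, r))
        for v in delay:
            if delta[v] > period:
                r[v] += 1
    final = retimed_weights(edges, r)
    if max(arrival_times(delay, final).values()) > period:
        return None
    return r, final


cp_initial = max(arrival_times(DELAY, EDGES).values())
result = retime_for_period(DELAY, EDGES, 13)
print(cp_initial, result)
```

Under these assumptions the initial critical path delay comes out as 24, and the retiming for P = 13 reaches a critical path delay of 13. The retiming vector found this way need not equal the one printed on the slides, since more than one vector achieves the same period; adding the same constant to every r(v), for instance, leaves all edge weights unchanged.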
Retimed Circuit for P = 13
[Figure: retimed correlator graph with updated edge weights.]
ra = -1, rb = -1, rc = -2, rd = -2, re = -2, rf = -1, rg = 0, rh = 0
Critical path delay = 13
Retiming vector = {-1,-1,-2,-2,-2,-1,0,0}

References
Two papers by Leiserson et al. (see the original references on the Retiming slide).
G. De Micheli, Synthesis and Optimization of Digital Circuits, New York: McGraw-Hill, 1994.
N. Maheshwari and S. S. Sapatnekar, Timing Analysis and Optimization of Sequential Circuits, Boston: Springer, 1999.