Power and Slew-aware Clock Network Design for Through-Silicon-Via (TSV) based 3D ICs
Xin Zhao and Sung Kyu Lim
School of Electrical and Computer Engineering
Georgia Institute of Technology
Atlanta, Georgia, U.S.A.
Reference: the slides of this paper in ASPDAC 2010

Outline
• Motivation and contribution
• Modeling and synthesis of 3D clock tree
• Experimental result and discussions
• Conclusion

Motivation
• According to the ITRS projection, clock skew is required to be less than 3%-4% of the clock period in an aggressive clock network design.
• Driving a large capacitive load and switching at a high frequency cause an increasingly large proportion of the total system power to be dissipated in the clock distribution network.

Motivation (cont.)
• TSVs provide the vertical interconnection that delivers the clock signal to all dies in the 3D stack.
• In general, the total wirelength of the 3D clock network decreases significantly if more TSVs are used.
• Too many TSVs often cause routing congestion and yield reduction problems.

Contribution
Three major goals:
• Clock skew minimization
• Clock slew control
• Clock power reduction
Using SPICE to simulate the results

Contribution (cont.)
• Investigate the impact of design techniques on the 3D clock network

Electrical model
• Wire
• TSVs
• Clock buffer

TSV usage
TSV upper bound
• Maximum number of TSVs allowed between adjacent dies
• Set for yield and routing reasons
TSV count
• The number of TSVs actually used
• Using stacked TSVs

Simple sample clock tree

Problem formulation
Input
• Sink set (N dies), clock source location
• Upper bound of TSV usage
• Slew constraint
Output
• Zero-Elmore-skew 3D clock tree

Problem formulation (cont.)
Objective
• Minimize wirelength, clock power
Constraints
• Zero Elmore skew
• Maximum slew
• Upper bound of TSV usage

3D clock routing algorithm flow

What is MMM?
Method of Means and Medians
• Jackson, Srinivasan, Kuh, "Clock routing for high-performance ICs," DAC, 1990.
• Each clock pin is represented as a point in the region, S.
• The region is partitioned into two subregions, SL and SR.
• The center of mass is computed for each subregion.
• The center of mass of the region S is connected to each of the centers of mass of subregions SL and SR.
• The subregions SL and SR are then recursively split in the Y-direction.
• The above steps are repeated, alternating the split between the X- and Y-directions.
• Time complexity: O(n log n).

An MMM example

3D abstract tree

3D abstract tree (cont.)
• An N-colored binary tree

3D-MMM
• Suppose the TSV upper bound is 3 and the clock source is on die-0

Slew-aware buffering, merging and embedding
Merging and slew-aware buffering, embedding
• 3D clock tree with multiple TSVs
• Using deferred-merge embedding (DME)

Slew-aware buffering, merging and embedding (cont.)
For the purpose of
• Skew control
• Shorter wirelength
That is
• Zero skew in the Elmore delay model
• Minimize the clock power consumption

3D clock tree
Unique property of the 3D clock tree
• A complete tree + many sub-trees

About clock source location
• In general, placing the clock source on the middle die reduces the number of TSVs and the wirelength.

About clock source location: theoretical max TSV usage
• Suppose M clock sinks are evenly distributed over N dies and the clock source is located on die-s

Experimental result
• Sample clock tree of the IBM r5 benchmark in a 6-die stack, with a TSV upper bound of 20

Impact of TSV bound
• Point A: 20% power saving, TSV bound ≥70% of #sinks die

TSV bound and slew distribution
r5, six-die, CMAX = 300fF
• Slew range [11.4ps, 86.2ps], avg. 53.9ps, #Bufs: 2933
• Slew range [10.9ps, 79.6ps], avg. 42.6ps, #Bufs: 2638

Multi-TSV vs. Single-TSV: 4-die stack

Multi-TSV vs. Single-TSV: 6-die stack

Skew comparison

CMAX and slew
• Using single TSV
• Using multiple TSVs

Impact of clock source location on power and wirelength
• A uses 33% fewer TSVs than B

Statistics of the number of TSVs and the clock source location
• #TSVs = 3720
• #TSVs = 2791

Conclusions
• Provided SPICE simulation information
• Using multiple TSVs helps to reduce wirelength and power.
• Multi-TSV also gives better control over slew variations.
• A smaller CMAX efficiently lowers the clock slew.
• Clock source location also affects the 3D clock network in a significant way: placing the clock source on the middle die helps reduce slew and TSV usage under the same power budget.
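
Appendix sketch: Method of Means and Medians
The MMM steps listed on the "What is MMM?" slide can be illustrated with a short Python sketch. This is a minimal 2D illustration under stated assumptions, not the authors' implementation and not the 3D-MMM extension. The names Node, center_of_mass, mmm and count_leaves are hypothetical, the region is split at the median pin along the current axis, and the sketch starts with an X-direction split.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Node:
    """One node of the abstract clock tree (hypothetical helper type)."""
    center: Point                      # center of mass of the pins under this node
    left: Optional["Node"] = None      # subtree for subregion SL
    right: Optional["Node"] = None     # subtree for subregion SR


def center_of_mass(pins: List[Point]) -> Point:
    """Mean x and mean y of a non-empty list of pins."""
    n = len(pins)
    return (sum(x for x, _ in pins) / n, sum(y for _, y in pins) / n)


def mmm(pins: List[Point], split_x: bool = True) -> Node:
    """Recursively partition the pins at the median, alternating X and Y cuts."""
    node = Node(center_of_mass(pins))
    if len(pins) == 1:
        return node                    # a single pin is a leaf (clock sink)
    axis = 0 if split_x else 1
    ordered = sorted(pins, key=lambda p: p[axis])
    mid = len(ordered) // 2            # median split into SL and SR
    node.left = mmm(ordered[:mid], not split_x)
    node.right = mmm(ordered[mid:], not split_x)
    return node


def count_leaves(node: Node) -> int:
    """Number of sinks reachable from this node."""
    if node.left is None and node.right is None:
        return 1
    return count_leaves(node.left) + count_leaves(node.right)


if __name__ == "__main__":
    tree = mmm([(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)])
    print(tree.center, tree.left.center, tree.right.center)
```

Each internal node's center is the point that would be connected to the centers of its two children. For simplicity the sketch re-sorts the pins at every level, which costs more than the O(n log n) quoted on the slide; reaching that bound would need a linear-time median selection or a one-time presort.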