Distributed Sleep Transistor Network for Power Reduction*
Changbo Long, ECE Department, UW-Madison, clong@cae.wisc.edu
Lei He, EDA Research Group, EE Department, UCLA, lhe@ee.ucla.edu
*Partially sponsored by NSF CAREER Award 0093273, SRC grant HJ-1008 and Intel Corporation

Outline
Motivation
Background
Distributed sleep transistor network (DSTN): structure, advantages, modeling and sizing algorithm
Experimental results
Conclusion and future work

Motivation
Leakage power will become the dominant power component:
reduced feature size
increased system integration, hence more idle modules
Leakage reduction techniques
To reduce leakage in active modules:
dual threshold voltage assignment for sub-threshold leakage [Mahesh et al., ICCAD’02]
pin reordering for gate leakage [Lee et al., DAC’03]
To reduce leakage in idle modules:
input vector control [Johnson et al., DAC’99]
power gating [Kao et al., DAC’98] [Anis et al., DAC’02]

Motivation
[Figure: gates g1 ... gn between Vdd and a virtual ground, which connects to GND through a sleep transistor driven by the Sleep signal from a PMP]
System level: a power management processor (PMP) generates the control signals [Mutoh et al., JSSC’96]; the PMP can be distributed
Gate level: sleep transistors turn off the power supply
Main concerns: performance loss and area overhead

Performance Loss
[Figure: gates g1 ... gn discharging current $i_{st}$ through a sleep transistor of resistance $R_{st}$]
$V_{st} = i_{st} R_{st}$, where $R_{st} = \frac{L_{st}}{W_{st}} \cdot \frac{1}{\mu_n C_{ox} (V_{dd} - V_{tH})}$
Performance loss: the increase in propagation delay
Performance loss is proportional to $V_{st}$, and hence to $i_{st}$; the worst case is set by the maximum simultaneous switching current (MSSC)

MSSC
MSSC: the maximum current over both the time domain and the input-vector domain
[Figure: currents of gates g1, g2, g3 over time; $i_{g1} + i_{g2} + i_{g3} = i_{total}$, maximized over time and over input vectors]

Area Overhead
Area overhead: the sleep transistor area plus the routing area of the virtual ground wires
Design convention: given a performance loss bound $\delta$, minimize the area overhead
$\text{Area}_{st} = \frac{W_{st}}{L_{st}} = k_c \cdot \text{MSSC}$, where $k_c = \frac{1}{\delta \mu_n C_{ox} (V_{dd} - V_{tL})(V_{dd} - V_{tH})}$
($V_{tH}$: threshold voltage of the high-Vt sleep transistor; $V_{tL}$: threshold voltage of the low-Vt logic gates)

Related Work
Module-based design methodology [Mutoh et al., JSSC’95, JSSC’96] [Kao et al., DAC’98]
A single large sleep transistor serves the entire module [JSSC’96]
Sizing: manual sizing, then automatic sizing that considers discharge patterns [Kao et al., DAC’98]
The voltage drop on long virtual ground wires is nontrivial and results in a large sleep transistor area
Cluster-based design methodology [Anis et al., DAC’02]
Group gates into clusters and minimize the peak current of each cluster with clustering algorithms
Insert one sleep transistor per cluster to avoid long virtual ground wires
Clustering may conflict with timing-driven placement

Sleep Transistor Area
Area*: the sleep transistor area when the resistance of the virtual ground wires is ignored
$\text{Area}^*_{module} = k_c \cdot \text{MSSC}_{module}$
$\text{Area}^*_{cluster} = k_c \cdot \sum_i \text{MSSC}_{cluster_i}$
Since $\text{MSSC}_{module} \le \sum_i \text{MSSC}_{cluster_i}$ (the peak of a sum never exceeds the sum of the peaks), $\text{Area}^*_{module} \le \text{Area}^*_{cluster}$
Considering the resistance of the virtual ground wires, $\text{Area}_{module} > \text{Area}_{cluster}$ [Anis et al., DAC’02]
DSTN has the smallest area: $\text{Area}_{DSTN} \approx \text{Area}^*_{module}$

DSTN: Distributed Sleep Transistor Network
DSTN enhances cluster-based design by connecting the clusters with extra virtual ground wires
[Figure: cluster-based design vs. DSTN]

Current Discharge Balancing Reduces Size
Cluster-based design: each cluster's current discharges only through its private sleep transistor, so large transistors are needed
DSTN: current discharges through both the private and the neighboring sleep transistors, so smaller transistors suffice
[Figure: current discharge paths in cluster-based design and DSTN]

Additional Advantages of DSTN
DSTN introduces no constraint on placement
The wire overhead of DSTN is small
[Figure: sleep transistor placement in cluster-based design and DSTN]
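A minimal numeric sketch of the Area* comparison above, written in Python. The helper names (`sizing_coefficient`, `peak_current`, `cluster_peaks`), the device parameters and the current waveforms are all hypothetical, chosen only for illustration. The sketch ignores virtual ground wire resistance (as Area* does) and takes the peak over time for a single input vector, whereas the MSSC on the slides also maximizes over input vectors.

```python
from typing import List, Sequence

# Hypothetical helpers for illustration; the names are not from the paper.


def sizing_coefficient(delta: float, mu_n_cox: float, vdd: float,
                       vt_low: float, vt_high: float) -> float:
    """k_c = 1 / (delta * mu_n*C_ox * (Vdd - VtL) * (Vdd - VtH)), in 1/A."""
    return 1.0 / (delta * mu_n_cox * (vdd - vt_low) * (vdd - vt_high))


def peak_current(waveforms: Sequence[Sequence[float]]) -> float:
    """Peak over time of the summed current of several waveforms.

    Each waveform holds one gate's current (in A) sampled on a shared time
    grid for a single input vector. The real MSSC also maximizes over input
    vectors, which this sketch does not do.
    """
    return max(sum(samples) for samples in zip(*waveforms))


def cluster_peaks(clusters: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Peak current of each cluster taken on its own."""
    return [peak_current(gates) for gates in clusters]


if __name__ == "__main__":
    # Assumed parameters: 5% performance loss bound, 0.35 um-like device values.
    kc = sizing_coefficient(delta=0.05, mu_n_cox=200e-6, vdd=3.3,
                            vt_low=0.5, vt_high=0.7)

    # Toy gate currents (A) for two clusters on a shared 4-step time grid.
    cluster_a = [[1e-3, 3e-3, 0.0, 0.0], [0.0, 1e-3, 0.0, 0.0]]
    cluster_b = [[0.0, 0.0, 2e-3, 1e-3], [0.0, 0.0, 1e-3, 2e-3]]

    peaks = cluster_peaks([cluster_a, cluster_b])  # cluster peaks occur at different times
    module_peak = peak_current(cluster_a + cluster_b)

    # Area* ignores virtual ground wire resistance, as on the slides.
    area_module = kc * module_peak   # one sleep transistor for the whole module
    area_cluster = kc * sum(peaks)   # one sleep transistor per cluster
    print(f"MSSC_module = {module_peak * 1e3:.1f} mA, "
          f"sum of cluster MSSCs = {sum(peaks) * 1e3:.1f} mA")
    print(f"Area*_module (W/L) = {area_module:.1f}, "
          f"Area*_cluster (W/L) = {area_cluster:.1f}")
```

With these made-up numbers the two cluster peaks (4 mA and 3 mA) occur at different times, so the module peak is only 4 mA, and the module-level W/L comes out at about 54.9 versus about 96.2 for per-cluster sizing. This is the gap DSTN tries to recover while keeping short virtual ground wires.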
Modeling of DSTN
The entire module is modeled as a resistance network (virtual ground wire resistances $R_i$ and sleep transistor resistances $R_{st}$) driven by switching current sources
[Figure: DSTN resistance network with switching current sources, wire resistances $R_i$ and sleep transistor resistances $R_{st}$]

DSTN Sizing Problem
DSTN sizing problem (DSTN/SP): given the DSTN topology, find the size of every sleep transistor such that the total sleep transistor area of the DSTN is minimized and the performance loss constraint is satisfied for every cluster
[Figure: DSTN with unknown sleep transistor sizes $W$ (resistances $R_{st}$); constraint $PL < \delta$, i.e., $V_{st} < \varepsilon$, at every cluster]

Difficulties of DSTN/SP
Primary challenge: the current sources
the current sources depend on each other
the current varies with time
Secondary challenge: the resistance network
given the current sources, size each $R_{st}$ to minimize the transistor area while satisfying the performance loss constraints
Does an algorithm exist in the literature?
No exact solution
The closest related problem is power/ground network sizing [Boyd et al., ISPD’01]
We have developed an algorithm based on special properties of DSTN/SP

Properties of DSTN/SP Solutions
P1: assuming $R_i = 0$, $\text{Area}_{DSTN} = \text{Area}^*_{module} = k_c \cdot \text{MSSC}_{module}$, with $k_c = \frac{1}{\delta \mu_n C_{ox} (V_{dd} - V_{tL})(V_{dd} - V_{tH})}$, where $\delta$ is the performance loss constraint and MSSC is the maximum current
P2: given the current sources, $\text{Area}_{DSTN}$ increases as $R_i$ increases
the increase is limited because $R_i \ll R_{st}$
for $R_i = \infty$ the clusters are isolated and $\text{Area}_{DSTN} = \text{Area}_{cluster}$
P3: assuming the cluster currents and $\text{Area}_{DSTN}$ are fixed, the minimum performance loss is achieved by sizing each sleep transistor in proportion to its cluster's MSSC:
$\text{Area}_{cluster_i} = \frac{\text{MSSC}_{cluster_i}}{\sum_j \text{MSSC}_{cluster_j}} \cdot \text{Area}_{DSTN}$

Algorithm for DSTN/SP
From P1 and P2, the total sleep transistor area of the DSTN is
$\text{Area}_{DSTN} = (1 + \beta) \cdot k_c \cdot \text{MSSC}_{module}$
where $\beta \in [0.05, 0.5]$ is an empirical parameter that increases as $R_i$ increases
From P3, the size of each individual sleep transistor is
$\text{Area}_{cluster_i} = \frac{\text{MSSC}_{cluster_i}}{\sum_j \text{MSSC}_{cluster_j}} \cdot \text{Area}_{DSTN}$
The key is to estimate $\text{MSSC}_{module}$ and $\text{MSSC}_{cluster_i}$

Maximum Current Estimation
Estimating $\text{MSSC}_{module}$:
circuit current strongly depends on the input vector
the input vector space grows exponentially with the number of primary inputs
a genetic algorithm (GA) based approach is used
[Jiang et al., TVLSI’00]
Estimating $\text{MSSC}_{cluster_i}$: an efficient algorithm is proposed in the paper

Baseline Case: Cluster-based Design
Cluster-based design without placement constraints
Clustering algorithm: given a circuit and a cluster size, partition the gates into clusters such that $\sum_i \text{MSSC}_{cluster_i}$, and in turn $\text{Area}_{cluster}$, is minimized; simulated annealing (SA) is used
Sizing algorithm:
each individual sleep transistor: $\text{Area}_{cluster_i} = k_c \cdot \text{MSSC}_{cluster_i}$
total area: $\text{Area}_{cluster} = k_c \cdot \sum_i \text{MSSC}_{cluster_i}$

Experiment Setup
Gate-level synthesis
Sizing: estimate the maximum current of each cluster and of the entire module, then apply the sizing algorithms
Verification:
simulate the circuit and obtain the current sources for 10,000 random input vectors
obtain the performance loss by solving the resistance network with KCL and KVL equations
take the maximum performance loss over all input vectors
Custom layout:
implement a four-bit carry-lookahead adder (CLA) in 0.35 μm technology
determine the sizes by SPICE simulation
cluster-based design: each cluster satisfies the performance loss constraint
DSTN: the entire module satisfies the performance loss constraint

Results of Gate-Level Synthesis
[Figure: W/L of sleep transistors and maximum performance loss, cluster-based design vs. DSTN]
On average, DSTN reduces the total W/L by 49.8% with a smaller performance loss

Custom Layout in 0.35 μm
Cluster-based design: each cluster is served by its own sleep transistor
DSTN: the sleep transistors are connected by virtual ground wires
[Figure: custom layouts of both designs showing sleep transistors and virtual ground wires]

Custom Layout Comparison
[Table: leakage current, delay, sleep transistor area and total area for the designs with no sleep transistor, cluster-based design and DSTN]
DSTN reduces runtime leakage by 50x and 5x compared to no sleep transistor and cluster-based design, respectively
DSTN reduces sleep transistor area by 6.83x with 6.6% smaller performance degradation compared to the cluster-based design

Conclusion and Future Work
We have proposed DSTN and its sizing algorithm
DSTN achieves a smaller area, lower leakage current and a smaller supply voltage drop
Future work:
an ideal power/ground network is assumed in this paper
investigate the co-design of DSTN and the power/ground network