IEEE International Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI-2017)

Timing Convergence Techniques in Digital VLSI Designs

Linumon Thomas
Dept. of Electronics and Communication, RVCE, Bengaluru, Karnataka, India

Kiran V
Dept. of Electronics and Communication, RVCE, Bengaluru, Karnataka, India

Abstract: The primary goal of every design is to improve the performance of the system. In digital designs, increasing the operating frequency is an important goal of each new version of a design. As the frequency increases, static timing analysis reports more paths with negative margin. Techniques for bringing negative-margin paths to positive margin are explained here. Timing closure can be achieved through cell resizing, placement optimization, clock tuning and routing optimization.

Keywords: Static timing analysis, timing convergence, clock tuning, placement and routing optimizations

978-1-5386-0814-2/17/$31.00 ©2017 IEEE

I. INTRODUCTION

The target of all digital designs is to improve the performance of the system. An increase in the frequency of operation is a major contributor to performance improvement. As the maximum operating frequency of a circuit increases, the number of paths with negative margin increases. The designer's job is to bring every path to a positive margin. Timing convergence can be obtained by proper choice of cells, clock tuning, placement optimization and routing optimization. These techniques are explained in the following section.

II. TIMING CONVERGENCE TECHNIQUES

A. Proper choice of standard library cells

Each process technology comes with a set of standard cells from which the designer can choose based on requirements. Each cell is available in different flavors and sizes, and proper selection of size helps timing convergence. When a cell is upsized, its delay decreases and its output slope improves, so the total path delay decreases. If a path has a setup margin violation, upsizing therefore helps rectify it. Similarly, if a path has a hold violation, downsizing increases the cell delay and the overall path delay, so downsizing helps fix hold violations.

A standard cell library also contains cells with high threshold voltage. In paths with hold violations, these cells can be inserted. They are slow because of their high threshold voltage and can be helpful in hold violation fixes.

B. Clock Optimization

Clock optimization, generally known as clock tuning, can be used for timing convergence. If a path has a setup violation, it can be fixed by clock pushing. Clock pushing is the process of adding extra clock buffers to the clock network; it can also be achieved by increasing the delay of existing clock cells. A setup violation occurs when the clock at a latch arrives too early for the data to meet the setup condition.

Figure 2.1: Setup convergence by clock pushing

In Figure 2.1, flip-flop FF2 has a setup violation. To fix the violation, the clock is pushed as shown, by adding clock buffer CLK BUF2 to the clock network. This delays the clock at FF2 and gives the required setup margin.

In paths with hold violations, clock pulling can be used to fix the paths. Clock pulling is the process of decreasing clock delay. Assume that FF2 in Figure 2.1 has a hold violation. It can be fixed by making the clock arrive earlier at FF2, which can be achieved by removing CLK BUF2 or by decreasing its cell delay. Clock pulling applied to Figure 2.1 to fix the hold violation at FF2 is shown in Figure 2.2.

Figure 2.2: Hold violation fix by removing CLK BUF2

C. Optimization of placement of cells

One of the major factors that decides the timing of circuit paths is the placement of cells. Optimum placement of cells is very important for both power and timing convergence. To reduce path delays, drivers and receivers should be placed at the minimum possible distance from each other. If a driver drives more than one receiver, as in Figure 2.3, the driver should be placed at the center of the configuration, with the receivers symmetric and equidistant, to achieve proper load balancing.

Figure 2.3: Bad placement of blocks

Figure 2.3 shows a configuration in which a driver drives 3 receivers. This is an example of bad placement: the receivers are concentrated at one end and the driver is at the opposite end. The optimum placement for this structure is shown in Figure 2.4. Here the driver is kept at the center with the receivers spread around it. This ensures proper load balancing and lower path delays.

Figure 2.4: Optimum placement of a driver and receivers

When cell blocks are placed as close together as possible, the net delay between the cells decreases and the total path delay becomes smaller. Lower delay helps in achieving timing closure.

Another example of bad placement is shown in Figure 2.5. Here the driver drives only one receiver, Receiver 1, which in turn drives another receiver, Receiver 2. The driver and Receiver 1 should be kept close together. This reduces net delay and also reduces the metal resources used.

Figure 2.5: Non-optimum placement

The ideal placement for the above configuration is shown in Figure 2.6. The driver and Receiver 1 are placed closer together, and Receiver 2 is placed near Receiver 1. The total net length is halved.

Figure 2.6: Ideal placement of blocks

D. Optimization of Routing

Routing decides the actual net delay and thereby the timing of digital circuits. Optimum utilization of metal resources is very important. Higher metal layers, which are faster, should not be used for short distances; doing so creates resource congestion for critical paths. Typical digital circuits have drivers with fanouts of up to 32. Interconnect capacitance decides net delay, so optimum metal allocation and routing are very important for capacitive load balancing and hence for timing convergence.

Figure 2.7: Bad routing example

In Figure 2.7, a Driver drives Receiver 1, and the two blocks are placed close together. The routing strategy used here is not optimum. Care should be taken to minimize the routing length and the number of metal layers. Here 3 metal layers are used because of bad routing. It can be optimized as shown in Figure 2.8. Here the routing length is less than in the first case and only 2 metal layers are used. This is an optimum routing scenario. Saving metal layers on short connections leaves more resources for critical paths where timing convergence is difficult.

Figure 2.8: Optimum routing

A routing configuration known as fish bone routing is shown in Figure 2.9. Here a Driver drives 4 Receivers. The Driver is located at the center and the Receivers are placed equidistant from it. A routing layer known as the trunk runs from the Driver toward the Receivers. The final connections between the trunk and the Receivers are made on lower metal layers, known as spines. This configuration can be used whenever a Driver drives multiple Receivers.

Figure 2.9: Fish bone routing configuration

III. RESULTS

A module inside a microprocessor, called the Broadcast module, was designed, and timing closure techniques were applied on it for an enhanced frequency of operation. The results are explained below. The starting design is referred to as the “Reference” design and the final design as the “Current” design.

Setup Margin

Static timing analysis was run on the designed circuit to identify timing violations and to fix the paths. Figure 3.1 shows a bar chart of the number of paths in each margin bucket for the “Reference” design.

Figure 3.1: Setup margin distribution for Reference design

The x axis represents setup margin and the y axis represents the number of paths. The setup margin distribution for the final “Current” design is given in Figure 3.2.

Figure 3.2: Setup margin distribution for Current design

A comparison of the number of paths in each margin bucket between the Reference and Current designs is given in Table 3.1. From the table, the Reference design has 373 paths with negative margin (26 + 347). In the Current design, the number of paths with negative margin is 9. These 9 paths are external paths, that is, they originate in other modules. Fixing them is the responsibility of the driver modules, as decided at the top section level.

Table 3.1: Comparison of “Reference” and “Current” design for setup margin

Margin (picoseconds)   Number of paths in Reference   Number of paths in Current design
-50 < X < -25          26                             0
-25 < X < 0            347                            9
0 < X < 25             1594                           320
25 < X < 50            3493                           1672

Hold Margin

Hold margin is as important as setup margin. Hold violations must be rectified for proper operation of the circuit.

Figure 3.3: Hold margin distribution for Reference design

Figure 3.3 shows the hold margin distribution as a bar chart for the Reference design. Figure 3.4 shows the same distribution for the Current design.

Figure 3.4: Hold margin distribution for Current design
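The margins reported above come from static timing analysis. As a rough illustration only, the sketch below uses the textbook single-cycle flop-to-flop slack equations (an assumption; the paper does not state its equations and real tools add uncertainty and derating terms) to show why pushing the capture clock helps setup margin at the cost of hold margin, and then tallies margins into 25 ps buckets like those of Tables 3.1 and 3.2. All function names and delay values are hypothetical, and the choice to place a margin that falls exactly on a bucket edge in the upper bucket is also an assumption.

```python
from collections import Counter

# Bucket edges in picoseconds, mirroring the 25 ps bins of the margin tables.
BUCKET_EDGES = (-50, -25, 0, 25, 50)


def setup_slack(period, launch_clk, capture_clk, data_delay, t_setup):
    """Setup margin: data must arrive t_setup before the next capture edge."""
    required = period + capture_clk - t_setup
    arrival = launch_clk + data_delay
    return required - arrival


def hold_slack(launch_clk, capture_clk, data_delay, t_hold):
    """Hold margin: data must not change until t_hold after the capture edge."""
    arrival = launch_clk + data_delay
    required = capture_clk + t_hold
    return arrival - required


def bucket_label(margin):
    """Label of the bin holding margin (lower edge inclusive), else None."""
    for lo, hi in zip(BUCKET_EDGES, BUCKET_EDGES[1:]):
        if lo <= margin < hi:
            return f"{lo} to {hi}"
    return None


def histogram(margins):
    """Count paths per margin bucket, ignoring margins outside all buckets."""
    labels = (bucket_label(m) for m in margins)
    return Counter(label for label in labels if label is not None)


def negative_paths(margins):
    """Number of violating paths, i.e. paths with margin below zero."""
    return sum(1 for m in margins if m < 0)


# Hypothetical path into FF2 (all values in ps): the slow path fails setup.
before = setup_slack(1000, 100, 100, 990, 30)
# Clock pushing: an extra buffer adds 30 ps to the capture clock at FF2.
after = setup_slack(1000, 100, 130, 990, 30)
print(before, after)  # -20 10

# The same push removes 30 ps of hold margin on a fast path into FF2.
print(hold_slack(100, 100, 60, 20), hold_slack(100, 130, 60, 20))  # 40 10

sample = [-30, -20, -5, 10, 40]
print(histogram(sample), negative_paths(sample))
```

In this toy case a 30 ps push turns a -20 ps setup margin into +10 ps while the hold margin on the fast path drops from 40 ps to 10 ps, which is why clock tuning has to be checked against both setup and hold analysis.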
Table 3.2: Comparison of “Reference” and “Current” design for hold margin

Margin (picoseconds)   Number of paths in Reference   Number of paths in Current design
-50 < X < -25          1986                           0
-25 < X < 0            3117                           0
0 < X < 25             2017                           6027
25 < X < 50            1013                           4288

A comparison of the number of paths in each hold margin bucket between the Reference and Current designs is shown in Table 3.2. From the table, the Reference design has 5103 paths with negative margin (1986 + 3117). In the Current design all the violations are cleared, and the number of paths with negative margin is 0.

IV. CONCLUSION

A module was designed and timing convergence methods were applied to it. The design was done by applying low power techniques [1][2][3], and a significant saving in power was obtained. The module was converged for timing by applying the techniques explained in this paper, namely cell resizing, clock tuning, placement optimization and routing optimization. These techniques can be applied to all digital circuits for timing closure.

REFERENCES

[1] Mayank Chakraverty, Harisankar PS and Vaibhav Ruparelia, “Low Power Design Practices for Power Optimization at the Logic and Architecture Levels for VLSI System Design”, IEEE conference publications, International Conference on Energy Efficient Technologies for Sustainability, 2016.
[2] Gary K. Yeap, “Logic”, in Practical Low Power Digital VLSI Design, 1st ed., Kluwer Academic Publishers, Norwell, MA, USA, 1998, ISBN: 978-0-7923-8009-2.
[3] J.T. Burd and R. Brodersen, “Processor Design for Portable Systems”, Journal of VLSI Signal Processing Systems, vol. 13, no. 2-3, pp. 203-221, August 1996.
[4] J. Montanaro, et al., “A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor”, IEEE Journal of Solid-State Circuits, vol. 31, no. 11, pp. 1703-1714, November 1996.
[5] M. Takahashi, et al., “A 60-mW MPEG4 Video Codec Using Clustered Voltage Scaling with Variable Supply-Voltage Scheme”, IEEE Journal of Solid-State Circuits, vol. 33, no. 11, pp. 1772-1780, November 1998.
[6] Phani kumar M and N. Shanmukha Rao, “A Low Power and High Speed Design for VLSI Logic Circuits Using Multi-Threshold Voltage CMOS Technology”, International Journal of Computer Science and Information Technologies (IJCSIT), vol. 3 (3), pp. 4131-4133, ISSN: 0975-9646, 2012.
[7] Ko-Chi Kuo and Hsueh-Ta Ko, “Low Power Design Flow with Static and Statistical Timing Analysis”, 2012 IEEE International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS 2012), November 4-7, 2012.
[8] Ali Dasdan and Ivan Hom, “Handling Inverted Temperature Dependence in Static Timing Analysis”, ACM Transactions on Design Automation of Electronic Systems, vol. 11, no. 2, pp. 306-324, April 2006.