10 System Integration with Reusable Macros

Topics
• Integration overview
• Integrating soft macros
• Integrating hard macros
• Integrating RAMs and datapath generators
• Physical design

10.1 Integration Overview
The integration process can be broken into the following steps:
• Selecting IP blocks and preparing them for integration
• Integrating all the blocks into the top-level RTL
• Planning the physical design
• Synthesis and initial timing analysis
• Initial physical design and timing analysis, with iteration until timing closure
• Final physical design, timing verification, and power analysis
• Physical verification of the design

10.2.1 Problems in Integrating IP
When IP has been obtained from an external source, either a third party or another division of the company, additional problems frequently occur:
• Someone on the team needs to become familiar enough with the IP to integrate it into the design.
• The documentation is incomplete, making this understanding harder to obtain.
• The interface of the IP does not match the interface of the system bus.
• The verification models, such as bus functional models, are poorly written and difficult to integrate into the system verification environment.
• Only limited support is available from the IP provider.

10.2.2 Strategies for Managing Interfacing Issues
There are several steps design teams can take to mitigate these problems:
• Plan the interfaces.
• Keep all interfaces as simple as possible when designing IP or custom blocks.
• Standardize on a few common buses and block-to-block interfaces.
• Accumulate IP and experience with the IP. Document this expertise.

10.2.3 Interfacing Hard Macros to the Rest of the Design
A hard macro needs power, clock, and test structures that are consistent and compatible with the rest of the design.
• Clock distribution: The overall clock distribution for the chip must accommodate the timing of the hard macro block.
• Power and ground: Typically, the macro has its own power and ground rings within the macro.
• Test structure: Hard macros are designed with their own embedded testability structure.

10.3 Selecting IP
▶ One key step is to select IP that can be easily integrated into the overall chip design.
▶ Choosing well-designed, well-documented IP can greatly reduce the integration effort.

10.3.1 Hard Macro Selection
▶ The first step is to determine the exact requirements for the macro.
ⓐ Quality of the documentation: the basic functionality, interface definitions, timing, and how to integrate and verify the macro must be clearly documented.
ⓑ Completeness of the design and verification environment: functional, timing, synthesis, and floorplanning models must be provided.
ⓒ Robustness of the design: the design must be proven in silicon.
ⓓ Physical design limitations: the aspect ratio, blockage, and porosity of the macro must be considered.

10.3.2 Soft Macro Selection
▶ Once the requirements for the macro are fully understood, the choices can quickly be narrowed to those that meet the functional, timing, area, and power requirements of the design.
ⓐ Quality of the documentation
ⓑ Completeness of the design and verification environment
ⓒ Robustness of the design
ⓓ Ease of use: includes the ease of interfacing the macro to the rest of the design, as well as the quality and user-friendliness of the installation and synthesis scripts.

10.3.3 Soft Macro Installation
The macro, its documentation, and its full design verification environment should be installed and integrated into your design environment. In particular, all components of the macro package should be under revision control.

10.3.4 Soft Macro Configuration
Many soft macros are configurable through parameter settings. A key issue here is to make sure that the combination of parameter settings is consistent and correct.
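
One way to catch inconsistent parameter combinations early is a small script run before synthesis. The sketch below is only an illustration: the parameter names (DATA_WIDTH, FIFO_DEPTH, ADDR_WIDTH) and the rules relating them are hypothetical, standing in for whatever constraints the macro's documentation actually states.

```python
def check_macro_config(params):
    """Return a list of problems found; an empty list means the
    (hypothetical) parameter combination is consistent."""
    problems = []
    data_width = params["DATA_WIDTH"]
    fifo_depth = params["FIFO_DEPTH"]
    addr_width = params["ADDR_WIDTH"]

    # Assumed rule 1: only a few bus widths are supported.
    if data_width not in (8, 16, 32, 64):
        problems.append("DATA_WIDTH must be 8, 16, 32, or 64")

    # Assumed rule 2: the FIFO depth is a power of two, at least 2.
    if fifo_depth < 2 or fifo_depth & (fifo_depth - 1):
        problems.append("FIFO_DEPTH must be a power of two >= 2")
    # Assumed rule 3: the address width must match the depth exactly.
    elif (1 << addr_width) != fifo_depth:
        problems.append("ADDR_WIDTH must equal log2(FIFO_DEPTH)")

    return problems


if __name__ == "__main__":
    config = {"DATA_WIDTH": 32, "FIFO_DEPTH": 16, "ADDR_WIDTH": 5}
    for problem in check_macro_config(config):
        print("config error:", problem)
```

Keeping such a check under revision control alongside the macro means the same rules are applied every time the configuration changes.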
10.3.5 Synthesis of Soft Macros
The final step in preparing the IP for integration is to perform an initial synthesis with the target technology library.
Initial synthesis => check the macro against the timing, area, and power goals of the design.

10.4 Integrating Memories
Memories are a special case of the hard macro and are worth some additional comment. The issues affecting memory design are identical to those affecting hard macro design, with the following additional issues:
• The integrator typically has a wide choice of RAM configurations, such as single-port or multi-port, fast or low-power, synchronous or asynchronous.
• Asynchronous RAMs present a problem because generating a write clock requires a very timing-critical design that is tricky to create and difficult to verify. A fully synchronous RAM is strongly preferred.
• Large RAMs with fixed aspect ratios can present significant blockage problems. Check with your RAM provider to see whether the aspect ratio of the RAMs can be modified if necessary.
• BIST is available for many RAM designs. It can greatly reduce test time and eliminates the need to bring RAM I/O to the chip's pins. However, the integrator should be cautious, because some BIST techniques do not test for data retention problems.

10.5 Physical Design
The major challenge in the physical implementation of a large SoC design is achieving timing closure, which is an iterative process. The goal is to minimize the number of iterations in physical design. The process consists of four major activities:
• Preparation of the design
• Placement loop
• Timing closure
• Physical verification

Figure 10-1 Integrating blocks into a chip
Preparation: Design Planning -> Synthesis -> Detailed Floorplan -> Initial Power Route
Placement Loop: Initial Placement -> Quick Extraction -> Timing Analysis -> Refine Constraints -> Re-optimize -> ECO Place
Timing Closure: Clock Route -> Detailed Route -> 3-D Extract, STA -> Fix Timing, Clocks -> 3-D Extract, STA -> Fix Hold Times -> 3-D Extract, STA
Physical Verification: Check Power -> DRC/LVS

10.5.1 Design Planning to Initial Placement
Preparation of the design involves design planning, synthesis, floorplanning, and initial power routing.

Design planning
• Physical design starts with planning.
• The team should do an initial estimate of die size and power dissipation. => key for determining package type
• Once the team has partitioned the design into blocks, the team can do an initial floorplan. (The information from the initial floorplan is used to extract more accurate wire load models and timing budgets for synthesis.)
• Once the team has RTL and GDSII, the team can use an RTL floorplanner such as Chip Architect to refine the floorplan.

[Figure 10-2 The floorplan affects timing budgets and wire load models: two alternative placements, (a) and (b), of Blk A and Blk B]

Synthesis
Using the wire load models and timing budgets from the initial floorplan, we synthesize each block independently.
1. We do a top-level synthesis of the entire chip, using timing models for the hard macros.
2. At the top level, synthesis should be required only to stitch together the top-level netlist and refine the system-level interconnect.
3. We can use the design planning tool to readjust block placement or I/O placement for key blocks, or to reroute some top-level nets. => Generate new timing budgets and wire loads, and re-run block-level synthesis.

The inputs to the top-level synthesis include:
• Timing budgets and wire load models from the design planning stage
• RTL for the synthesizable blocks
• Synthesis models for the hard macros and memories
• Netlists for any modules generated from a datapath generator
• Any libraries required, such as the DesignWare Foundation Library
• Top-level RTL that instantiates the blocks, the I/O pads, and the top-level test structures
=> These are the synthesis models required to complete synthesis on the whole chip and to verify timing at the chip level.

The top-level test structures
1. A JTAG TAP controller
2. A custom controller for scan and BIST structures
Once the top-level netlist is generated, scan cells can be inserted (e.g., for ATPG, Automatic Test Pattern Generation).

If clock gating is needed to reduce power, then Power Compiler should be used to convert mux-hold flops into gated clocks.
The final netlist, along with timing information, is now ready for detailed floorplanning.
The designer should avoid false and multicycle paths completely. In the worst case, the list of such paths should be very short. (See the last paragraph on page 218 of the textbook for the explanation.)

Figure 10-3 Chip-level synthesis
• RTL, DesignWare libraries, and budgets/wire loads -> Block-level Synthesis
• Memory compiler code -> Memory compiler -> Synthesis model
• Hard macro -> Synthesis model
• Datapath source code -> Datapath compiler -> Netlist
• All of the above, with budgets/wire loads -> Top-level Synthesis using Design Compiler/Test Compiler/Power Compiler -> Netlist
• Netlist, I/O pad assignment, GDSII from memory compiler, and GDSII for hard macro -> Detailed Floorplan
• Detailed Floorplan -> New budgets and wire loads for synthesis (to block-level and top-level synthesis)
• Bonding diagram -> To packager

Detailed Floorplan and Initial Power Route
At this point, we can read the final netlist into the floorplanning tool and complete the preparation for placement. We can fix:
ⅰ. Block placement
ⅱ. I/O pad placement
ⅲ. Placement of the I/O cells for each block

We do an initial route of the power mesh, the distribution of power and ground in the chip.
Typically, power routing involves placing wide power and ground rings around the periphery of the chip.
Chips typically have separate power supplies for the I/O and the core logic, especially if they run at different voltages.

Initial Placement
Timing-driven placement has been a goal of tool providers and engineers for many years.
Today the technology is mature enough to make timing closure on large chips practical.
To use this technology effectively, we need:
• A good technology file that describes the parameters of the silicon technology
• Accurate timing constraints
• An optimization-friendly design
Timing-driven placement takes as input the timing constraints and the gate-level netlist.
The routing delay estimate is key to achieving timing closure.

The placement tool relies on a technology file to tell it how to estimate the capacitance of metal interconnect.
In deep submicron designs, wires are taller than they are wide, so fringe capacitance has a significant effect on overall capacitance. => The technology file must give accurate capacitance estimates.
Once the capacitance per unit length is well modeled, the placement engine must estimate the actual length of each connection. => Good placement tools are able to estimate congestion and its effect on routing resources, and factor this into the delay estimate.

Clearly, accurate timing constraints are essential to good timing-driven placement. (Avoiding false and multicycle paths can greatly help achieve rapid timing closure.)
In particular, a fully synchronous, flop-based design can allow timing-driven placement to produce excellent results.

Flat vs. Hierarchical Placement
※ One critical issue in doing placement is deciding how much hierarchy to maintain during physical design.
ⅰ. Some designs are designed with strict hierarchy.
ⅱ. Another approach is to maintain hierarchy.
ⅲ. A third approach is to do a completely flat place and route.
(See the textbook for the details of each approach.)
Real designs may use a combination of the above approaches.
The only strong recommendation we make in this area is that the physical hierarchy should reflect the logical hierarchy.
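
As a rough illustration of the routing estimate described under Initial Placement, the sketch below multiplies an estimated net length by a capacitance per unit length that includes a fringe term. The half-perimeter wirelength (HPWL) estimate, the congestion detour factor, and all function and parameter names are assumptions made for this example; the slides do not say which estimate a particular placement tool uses, and the numeric values are illustrative only.

```python
def hpwl(pins):
    """Half-perimeter wirelength: width plus height of the bounding
    box around a net's pins, given as (x, y) pairs in micrometers."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def estimate_net_cap(pins, c_area_per_um, c_fringe_per_um, congestion=1.0):
    """Estimate the wire capacitance of one net.

    c_area_per_um and c_fringe_per_um would come from the technology
    file; congestion >= 1.0 is a hypothetical detour factor that
    lengthens the route in crowded regions.
    """
    length_um = hpwl(pins) * congestion
    return length_um * (c_area_per_um + c_fringe_per_um)


if __name__ == "__main__":
    net = [(0, 0), (30, 40), (10, 5)]  # three pins of one net
    # Illustrative values in fF per micrometer, not real process data.
    print(estimate_net_cap(net, c_area_per_um=0.05, c_fringe_per_um=0.15))
    print(estimate_net_cap(net, 0.05, 0.15, congestion=1.2))
```

Dropping the fringe term in this example would cut the estimate to a quarter, which is why the technology file has to model it.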
10.5.2 Placement Loop
There are two major sources of timing problems => the timing constraints and the design itself.
• If the design has false paths that are not listed in the constraints, the constraints must be updated.
• If the timing is close but not quite passing, then it may be useful to refine the timing budgets.
• If timing is still not met, then we may have to modify the design itself, changing the RTL to add pipeline stages or the like.
▶ In any of the above scenarios, it becomes necessary to iterate through placement. The goal is to make this iteration as short as possible, so that we can converge quickly to a placement that meets timing.

The actual loop through placement, analysis, and re-optimization is described below.
ⅰ. Quick extraction: after placement is complete, the placement tool generates a report of the estimated capacitances in the routing.
ⅱ. Timing analysis: a timing analysis tool can read these capacitances, along with the netlist and timing constraints, and output a timing report.
ⅲ. Refine constraints: if the violations are false paths, we update the timing constraints.
ⅳ. Re-optimize:
• If the timing violations are real, most of them will probably come from excessive capacitance loading gate outputs. (Increase drive strength, add buffers, or even restructure logic.)
• The best timing-driven placement tools can do much of this automatically as part of timing-driven placement.
• If the available placement tools do not re-optimize, or if significant restructuring of logic is required, then we have to use the in-place optimization capabilities of synthesis tools.
ⅴ. ECO place:
• The ECO placement capabilities of the placement tool allow us to give it a revised netlist and physical locations for the new devices.
• The goal of ECO placement is to maintain as much of the existing placement as possible.
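
The loop above can be sketched as a toy model. Everything here is a stand-in for real tool steps: paths are plain dictionaries, "re-optimization" simply cuts a path's delay by 10%, and the set of known false paths is given up front. The function names are hypothetical and do not correspond to any tool's API.

```python
def timing_report(paths, false_paths):
    """Stand-in for quick extraction plus timing analysis: return
    {path_name: slack_ns} for every path not declared false."""
    return {
        name: p["required_ns"] - p["arrival_ns"]
        for name, p in paths.items()
        if name not in false_paths
    }


def placement_loop(paths, known_false, max_iters=10):
    """Iterate until no path has negative slack.

    Returns (iterations_used, false_paths_added_to_constraints).
    Note: arrival times in `paths` are modified in place.
    """
    false_paths = set()
    for iteration in range(1, max_iters + 1):
        slacks = timing_report(paths, false_paths)
        violations = [name for name, slack in slacks.items() if slack < 0]
        if not violations:
            return iteration, false_paths
        for name in violations:
            if name in known_false:
                # Refine constraints: exclude the false path.
                false_paths.add(name)
            else:
                # Re-optimize: model upsizing/buffering as a 10% speedup.
                paths[name]["arrival_ns"] *= 0.9
        # An ECO placement of the changed cells would happen here.
    raise RuntimeError("timing did not converge; the RTL may need changes")


if __name__ == "__main__":
    paths = {
        "a": {"arrival_ns": 9.0, "required_ns": 10.0},
        "b": {"arrival_ns": 12.0, "required_ns": 10.0},  # false path
        "c": {"arrival_ns": 11.5, "required_ns": 10.0},
    }
    print(placement_loop(paths, known_false={"b"}))
```

The error raised when the loop fails to converge corresponds to the third scenario above, where the design itself has to change.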
10.5.3 Timing Closure
After placement meets timing, we have several key tasks left to complete the design:
• Clock route: route the clocks, a step known as clock tree synthesis. Clocks are the most critical nets and need to be balanced to minimize clock skew. Clock trees typically require a very large number of buffers. -> The clock is optimized for a specific placement of flip-flops.
• Detailed route: produces a complete physical design and allows a more accurate assessment of timing and power.
• Extraction and timing analysis: we now use a full 3-D extraction engine to calculate the actual capacitance of each segment of metal interconnect. => With this data, we can do a full static timing analysis and determine the timing of the design.
• Fixing timing and clocks: fix problems in the clocks and long paths. (If a large number of fixes is needed -> go back and readjust the timing constraints, re-optimize, and go back through place and route.)
• Fixing hold time violations: hold time problems result from a combination of fast data paths from register to register and clock skew. -> fixed during the placement loop.
• Final extraction and timing analysis: do one final extraction and timing analysis. (Review the final timing report, verifying that the false and multicycle paths specified earlier are really false.)

10.5.4 Verifying the Physical Design
The final step in the physical design process is verifying that the physical design is correct and in compliance with the design rules for the target process.
▶ Checking power: a check of the power distribution system. Estimate the voltage drop across the power meshes. Also get a final estimate of the power dissipation of the design.
▶ DRC (Design Rule Check): verifies that the design does not violate any physical design rules.
▶ LVS (Layout vs. Schematic): compares the design as physically implemented to the gate-level netlist.
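
The voltage-drop estimate in the power check can be illustrated with a first-order model. The sketch assumes a single power strap fed from one end, divided into resistive segments with a current tap at the end of each segment. Real power-mesh analysis works on the extracted two-dimensional grid; the function name and the numbers here are hypothetical.

```python
def ir_drop_along_strap(seg_res_ohm, tap_current_a):
    """Return the voltage drop (in volts) seen at each tap.

    seg_res_ohm[k] is the resistance of the segment leading to tap k;
    tap_current_a[k] is the current drawn at tap k. Each segment
    carries the current of its own tap and of every tap beyond it.
    """
    drops = []
    drop = 0.0
    downstream = sum(tap_current_a)
    for r, i_tap in zip(seg_res_ohm, tap_current_a):
        drop += r * downstream  # Ohm's law on this segment
        drops.append(drop)
        downstream -= i_tap     # this tap's current leaves the strap
    return drops


if __name__ == "__main__":
    # Illustrative values: three 0.02-ohm segments, 0.5 A per tap.
    drops = ir_drop_along_strap([0.02, 0.02, 0.02], [0.5, 0.5, 0.5])
    print("worst-case drop (V):", max(drops))
```

The farthest tap always sees the largest drop in this model, so that is the value to compare against the allowed supply margin.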
10.5.5 Summary
• The physical design of very large chips is a challenging and complex task; teams can spend many months trying to reach timing closure.
• The designer can reduce the risk of runaway schedules in physical design.
• The key is to make timing closure and physical design a series of local, relatively small problems. (Once placement succeeds, the rest of the design process is straightforward.)
• The most important key to rapid timing closure is the quality of the design itself.