10 System Integration with Reusable Macros
Topics
• Integration overview
• Integrating soft macros
• Integrating hard macros
• Integrating RAMs and datapath generators
• Physical design
10.1 Integration Overview
The integration flow can be broken into the
following steps:
• Selecting IP blocks and preparing them for integration
• Integrating all the blocks into the top-level RTL
• Planning the physical design
• Synthesis and initial timing analysis
• Initial physical design and timing analysis, with iteration until timing closure
• Final physical design, timing verification, and power analysis
• Physical verification of the design
10.2.1 Problems in Integrating IP
When IP has been obtained from an external source, either a third party
or another division of the company, additional
problems frequently occur:
• Someone on the team needs to become familiar enough with the IP to integrate it into the design.
• The documentation is incomplete, making this understanding harder to obtain.
• The interface of the IP does not match the interface of the system bus.
• The verification models, such as bus functional models, are poorly written and difficult to integrate into the system verification environment.
• Only limited support is available from the IP provider.
10.2.2 Strategies for managing Interfacing Issues
There are several steps design teams can
take, however, to mitigate these problems:
• Plan the interfaces.
• Keep all interfaces as simple as possible when designing IP or custom blocks.
• Standardize on a few common buses and block-to-block interfaces.
• Accumulate IP and experience with the IP.
• Document this expertise.
10.2.3 Interfacing Hard Macros to the Rest of the design
A hard macro needs power, clock, and test structures that are
consistent and compatible with the rest of the design.
• Clock distribution: the overall clock distribution for the chip must accommodate the timing of the hard macro block.
• Power and ground: typically, the macro also has its own power and ground rings within the macro.
• Test structure: hard macros are designed with their own embedded testability structure.
10.3 Selecting IP
▶ One key step is to select IP that can
be easily integrated into the overall
chip design.
▶ Choosing well-designed, well-documented IP can greatly reduce
the integration effort.
10.3.1 Hard Macro Selection
▶ The first step is to determine the exact
requirements for the macro.
ⓐ Quality of the documentation – the basic functionality, interface definitions, timing, and how to integrate and verify the macro must be clearly documented.
ⓑ Completeness of the design and verification environment – functional, timing, synthesis, and floorplanning models must be provided.
ⓒ Robustness of the design – the design must be proven in silicon.
ⓓ Physical design limitations – the aspect ratio, blockage, and porosity of the macro must be considered.
10.3.2 Soft Macro Selection
▶ Once the requirements for the macro are fully
understood, the choices can quickly be narrowed to
those that meet the functional, timing, area, and power
requirements of the design.
ⓐ Quality of the documentation
ⓑ Completeness of the design and verification environment
ⓒ Robustness of the design
ⓓ Ease of use – includes the ease of interfacing the macro to the rest of the design as well as the quality and user-friendliness of the installation and synthesis scripts.
10.3.3 Soft Macro Installation


• The macro, its documentation, and its full design verification environment should be installed and integrated into your design environment.
• In particular, all components of the macro package should be under revision control.
10.3.4 Soft Macro Configuration


• Many soft macros are configurable through parameter settings.
• A key issue here is to make sure that the combination of parameter settings is consistent and correct.
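One way to enforce this is a small check that runs before the macro is synthesized. The sketch below is only illustrative: the parameter names (DATA_WIDTH, FIFO_DEPTH, ADDR_WIDTH) and the rules relating them are hypothetical and do not come from any particular macro.

```python
# Hypothetical soft-macro parameters; a real macro defines its own set and rules.
def check_macro_config(params):
    """Return a list of problems; an empty list means the settings are consistent."""
    problems = []
    width = params["DATA_WIDTH"]
    depth = params["FIFO_DEPTH"]
    addr = params["ADDR_WIDTH"]

    # Single-parameter rule: only a few widths are supported (assumed).
    if width not in (8, 16, 32, 64):
        problems.append(f"DATA_WIDTH={width} is not a supported width")
    # Single-parameter rule: depth must be a power of two (assumed).
    if depth < 2 or (depth & (depth - 1)) != 0:
        problems.append(f"FIFO_DEPTH={depth} is not a power of two")
    # Cross-parameter rule: the address width must cover the depth exactly.
    elif addr != depth.bit_length() - 1:
        problems.append(f"ADDR_WIDTH={addr} does not match FIFO_DEPTH={depth}")
    return problems


good = {"DATA_WIDTH": 32, "FIFO_DEPTH": 16, "ADDR_WIDTH": 4}
bad = {"DATA_WIDTH": 32, "FIFO_DEPTH": 16, "ADDR_WIDTH": 5}
print(check_macro_config(good))
print(check_macro_config(bad))
```

The cross-parameter rule is the important part: each setting in the second configuration is legal on its own, but the combination is not.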
10.3.5 Synthesis of Soft Macros


• The final step in preparing the IP for integration is to perform an initial synthesis with the target technology library.
• The initial synthesis checks the macro against the timing, area, and power goals of the design.
10.4 Integrating Memories

Memories are a special case of the hard macro,
and are worth some additional comment.

The issues affecting memory design are identical to
those affecting hard macro design, with the following
additional issues.
• The integrator typically has a wide choice of RAM configurations, such as single-port or multi-port, fast or low-power, synchronous or asynchronous.
• Asynchronous RAMs present a problem because generating a write clock requires a very timing-critical design that is tricky to create and difficult to verify. A fully synchronous RAM is strongly preferred.
• Large RAMs with fixed aspect ratios can present significant blockage problems. Check with your RAM provider to see if the aspect ratio of the RAMs can be modified if necessary.
• BIST is available for many RAM designs, and can greatly reduce test time and eliminate the need to bring RAM I/O to the chip's pins. However, the integrator should be cautious because some BIST techniques do not test for data retention problems.
10.5 Physical Design



The major challenge in the physical
implementation of a large SoC design is
achieving timing closure, which is an iterative process.
The flow is organized to minimize the number of iterations in
physical design.
The process consists of four major activities:
• Preparation of the design
• Placement loop
• Timing closure
• Physical verification
Figure 10-1 Integrating blocks into a chip
• Preparation: Design Planning → Synthesis → Detailed Floorplan → Initial Power Route
• Placement Loop: Initial Placement → Quick Extraction → Timing Analysis → Refine Constraints → Re-optimize → ECO Place
• Timing Closure: Clock Route → Detailed Route → 3-D Extract, STA → Fix Timing, Clocks → 3-D Extract, STA → Fix Hold Times → 3-D Extract, STA
• Physical Verification: Check Power → DRC/LVS
10.5.1 Design Planning to Initial Placement
Preparation of the design involves design planning,
synthesis, floorplanning, and initial power routing.
Design planning
• Physical design starts with planning.
• The team should do an initial estimate of die size and power dissipation; these are key for determining the package type.
• Once the team has partitioned the design into blocks, it can do an initial floorplan. (Information from the initial floorplan is used to extract more accurate wire load models and timing budgets for synthesis.)
• Once the team has RTL and GDSII, it can use an RTL floorplanner such as Chip Architect to refine the floorplan.
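The initial die size and power estimate mentioned above can start as a back-of-the-envelope calculation. The sketch below is one possible way to set it up, not a method from the source: the block areas, utilization, pad ring width, and switched capacitance are made-up numbers, and the power formula is the standard dynamic power approximation (activity × C × Vdd² × f).

```python
import math

def estimate_die_edge(block_areas_mm2, utilization=0.7, pad_ring_mm=0.3):
    """Estimate the edge length (mm) of a square die from block areas.

    utilization and pad_ring_mm are assumed planning values.
    """
    core_area = sum(block_areas_mm2.values()) / utilization
    core_edge = math.sqrt(core_area)
    return core_edge + 2 * pad_ring_mm

def estimate_dynamic_power(switched_cap_farads, vdd, freq_hz, activity=0.2):
    """Standard approximation: P = activity * C * Vdd^2 * f (watts)."""
    return activity * switched_cap_farads * vdd ** 2 * freq_hz

# Illustrative numbers only.
blocks = {"cpu": 4.0, "dsp": 2.5, "ram": 3.0, "glue": 0.5}  # mm^2
edge = estimate_die_edge(blocks)
power = estimate_dynamic_power(2e-9, vdd=1.8, freq_hz=100e6)
print(f"die edge ~ {edge:.2f} mm, dynamic power ~ {power:.3f} W")
```

Even rough numbers like these are enough to rule package types in or out before any layout exists.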
Figure 10-2 The floorplan affects timing budgets and wire load models (panels (a) and (b), each showing Blk A and Blk B)
Synthesis
Using the wire load models and timing budgets from
the initial floorplan, we synthesize each block
independently.
1. We do a top-level synthesis of the entire chip, using timing models for the hard macros.
2. At the top level, synthesis should be required only to stitch together the top-level netlist and refine the system-level interconnect.
3. If needed, we use the design planning tool to readjust block placement or I/O placement for key blocks, or to reroute some top-level nets. We then generate new timing budgets and wire load models, and re-run block-level synthesis.
The inputs to the top-level synthesis include:
• Timing budgets and wire load models from the design planning stage
• RTL for the synthesizable blocks
• Synthesis models for the hard macros and memories
• Netlists for any modules generated by a datapath generator
• Any libraries required, such as the DesignWare Foundation library
• Top-level RTL that instantiates the blocks, the I/O pads, and the top-level test structures
The synthesis models are required to complete synthesis
on the whole chip and to verify timing at the chip
level.
The top-level test structures include:
1. A JTAG TAP controller
2. A custom controller for scan and BIST structures

Once the top-level netlist is generated, scan cells are inserted and test patterns are produced with ATPG (Automatic Test Pattern Generation).
If clock gating is needed to reduce power, then Power Compiler should be used to convert mux-hold flops into gated clocks.
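To see why a mux-hold flop can be replaced by a gated clock, the following behavioral sketch compares the two. It is a cycle-level Python model written for illustration and says nothing about the logic a synthesis tool actually generates; the function names and stimulus are invented.

```python
def run_mux_hold(data, enable, q=0):
    """Flop clocked every cycle; a mux feeds q back to the input when enable is low."""
    trace, clock_edges = [], 0
    for d, en in zip(data, enable):
        q = d if en else q      # mux selects new data or the held value
        clock_edges += 1        # the flop sees a clock edge every cycle
        trace.append(q)
    return trace, clock_edges

def run_gated_clock(data, enable, q=0):
    """Flop whose clock is suppressed when enable is low."""
    trace, clock_edges = [], 0
    for d, en in zip(data, enable):
        if en:
            q = d               # clock edge reaches the flop, data is captured
            clock_edges += 1
        trace.append(q)         # otherwise the flop simply keeps its state
    return trace, clock_edges

data = [1, 0, 1, 0, 0, 1]
enable = [1, 0, 0, 1, 0, 0]
mux_trace, mux_edges = run_mux_hold(data, enable)
gated_trace, gated_edges = run_gated_clock(data, enable)
print(mux_trace == gated_trace, mux_edges, gated_edges)
```

Both versions produce the same output sequence, but the gated version toggles the flop's clock pin only when data is actually loaded, which is where the power saving comes from.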
The final netlist, along with timing information, is now
ready for detailed floorplanning.
The designer should avoid false and multicycle paths
completely. In the worst case, the list of such paths should be
very short. (The explanation is in the last paragraph of page 218.)
Figure 10-3 Chip-level synthesis
• Block-level synthesis inputs: RTL, DesignWare libraries, budgets and wire loads
• Memory compiler: memory compiler code → synthesis model
• Datapath compiler: datapath source code → netlist
• Hard macro → synthesis model
• Top-level synthesis using Design Compiler/Test Compiler/Power Compiler → netlist
• Detailed floorplan inputs: netlist, I/O pad assignment, GDSII from memory compiler, GDSII for hard macro
• Detailed floorplan outputs: bonding diagram (to packager); new budgets and wire loads for synthesis (to block-level and top-level synthesis)
Detailed Floorplan and initial power route
At this point, we can read the final netlist into the
floorplanning tool and complete the preparation
for placement:
ⅰ. Block placement
ⅱ. I/O pad placement
ⅲ. Placement of the I/O cells for each block (these locations can be fixed)

We do an initial route of the power mesh, the
distribution of power and ground in the chip.
Typically, power routing involves placing wide power
and ground rings around the periphery of the chip.
Many chips have different power supplies for the I/O and
the core logic, especially if they run at different
voltages.
Initial Placement
Timing-driven placement has been a goal of
tool providers and engineers for many years.
Today the technology is mature enough to help achieve timing
closure on large chips.

Using this technology effectively requires:
• A good technology file that describes the parameters of the silicon technology
• Accurate timing constraints
• An optimization-friendly design

Timing-driven placement takes as input the
timing constraints and the gate-level netlist.
Routing delay estimation is key to achieving
timing closure.
The placement tool relies on a technology file to tell it
how to estimate the capacitance of metal interconnect.
In deep submicron designs, wires are taller than they
are wide, so fringe capacitance has a significant effect on
overall capacitance and must be modeled to get accurate capacitance
estimates.
Once the capacitance per unit length is well modeled,
the placement engine must estimate the actual length
of each connection. Good placement tools are
able to estimate congestion and its effect on routing
resources, and factor this into the delay estimate.
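The estimate described above has two parts: capacitance per unit length and wire length. The sketch below is a simplified model, not what any particular placement tool does. Half-perimeter wirelength stands in for the length estimate, the congestion factor is a hypothetical detour multiplier, and the per-micron capacitance values are made up for illustration.

```python
def hpwl(pins):
    """Half-perimeter wirelength (um) of the bounding box around pin (x, y) points."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def net_capacitance(pins, c_area_per_um, c_fringe_per_um, congestion=1.0):
    """Estimate net capacitance (fF) from pin locations.

    congestion >= 1.0 is an assumed multiplier modeling detours in crowded regions.
    """
    length = hpwl(pins) * congestion
    return length * (c_area_per_um + c_fringe_per_um)

pins = [(0, 0), (120, 40), (60, 200)]    # pin locations in um, illustrative
no_fringe = net_capacitance(pins, 0.04, 0.0)
with_fringe = net_capacitance(pins, 0.04, 0.06, congestion=1.2)
print(no_fringe, with_fringe)
```

With these illustrative values, ignoring the fringe term and congestion underestimates the load by a large margin, which is the kind of error that shows up later as a timing violation after routing.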
Clearly, accurate timing constraints are essential to
good timing-driven placement. (Avoiding false and
multicycle paths can greatly help achieve rapid timing
closure.)

In particular, a fully synchronous, flop-based
design can allow timing-driven placement to
produce excellent results.
Flat vs. Hierarchical Placement
※ One critical issue in placement is deciding
how much hierarchy to maintain during physical design.
ⅰ. Some designs are designed with strict hierarchy.
ⅱ. Another approach is to maintain hierarchy.
ⅲ. A third approach is to do a completely flat place and route.
(See the textbook for the details of each approach.)

Real designs may use a combination of the above
approaches.
The only strong recommendation we make in this
area is that the physical hierarchy should reflect
the logical hierarchy.
10.5.2 Placement Loop

There are two major sources of timing problems:
the timing constraints and the design itself.
• If the design has false paths that are not listed in the constraints, the constraints need to be updated.
• If the timing is close but not quite passing, then it may be useful to refine the timing budgets.
• If timing is still not met, then we may have to modify the design itself, changing the RTL to add pipeline stages or the like.
▶ In any of the above scenarios, it becomes necessary to iterate
through placement. The goal is to make this iteration
as short as possible, so that we can converge
quickly to a placement that meets timing.
The actual loop through placement, analysis,
and re-optimization is described below.
ⅰ. Quick Extraction: after placement is complete, the placement tool generates a report of the estimated capacitances in the routing.
ⅱ. Timing analysis: a timing analysis tool reads these capacitances, along with the timing constraints, and outputs a timing report.
ⅲ. Refine constraints: if the violations are on false paths, we update the timing constraints.
ⅳ. Re-optimize:
• If the timing violations are real, most of them will probably come from excessive capacitance loading gate outputs. (The fixes are to increase drive strength, add buffers, or even restructure logic.)
• The best timing-driven placement tools can do much of this automatically as part of timing-driven placement.
• If the available placement tools do not re-optimize, or if significant restructuring of logic is required, then we have to use the in-place optimization capabilities of synthesis tools.
ⅴ. ECO Place:
• The ECO placement capability of the placement tool allows us to give it a revised netlist and physical locations for the new devices.
• The goal of ECO placement is to maintain as much of the existing placement as possible.
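The five steps of the placement loop can be read as a simple control loop. The sketch below is a toy model: every function stands in for a real tool step, path delays are plain numbers, the 15% gain from re-optimization is an assumption, and known_false represents the designer's knowledge of which violating paths are false.

```python
def timing_analysis(paths, clock_period, false_paths):
    """Return the names of paths whose delay exceeds the clock period."""
    return [name for name, delay in paths.items()
            if delay > clock_period and name not in false_paths]

def placement_loop(paths, clock_period, known_false, max_iterations=10):
    """Iterate analysis, constraint refinement, and re-optimization until timing is met."""
    false_paths = set()
    for iteration in range(1, max_iterations + 1):
        # Quick extraction is implicit here: `paths` already holds estimated delays.
        violations = timing_analysis(paths, clock_period, false_paths)
        if not violations:
            return iteration, false_paths
        for name in violations:
            if name in known_false:
                false_paths.add(name)   # refine constraints
            else:
                paths[name] *= 0.85     # re-optimize: assumed gain from upsizing or buffering
        # ECO place would run here, re-placing only the cells that changed.
    raise RuntimeError("no convergence; revisit the timing budgets or the RTL")

paths = {"a_to_b": 9.0, "b_to_c": 12.5, "test_mode": 30.0}   # ns, illustrative
iterations, waived = placement_loop(paths, clock_period=10.0,
                                    known_false={"test_mode"})
print(iterations, waived, paths["b_to_c"])
```

The exit through RuntimeError corresponds to the case described earlier where the design itself has to change, for example by adding pipeline stages.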
10.5.3 Timing Closure

After placement meets timing, we have
several key tasks left to complete the design:
• Clock Route – route the clocks, also known as clock tree synthesis. Clocks are the most critical nets and need to be balanced to minimize clock skew. They typically require a very large number of buffers, and the clock tree is optimized for a specific placement of flip-flops.
• Detailed Route – produces a complete physical design, allowing a more accurate assessment of timing and power.
• Extraction and Timing Analysis – we now use a full 3-D extraction engine to calculate the actual capacitance of each segment of metal interconnect. With this data, we can do a full static timing analysis and determine the timing of the design.
• Fixing Timing and Clocks – fix the clocks and long paths. (If a large number of fixes is needed, we go back and readjust our timing constraints, re-optimize, and go back through place and route.)
• Fixing Hold Time Violations – hold time problems result from a combination of fast data paths from register to register and clock skew. These are fixed during the placement loop.
• Final Extraction and Timing Analysis – do one final extraction and timing analysis. (This includes a review of the final timing report, verifying that the false and multicycle paths specified earlier really are false or multicycle.)
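The setup and hold checks behind these steps can be written down directly. The sketch below uses the usual register-to-register slack equations; the delay numbers are invented, and fixing hold violations by inserting delay buffers is shown as one common approach, not necessarily the one used in this flow.

```python
import math

def setup_slack(period, clk_to_q, max_logic, setup, skew):
    """Setup slack in ns; skew = capture clock arrival minus launch clock arrival."""
    return (period + skew) - (clk_to_q + max_logic + setup)

def hold_slack(clk_to_q, min_logic, hold, skew):
    """Hold slack in ns; fast data paths and positive skew both reduce it."""
    return (clk_to_q + min_logic) - (hold + skew)

def buffers_to_fix_hold(slack, buffer_delay):
    """Delay buffers needed on the data path to make the hold slack non-negative."""
    if slack >= 0:
        return 0
    return math.ceil(-slack / buffer_delay)

# Illustrative numbers in ns.
s = setup_slack(period=10.0, clk_to_q=0.2, max_logic=8.0, setup=0.3, skew=0.4)
h = hold_slack(clk_to_q=0.2, min_logic=0.1, hold=0.15, skew=0.4)
print(s, h, buffers_to_fix_hold(h, buffer_delay=0.1))
```

The same skew that helps the setup check here hurts the hold check, which is why a balanced clock tree matters for both.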
10.5.4 Verifying the Physical Design

The final step of the physical design process is verifying that
the physical design is correct and in
compliance with the design rules for the
target process.
▶ Checking power: a check of the power distribution system. Estimate the voltage drop across the power meshes, and get a final estimate of the power dissipation of the design.
▶ DRC (Design Rule Check): verifies that the design does not violate any physical design rules.
▶ LVS (Layout vs. Schematic): compares the design as physically implemented to the gate-level netlist.
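As a first-order illustration of the voltage drop check, the sketch below applies Ohm's law to a single uniform power strap. Real power analysis works on the full extracted mesh; the sheet resistance, strap dimensions, current, and the 5% budget are all assumed values.

```python
def strap_ir_drop(sheet_res_ohm_per_sq, length_um, width_um, current_a):
    """Voltage drop (V) along one uniform strap: V = I * R, with R = Rs * L / W."""
    resistance = sheet_res_ohm_per_sq * length_um / width_um
    return current_a * resistance

vdd = 1.8
budget = 0.05 * vdd   # assumed budget: 5% of the supply voltage
drop = strap_ir_drop(0.05, length_um=2000, width_um=20, current_a=0.1)
wider = strap_ir_drop(0.05, length_um=2000, width_um=200, current_a=0.1)
print(drop, drop <= budget)
print(wider, wider <= budget)
```

In this example the narrow strap exceeds the budget and a ten times wider strap meets it, which is the kind of trade-off the power check is meant to expose.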
10.5.5 Summary



• The physical design of very large chips is a challenging and complex task; teams can spend many months trying to reach timing closure.
• The designer can reduce the risk of runaway schedules in physical design. The key is to make timing closure and physical design a series of local, relatively small problems. (Once placement succeeds, the rest of the design process is straightforward.)
• The most important key to rapid timing closure is the quality of the design itself.