Design Migration Considerations

Keywords: Structured ASIC, Design-for-Portability, Design, Migration
Editorial Features header: Programmable – Platform ASIC
@head:Good Engineering Practices Minimize Design-Porting Effort
@deck:To greatly improve the success of future design migrations, follow a design-for-portability methodology that’s implemented at the beginning of the product design cycle.
@text:Design migration is an increasingly important issue facing today’s design teams.
Typically, design migrations may involve taking a design from a hardware prototype to a
production cell-based, application-specific integrated circuit (ASIC). Or they could entail
performing a cost-reduction technology migration. Several factors drive design
migration. For example, an increase in design complexity results in longer simulation-based verification. This increase in verification time has reduced the practicality of
simulation-only verification. At the same time, it has raised the demand for hardware
prototyping. Another factor that drives design migration is the increasing mask costs that
make it harder for companies to enter new markets.
Instead of going straight to a cell-based technology, companies are increasingly opting
for lower-cost platform ASICs. Structured ASICs only provide a logic and memory array.
In contrast, platform ASICs include logic, memory, and a large range of IP (e.g., ARM,
SERDES, and DDR). Platform ASICs help reduce development cost and risk while
allowing companies to get to market faster. Once a product is successful in the market
and high production quantities are required, the design can be migrated to a cost-optimized, cell-based ASIC.
This migration approach works if designs are architected properly from the start. If not,
those same designs can quickly become locked into a single-implementation technology
or vendor. If an unplanned migration becomes necessary, significant technical challenges
could arise later in the design cycle.
Design-for-reuse (DFR) concepts are well documented in many publications, such as the
Reuse Methodology Manual for System-on-a-Chip Designs [1]. Typically, such
publications focus on a single subsystem in a design and how it is reused in subsequent
designs. Designs that will later be migrated to a standard-cell architecture require more
than just good design-reuse practices. They also need to follow a design-for-portability
approach.
Designing for portability follows many of the same practices as DFR. Yet there are some
important additional considerations. These considerations emphasize both the top-level
structural design issues as well as the direct register-transfer-level (RTL) instantiation
issues faced during design migration.
Each technology platform is usually different enough so that it isn’t possible to have a
single design that will map to all platforms without modification. With proper
consideration and understanding of the issues, however, it’s possible to structure and
implement a design that will minimize the migration effort.
For some designs, a compromise still has to be made. In that case, the primary target
platform, meaning the one that will be used when the device enters full volume
production, should dictate the implementation details.
Migrating The Top Level
The top level of a design will almost always have to change during a migration. By
following good design practices, this effort can be minimized. The most important
structural rule is to separate the “functional top level” of the design from the “device top
level.” The functional top contains all subsystem instantiation and connectivity. The
device top contains only the I/Os, clock module, and instantiation of the functional top.
There shouldn’t be any user logic in the device top. If individual I/Os need to be three-state or bidirectional, they should be made so within the device top. In this case, the
functional top should contain separate ‘input,’ ‘output,’ and ‘direction’ signals (or
‘output’ and ‘enable’ for three-state). Thus, no three-state or bidirectional logic should
exist within the functional top. Separating the functional top from the device top
significantly reduces the complexity of replacing I/Os and clocking during a migration.
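The separation described above can be sketched in Verilog as follows. This is a minimal illustration rather than a reference implementation: the module names, port names, 8-bit bus width, and the placeholder logic inside the functional top are all assumptions made for the example, and the clock and reset module is omitted for brevity.

```verilog
// Functional top: user logic only. No pads and no three-state logic.
// The logic here is a hypothetical placeholder.
module functional_top (
    input  wire       clk,
    input  wire       rst_n,
    input  wire [7:0] data_in,   // value sampled from the pads
    output reg  [7:0] data_out,  // value to drive onto the pads
    output reg        data_oe    // 1 = drive the pads, 0 = release them
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            data_out <= 8'h00;
            data_oe  <= 1'b0;
        end else begin
            data_out <= data_in;
            data_oe  <= ~data_oe;  // placeholder direction control
        end
    end
endmodule

// Device top: I/Os and the functional-top instance only.
module device_top (
    input  wire       clk_pad,
    input  wire       rst_n_pad,
    inout  wire [7:0] data_pad
);
    wire [7:0] data_in;
    wire [7:0] data_out;
    wire       data_oe;

    // Generic bidirectional buffer. During a migration, only this part
    // is swapped for the target technology's I/O cells.
    assign data_pad = data_oe ? data_out : 8'bz;
    assign data_in  = data_pad;

    functional_top u_functional_top (
        .clk      (clk_pad),
        .rst_n    (rst_n_pad),
        .data_in  (data_in),
        .data_out (data_out),
        .data_oe  (data_oe)
    );
endmodule
```

Because the three-state assignment lives only in device_top, replacing the pads for a new technology does not touch functional_top.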
Perhaps one of the most important issues affecting design portability is clocking. ASICs
and high-end platform ASICs have little or no limitation on the number of clocks in a
design. In contrast, FPGAs and some structured ASICs have a fixed, limited clock
structure. Limitations on clock resources can cause significant and sometimes unsolvable
challenges for design migrations. Clocking should therefore be considered very early in
the design process.
As a workaround to address the limited number of clocks, FPGA designs sometimes
contain distributed clock enables. These enables generate lower-speed clocks further
down the clock tree. This approach should be avoided where possible, as it can cause
problems in other implementation technologies.
To maximize design portability, all clock-related logic should be placed in a single
module. If such placement isn’t possible, clocks should still be sourced from outside the
functional top level described earlier. Different implementation technologies have
different clock requirements. Separating the clock-generation and conditioning circuits
from the functional logic will significantly simplify the task of migrating the clocking
circuitry.
When the clock module cannot be kept outside the functional top, it should be
directly instantiated by that top. The deeper in the hierarchy clock generation occurs,
the more RTL may have to be modified during a migration.
Burying clock-generation circuitry also makes a design harder to understand, which may
lead to human errors during migration.
Another way to maximize design portability is by keeping logic as generic as possible.
Using generic logic is necessary because clocking requirements vary among
implementation technologies. For example, flip-flops should use only a single clock edge
within each clock domain.
To help later migrations, don’t derive clocks from unregistered combinatorial logic.
Furthermore, clock buffers should never be instantiated directly in the RTL.
Gated clocks are supported by some platform ASICs but generally not by field-programmable gate arrays (FPGAs). It is therefore best to avoid them.
Resets are another area that can impact the portability of a design if it isn’t suitably
implemented. The reset generation and conditioning should be performed in the clock
module or in a separate module that’s still outside of the functional top. A reset scheme
that takes advantage of the best of both the asynchronous and synchronous resets is
preferable. Having an asynchronous assertion of reset and a synchronized (to the relevant
clock) de-assertion can avoid most reset-related problems. For more information, see
“Asynchronous and Synchronous Reset Design Techniques” by Cliff Cummings [2].
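A reset conditioner with asynchronous assertion and synchronous de-assertion might look like the sketch below. The module and signal names are assumptions, and the reset is assumed to be active low; one such block would be needed per clock domain.

```verilog
// Reset conditioner: asserts immediately, releases in step with clk.
module reset_sync (
    input  wire clk,
    input  wire async_rst_n,  // asynchronous, active-low reset request
    output wire sync_rst_n    // conditioned reset for this clock domain
);
    reg [1:0] sync_ff;

    always @(posedge clk or negedge async_rst_n) begin
        if (!async_rst_n)
            sync_ff <= 2'b00;               // assertion is asynchronous
        else
            sync_ff <= {sync_ff[0], 1'b1};  // release ripples through two flops
    end

    assign sync_rst_n = sync_ff[1];
endmodule
```

Placing this block in the clock module, or in a separate module outside the functional top, keeps the reset scheme in one place when the design is ported.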
Muxes, I/Os, Memories, And More
Devices like muxes, which send one of several inputs out over a single output channel,
can be slow and congested when implemented in FPGAs. To avoid these congestion
problems, three-state signals are sometimes used. But internal three-state signals can
cause significant problems when a design is migrated across different implementation
technologies. It’s therefore best to avoid using internal three-state signals. Muxes are the
devices of choice. If necessary, large central muxes should be broken up into several
smaller, localized muxes. This technique will help reduce congestion.
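As a hedged illustration, the sketch below replaces an internal three-state bus with a mux. The module name, signal names, and bus width are hypothetical; the three-state style is shown only as a comment for comparison.

```verilog
// Non-portable style, shown for comparison only:
//   assign shared_bus = sel_a ? data_a : 32'bz;
//   assign shared_bus = sel_b ? data_b : 32'bz;

// Portable alternative: a small, localized mux with a single driver.
module bus_mux (
    input  wire [1:0]  sel,
    input  wire [31:0] data_a,
    input  wire [31:0] data_b,
    input  wire [31:0] data_c,
    output reg  [31:0] bus_out
);
    always @(*) begin
        case (sel)
            2'd0:    bus_out = data_a;
            2'd1:    bus_out = data_b;
            2'd2:    bus_out = data_c;
            default: bus_out = 32'h0;  // defined value for the unused select code
        endcase
    end
endmodule
```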
I/O selection and instantiation also affect portability. When specialized I/Os like DDR are
required, a designer must confirm that each target implementation technology supports
all required I/O types. Often, significant time must be spent ensuring that all I/O types
can map to a particular implementation technology. I/Os should only be instantiated in
the device top level. If an I/O is buried in a design hierarchy, it can cause many RTL files
to require modification for a design migration. It also makes a design harder to
understand. In addition, some implementation technologies have a hard requirement that
all I/Os be instantiated at the top level.
Often, FPGAs don’t require that I/Os be instantiated manually. In some cases, a tool can
automatically build the I/O wrapper based upon design data. In those scenarios, special
care must still be taken with all signals. Individual input, output, and direction/enable
pins should always be brought to the functional top. Otherwise, not all signals will be
available at the functional top level if the design is migrated to a technology in which the
I/Os must be specifically instantiated. If this scenario occurs, all of the individual signals
(i.e., in, out, direction) will need to be brought out to the functional top level. This task
requires a significant amount of effort (see Figure 1).
In addition to I/Os, memories can be designed for migration. The impact of porting
memories can be minimized if care is taken early in the design process. It’s good practice
to separate the logical memory instantiation from the physical implementation. The
physical implementation can then be changed without the RTL or logical instantiation
having to change. If wrappers are used to achieve this separation, only the wrappers will
need to be updated during a migration (see Figure 2).
A memory wrapper is a file with a generic name and/or module name (e.g.,
mem_1r1w_256x32.v/mem_1r1w_256x32). The user should instantiate that name in his
or her RTL. This file then instantiates the real physical memory (e.g.,
FPGA_MEM_NAME or ASIC_MEM_NAME). Different copies of the wrapper file
(mem_1r1w_256x32.v) can be used to instantiate different vendors’ memory instances.
As a result, the original RTL doesn’t need to be changed in order to change the memory.
Only the memory wrapper needs to be altered. It's always better not to change the
functional RTL files for non-functional reasons, such as a technology port in which
memory names change but functionality remains the same. The wrapper can also
invert control signals when the underlying memories differ in active-low or
active-high polarity. Another way to separate the logical memory instantiation from the physical
implementation is through the use of compile switches.
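A wrapper along these lines might look like the following sketch, which combines the wrapper and compile-switch approaches in one file. The wrapper name and the FPGA_MEM_NAME and ASIC_MEM_NAME placeholders come from the text above. Everything else is an assumption for illustration: the TARGET_FPGA and TARGET_ASIC macros, all port names on the physical memories, and the active-low enables on the ASIC memory.

```verilog
// mem_1r1w_256x32.v: generic 1-read/1-write, 256 x 32 memory wrapper.
// Functional RTL instantiates only this module name.
module mem_1r1w_256x32 (
    input  wire        clk,
    input  wire        wr_en,
    input  wire [7:0]  wr_addr,
    input  wire [31:0] wr_data,
    input  wire        rd_en,
    input  wire [7:0]  rd_addr,
    output wire [31:0] rd_data
);
`ifdef TARGET_FPGA
    // Placeholder FPGA memory; port names are hypothetical.
    FPGA_MEM_NAME u_mem (
        .CLK (clk), .WE (wr_en), .WADDR (wr_addr), .DIN (wr_data),
        .RE (rd_en), .RADDR (rd_addr), .DOUT (rd_data)
    );
`elsif TARGET_ASIC
    // Placeholder ASIC memory, assumed here to use active-low enables,
    // so the wrapper inverts the control signals.
    ASIC_MEM_NAME u_mem (
        .CK (clk), .WEN (~wr_en), .WA (wr_addr), .D (wr_data),
        .REN (~rd_en), .RA (rd_addr), .Q (rd_data)
    );
`else
    // Technology-independent behavioral model for simulation.
    reg [31:0] mem [0:255];
    reg [31:0] rd_data_q;

    always @(posedge clk) begin
        if (wr_en) mem[wr_addr] <= wr_data;
        if (rd_en) rd_data_q    <= mem[rd_addr];
    end

    assign rd_data = rd_data_q;
`endif
endmodule
```

With this arrangement, a technology port changes a macro definition or the wrapper file, while every instantiation of mem_1r1w_256x32 in the functional RTL stays as it is.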
Let’s examine other functional-level considerations. Logic should be designed
synchronously, for example, with a single rising-edge clock driving all flip-flops. This
approach avoids many tool-related issues that can otherwise occur during
implementation.
DDR and other blocks that require both edges of a clock to be used are obvious
exceptions to this goal of a single rising-edge clock implementation. DDR logic must
always be carefully designed regardless of the target platform technology. In many cases,
a 180-degree, phase-shifted rising edge can be used as a substitute for a falling-edge
clock. To ensure the portability of synchronous logic, it’s essential to guarantee that the
clock structure is clearly defined and meets the criteria for the primary production
technology. Using the reset scheme described above also will help to improve portability.
Synchronizers should be put on any asynchronous incoming logic. They also should be
used on internal signals that change clock domains. If it is placed in a dedicated module,
synchronization logic is usually easier to debug. Latches should be avoided at all costs.
In combinatorial logic, ensure that all “if” statements have an “else” and that “case” statements have a “default”
case. Otherwise, latches may be inferred. Latches can cause problems with the back-end
implementation of some implementation technologies. Combinatorial feedback loops
should always be avoided.
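The two guidelines above can be illustrated with a short sketch. The first module is a common two-flop synchronizer for a single-bit signal, kept in its own module (multi-bit buses need a different crossing scheme). The second is a combinatorial decode in which every branch assigns the output, so no latch is inferred. All names and the state encoding are hypothetical.

```verilog
// Two-flop synchronizer for one asynchronous or cross-domain bit.
module sync_2ff (
    input  wire clk,       // destination-domain clock
    input  wire rst_n,
    input  wire async_in,
    output wire sync_out
);
    reg [1:0] sync_ff;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) sync_ff <= 2'b00;
        else        sync_ff <= {sync_ff[0], async_in};
    end

    assign sync_out = sync_ff[1];
endmodule

// Latch-free combinatorial logic: every "if" has an "else" and the
// "case" has a "default", so next_state is assigned on all paths.
module next_state_decode (
    input  wire [1:0] state,
    input  wire       start,
    output reg  [1:0] next_state
);
    always @(*) begin
        case (state)
            2'd0: begin
                if (start) next_state = 2'd1;
                else       next_state = 2'd0;
            end
            2'd1:    next_state = 2'd2;
            2'd2:    next_state = 2'd0;
            default: next_state = 2'd0;
        endcase
    end
endmodule
```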
Technology-Specific Optimizations
To increase performance, designs sometimes contain technology-specific optimizations.
Such optimizations are usually in FPGA designs. But code that’s optimized for one
technology is often not optimal for another.
The direct instantiation of primitives can significantly reduce a design’s portability.
Technology-specific primitives that are instantiated in RTL are one of two types: diffused
or meta-primitives. Usually, diffused primitives are low-level dedicated blocks of
diffused IP. Examples of diffused primitives include the dedicated multipliers that are
available in many FPGA families. Meta-primitives are low-level blocks, such as FIFOs,
that are compiled by dedicated tools into an implementation that’s efficient for a
particular architecture.
In each case, the primitive is directly instantiated in the RTL. Thus, avoiding these
primitives is a good way to improve the portability of a design. Typically, synthesis tools
are very good at optimizing technology-independent RTL for the technology at which
they’re targeted. This statement is especially true for the finer-grained technologies that
are found with high-end platform and cell-based ASICs.
If possible, avoid technology-specific optimizations. If these optimizations are included,
they should be instantiated inside a wrapper. If the design is ported, the optimizations can
then be replaced with equivalent, technology-independent RTL. To guarantee
equivalence, functional verification or formal verification can be performed between the
technology-specific and technology-independent versions.
Sometimes, it’s difficult to compartmentalize such optimizations, especially where buses
have been widened or pipelined. Compile switches, such as the Verilog `ifdef directive, can be
used to implement both optimized and non-optimized versions of code in a single RTL
file. To ensure equivalence, functional verification and possibly formal verification
should be performed for both implementations.
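A compile switch of this kind might be sketched as below. The USE_VENDOR_MULT macro, the VENDOR_MULT16 primitive, and its port names are hypothetical stand-ins for a technology-specific block. The primitive is assumed to have one cycle of latency so that both branches behave the same at the module boundary.

```verilog
// 16 x 16 registered multiplier with a switchable implementation.
module mult_16x16 (
    input  wire        clk,
    input  wire [15:0] a,
    input  wire [15:0] b,
    output wire [31:0] p
);
`ifdef USE_VENDOR_MULT
    // Technology-specific version: hypothetical dedicated multiplier,
    // assumed to register its output (one-cycle latency).
    VENDOR_MULT16 u_mult (.CLK (clk), .A (a), .B (b), .P (p));
`else
    // Technology-independent version: the synthesis tool chooses the mapping.
    reg [31:0] p_q;

    always @(posedge clk)
        p_q <= a * b;

    assign p = p_q;
`endif
endmodule
```

Both settings of the switch should go through the same functional tests, and ideally an equivalence check, before the generic version is relied on for a migration.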
Intellectual Property
A growing number of complex chip designs contain intellectual property (IP). If this IP is
RTL-based, it is more easily ported across all implementation technologies. Non-RTL-based IP is usually less portable. In addition to high-level IP (typically entire
subsystems), some vendors offer low-level primitives like the diffused primitives and
meta-primitives mentioned earlier. Both types cause problems for design portability.
Typically, meta-primitives provide an efficient implementation for a particular
technology. In each case, the primitive is directly instantiated in the RTL. The RTL
instantiation of low-level primitives should be avoided if possible. If such primitives are
used, they should be made switchable by the use of compile switches or wrappers. If they
are directly instantiated in a design, significant effort will be required to migrate that
design.
When porting high-level IP, the designer must take special care. For example, non-RTL-based IP typically isn’t portable between implementation technologies. Usually,
replacement IP is needed that has been specifically designed for the new implementation
technology. Consider the challenges in porting a DDR interface that has implementation-technology-specific I/Os. These I/Os are usually different among vendors. Frequently, the
I/Os are tightly woven into the IP, which makes it difficult to port even if most of the IP
is RTL. In some cases, it can be easier to replace the entire subsystem than to migrate the
high-level IP.
If a migration activity is known at the start of a design cycle, a wrapper or compile switch
in the RTL can be used to select between two IP instantiations. This approach allows the
design to be easily migrated at a later point.
Design migration is an increasingly important issue for design teams to consider. As
design complexity and cost pressures increase, the need for easy design migration is
growing. By adopting a technology-independent mindset during initial implementation,
design teams will save time and reduce risk when they later perform a migration. By
paying close attention to hierarchy, clocking, IP, I/Os, and three-state signals, designers
can avoid being locked into a single implementation technology. Embracing portable
design practices can yield competitive advantages. It also can help to ensure future
product success by simplifying a cost-reduction path.
Greg Martin is a Senior Product Applications Engineer for LSI Logic's RapidChip
Technology Marketing Division. For the past eight years, he has worked in a variety of
engineering and marketing positions at LSI Logic. He received an MEng in
Microelectronic Systems Engineering from UMIST, Manchester, U.K.
REFERENCES:
[1] Michael Keating and Pierre Bricaud, Reuse Methodology Manual for System-on-a-Chip Designs.
[2] Cliff Cummings, “Asynchronous and Synchronous Reset Design Techniques,”
www.sunburst-design.com/papers/CummingsSNUG2002SJ_Resets.pdf
Captions:
Figure 1: This illustration shows how I/O signals can be brought out to the functional top,
as separated from the device top level of a design.
Figure 2: Memory wrappers allow physical mapping to be changed without RTL
modification.