Keywords: Structured ASIC, Design-for-Portability, Design, Migration
Editorial Features header: Programmable – Platform ASIC

@head:Good Engineering Practices Minimize Design-Porting Effort

@deck:To greatly improve the success of future design migrations, follow a design-for-portability methodology that’s implemented at the beginning of the product design cycle.

@text:Design migration is an increasingly important issue facing today’s design teams. Typically, a design migration involves taking a design from a hardware prototype to a production cell-based, application-specific integrated circuit (ASIC). It can also entail a cost-reduction technology migration.

Several factors drive design migration. For example, an increase in design complexity results in longer simulation-based verification. This increase in verification time has reduced the practicality of simulation-only verification. At the same time, it has raised the demand for hardware prototyping.

Another factor that drives design migration is increasing mask costs, which make it harder for companies to enter new markets. Instead of going straight to a cell-based technology, companies are increasingly opting for lower-cost platform ASICs. Structured ASICs provide only a logic and memory array. In contrast, platform ASICs include logic, memory, and a large range of IP (e.g., ARM, SERDES, and DDR). Platform ASICs help reduce development cost and risk while allowing companies to get to market faster. Once a product is successful in the market and high production quantities are required, the design can be migrated to a cost-optimized, cell-based ASIC.

This migration approach works if designs are architected properly from the start. If not, those same designs can quickly become locked into a single implementation technology or vendor. If an unplanned migration becomes necessary, significant technical challenges could arise later in the design cycle.
Design-for-reuse (DFR) concepts are well documented in many publications, such as the Reuse Methodology Manual for System-on-a-Chip Designs [1]. Typically, such publications focus on a single subsystem in a design and how it is reused in subsequent designs. Designs that will later be migrated to a standard-cell architecture require more than just good design-reuse practices. They need to follow a design-for-portability approach.

Designing for portability follows many of the same practices as DFR. Yet there are some important additional considerations. These considerations emphasize both the top-level structural design issues and the direct register-transfer-level (RTL) instantiation issues faced during design migration.

Each technology platform is usually different enough that a single design cannot map to all platforms without modification. With proper consideration and understanding of the issues, however, it’s possible to structure and implement a design that will minimize the migration effort. For some designs, a compromise still has to be made. In that case, the primary target platform, the one that will be used when the device enters full volume production, should dictate the implementation details.

Migrating The Top Level

The top level of a design will almost always have to change during a migration. By following good design practices, this effort can be minimized. The most important structural rule is to separate the “functional top level” of the design from the “device top level.” The functional top contains all subsystem instantiation and connectivity. The device top contains only the I/Os, clock module, and instantiation of the functional top. There shouldn’t be any user logic in the device top. If individual I/Os need to be three-state or bidirectional, they should be made so within the device top.
In this case, the functional top should contain separate ‘input,’ ‘output,’ and ‘direction’ signals (or ‘output’ and ‘enable’ for three-state). Thus, no three-state or bidirectional logic should exist within the functional top. Separating the functional top from the device top significantly reduces the complexity of replacing I/Os and clocking during a migration.

Perhaps one of the most important issues affecting design portability is clocking. ASICs and high-end platform ASICs have little or no limitation on the number of clocks in a design. In contrast, FPGAs and some structured ASICs have a fixed, limited clock structure. Limitations on clock resources can cause significant and sometimes unsolvable challenges for design migrations. Clocking should therefore be considered very early in the design process.

As a workaround for the limited number of clocks, FPGA designs sometimes contain distributed clock enables. These enables generate lower-speed clocks further down the clock tree. This approach should be avoided where possible, as it can cause problems in other implementation technologies.

To maximize design portability, all clock-related logic should be placed in a single module. If such placement isn’t possible, clocks should still be sourced from outside the functional top level described earlier. Different implementation technologies have different clock requirements. Separating the clock-generation and conditioning circuits from the functional logic will significantly simplify the task of migrating the clocking circuitry.

When the clock module cannot be kept outside the functional top, it should be directly instantiated by that top. The deeper in the hierarchy clock generation occurs, the more RTL may have to be modified during a migration. Burying clock-generation circuitry also makes a design harder to understand, which may lead to human errors during migration.
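The top-level split described above can be sketched in Verilog roughly as follows. This is only an illustration under stated assumptions: the module names (func_top, clk_module, device_top), the signal names, and the 8-bit bus are hypothetical, the functional logic is a placeholder, and reset conditioning is left out for brevity.

```verilog
// Functional top: subsystem instantiation and connectivity only.
// No three-state or bidirectional logic lives at this level.
module func_top (
    input  wire       clk,       // sourced from outside the functional top
    input  wire       rst_n,
    input  wire [7:0] data_in,   // 'input' half of the bidirectional bus
    output reg  [7:0] data_out,  // 'output' half
    output reg        data_dir   // 'direction': 1 = drive the pads
);
    // Placeholder behavior standing in for the real subsystems.
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            data_out <= 8'h00;
            data_dir <= 1'b0;
        end else begin
            data_out <= data_in + 8'h01;
            data_dir <= ~data_dir;
        end
    end
endmodule

// Clock module: the single home for clock generation and conditioning.
module clk_module (
    input  wire ref_clk,
    output wire core_clk
);
    // Simple pass-through in this sketch. During a migration, only this
    // body would change (for example, to a PLL or vendor clock macro).
    assign core_clk = ref_clk;
endmodule

// Device top: I/Os, clock module, and functional top. No user logic.
module device_top (
    input  wire       REF_CLK,
    input  wire       RESET_N,   // reset conditioning omitted in this sketch
    inout  wire [7:0] DATA
);
    wire       core_clk;
    wire [7:0] data_in;
    wire [7:0] data_out;
    wire       data_dir;

    clk_module u_clk (.ref_clk(REF_CLK), .core_clk(core_clk));

    // Bidirectional behavior is created here, at the pad level.
    assign DATA    = data_dir ? data_out : 8'bz;
    assign data_in = DATA;

    func_top u_func (
        .clk      (core_clk),
        .rst_n    (RESET_N),
        .data_in  (data_in),
        .data_out (data_out),
        .data_dir (data_dir)
    );
endmodule
```

With this arrangement, a port to a technology with different I/O cells or clock resources would touch device_top and clk_module, while func_top and everything beneath it could stay unchanged.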
Another way to maximize design portability is by keeping logic as generic as possible. Using generic logic is necessary because clocking requirements vary among implementation technologies. For example, flip-flops should use only a single clock edge within each clock domain. To help later migrations, don’t derive clocks from unregistered combinatorial logic. Furthermore, clock buffers should never be instantiated in the RTL. Gated clocks are supported by some platform ASICs but generally not by field-programmable gate arrays (FPGAs). It is therefore best to avoid them.

Resets are another area that can impact the portability of a design if they aren’t suitably implemented. Reset generation and conditioning should be performed in the clock module or in a separate module that’s still outside of the functional top. A reset scheme that combines the best of asynchronous and synchronous resets is preferable. Asserting reset asynchronously and de-asserting it synchronously (to the relevant clock) can avoid most reset-related problems. For more information, see “Asynchronous and Synchronous Reset Design Techniques” by Cliff Cummings [2].

Muxes, I/Os, Memories, And More

Devices like muxes, which send one of several inputs out over a single output channel, can be slow and congested when implemented in FPGAs. To avoid these congestion problems, three-state signals are sometimes used. But internal three-state signals can cause significant problems when a design is migrated across different implementation technologies. It’s therefore best to avoid using internal three-state signals. Muxes are the devices of choice. If necessary, large central muxes should be broken up into several smaller, localized muxes. This technique will help reduce congestion.

I/O selection and instantiation also affect portability.
When specialized I/Os like DDR are required, a designer must confirm that each target implementation technology supports all required I/O types. Often, significant time must be spent ensuring that all I/O types can map to a particular implementation technology.

I/Os should only be instantiated in the device top level. If an I/O is buried in a design hierarchy, many RTL files may require modification for a design migration. It also makes a design harder to understand. In addition, some implementation technologies have a hard requirement that all I/Os be instantiated at the top level.

Often, FPGAs don’t require that I/Os be instantiated manually. In some cases, a tool can automatically build the I/O wrapper based upon design data. In those scenarios, special care must still be taken with all signals. Individual input, output, and direction/enable pins should always be brought to the functional top. Otherwise, not all signals will be available at the functional top level if the design is migrated to a technology in which the I/Os must be specifically instantiated. If this scenario occurs, all of the individual signals (i.e., in, out, direction) will need to be brought out to the functional top level after the fact. This task requires a significant amount of effort (see Figure 1).

In addition to I/Os, memories can be designed for migration. The impact of porting memories can be minimized if care is taken early in the design process. It’s good practice to separate the logical memory instantiation from the physical implementation. The physical implementation can then be changed without the RTL or logical instantiation having to change. If wrappers are used to achieve this separation, only the wrappers will need to be updated during a migration (see Figure 2).

A memory wrapper is a file with a generic file name and/or module name (e.g., mem_1r1w_256x32.v/mem_1r1w_256x32). The user should instantiate that name in his or her RTL.
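A wrapper of this kind might look like the sketch below. The wrapper name mem_1r1w_256x32 and the physical name ASIC_MEM_NAME come from the article's own examples, but every port name and the active-low polarity of the physical memory are assumptions made for illustration. The first module is only a behavioral stand-in so that the example is self-contained; a real vendor memory would have its own port list.

```verilog
// Behavioral stand-in for a vendor memory (hypothetical port list).
module ASIC_MEM_NAME (
    input  wire        CLK,
    input  wire        WEN,   // write enable, assumed active low
    input  wire [7:0]  AW,
    input  wire [31:0] D,
    input  wire        REN,   // read enable, assumed active low
    input  wire [7:0]  AR,
    output reg  [31:0] Q
);
    reg [31:0] mem [0:255];

    always @(posedge CLK) begin
        if (!WEN) mem[AW] <= D;
        if (!REN) Q <= mem[AR];
    end
endmodule

// mem_1r1w_256x32.v: generic wrapper that the functional RTL instantiates.
// Only this file changes when the physical memory changes.
module mem_1r1w_256x32 (
    input  wire        clk,
    input  wire        wr_en,    // active high at the logical level
    input  wire [7:0]  wr_addr,
    input  wire [31:0] wr_data,
    input  wire        rd_en,    // active high at the logical level
    input  wire [7:0]  rd_addr,
    output wire [31:0] rd_data
);
    ASIC_MEM_NAME u_mem (
        .CLK (clk),
        .WEN (~wr_en),   // polarity is adapted inside the wrapper
        .AW  (wr_addr),
        .D   (wr_data),
        .REN (~rd_en),
        .AR  (rd_addr),
        .Q   (rd_data)
    );
endmodule
```

The functional RTL sees only the mem_1r1w_256x32 ports, so it is insulated from whichever physical memory sits behind them.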
The wrapper file then instantiates the real physical memory (e.g., FPGA_MEM_NAME or ASIC_MEM_NAME). Different copies of the wrapper file (mem_1r1w_256x32.v) can be used to instantiate different vendors’ memory instances. As a result, the original RTL doesn’t need to be changed in order to change the memory. Only the memory wrapper needs to be altered. It’s always better not to change the functional RTL files for non-functional reasons, such as a technology port in which memory names change but functionality remains the same. The wrapper also can be used to invert signals where different underlying memories use different active-low or active-high polarities. Another way to separate the logical memory instantiation from the physical implementation is through the use of compile switches.

Let’s examine other functional-level considerations. Logic should be designed synchronously, for example, with a single rising-edge clock driving all flip-flops. This approach avoids many tool-related issues that can otherwise occur during implementation. DDR and other blocks that require both edges of a clock are obvious exceptions to this goal of a single rising-edge clock implementation. DDR logic must always be carefully designed regardless of the target platform technology. In many cases, a 180-degree, phase-shifted rising edge can be used as a substitute for a falling-edge clock.

To ensure the portability of synchronous logic, it’s essential to guarantee that the clock structure is clearly defined and meets the criteria for the primary production technology. Using the reset scheme described above also will help to improve portability. Synchronizers should be put on any asynchronous incoming signals. They also should be used on internal signals that change clock domains. Synchronization logic is usually easier to debug if it is placed in a dedicated module.

Latches should be avoided at all costs.
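Unintended latches typically come from combinational code that does not assign an output on every path. The hypothetical module below (its name and ports are invented for illustration) sketches latch-free coding for both a “case” statement and an “if” statement.

```verilog
// Hypothetical example: purely combinational selection logic.
module sel_decode (
    input  wire [1:0] sel,
    input  wire       en,
    input  wire [3:0] a,
    input  wire [3:0] b,
    input  wire [3:0] c,
    output reg  [3:0] y,
    output reg        valid
);
    // Every value of sel assigns y. Without the default branch,
    // sel == 2'b11 would leave y unassigned and a latch would be inferred.
    always @* begin
        case (sel)
            2'b00:   y = a;
            2'b01:   y = b;
            2'b10:   y = c;
            default: y = 4'h0;
        endcase
    end

    // The else branch assigns valid on the remaining path,
    // so valid stays combinational.
    always @* begin
        if (en)
            valid = 1'b1;
        else
            valid = 1'b0;
    end
endmodule
```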
Ensure that all “if” statements have an “else” and that “case” statements have a “default” case. Otherwise, latches can be inferred in combinatorial logic. Latches can cause problems with the back-end implementation of some implementation technologies. Combinatorial feedback loops should always be avoided.

Technology-Specific Optimizations

To increase performance, designs sometimes contain technology-specific optimizations. Such optimizations are usually found in FPGA designs. But code that’s optimized for one technology is often not optimal for another. The direct instantiation of primitives can significantly reduce a design’s portability.

Technology-specific primitives that are instantiated in RTL are one of two types: diffused primitives or meta-primitives. Usually, diffused primitives are low-level dedicated blocks of diffused IP. Examples of diffused primitives include the dedicated multipliers that are available in many FPGA families. Meta-primitives are low-level blocks, such as FIFOs, that are compiled by dedicated tools into an implementation that’s efficient for a particular architecture. In each case, the primitive is directly instantiated in the RTL. Thus, using these primitives ties the RTL to one technology and reduces the portability of a design.

Typically, synthesis tools are very good at optimizing technology-independent RTL for the technology at which they’re targeted. This statement is especially true for the finer-grained technologies that are found with high-end platform and cell-based ASICs. If possible, avoid technology-specific optimizations. If these optimizations are included, they should be instantiated inside a wrapper. If the design is ported, the optimizations can then be replaced with equivalent, technology-independent RTL. To guarantee equivalence, functional verification or formal verification can be performed between the technology-specific and technology-independent versions. Sometimes, it’s difficult to compartmentalize such optimizations, especially where buses have been widened or pipelined.
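Where an optimization can be compartmentalized, the wrapper approach might look like the following sketch. The wrapper name mult_18x18, its ports, and the 18-bit operand width are hypothetical. This copy of the file holds the technology-independent version; an FPGA-specific copy would instantiate the vendor's dedicated multiplier primitive behind the same port list.

```verilog
// mult_18x18.v: hypothetical wrapper around a registered multiplier.
// The rest of the design instantiates mult_18x18 and never refers
// to a vendor primitive directly.
module mult_18x18 (
    input  wire               clk,
    input  wire signed [17:0] a,
    input  wire signed [17:0] b,
    output reg  signed [35:0] p
);
    // Technology-independent implementation: the synthesis tool maps
    // this multiply to whatever resources the target technology offers.
    always @(posedge clk)
        p <= a * b;
endmodule
```

Because both copies of the wrapper share one module name and port list, the technology-specific and technology-independent versions can be compared directly during verification.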
Compile switches, such as the Verilog `ifdef directive, can be used to implement both optimized and non-optimized versions of code in a single RTL file. To ensure equivalence, functional verification and possibly formal verification should be performed for both implementations.

Intellectual Property

A growing number of complex chip designs contain intellectual property (IP). If this IP is RTL-based, it is more easily ported across implementation technologies. Non-RTL-based IP is usually less portable. In addition to high-level IP (typically entire subsystems), some vendors offer low-level primitives like the diffused primitives and meta-primitives mentioned earlier. Both types cause problems for design portability. Typically, meta-primitives provide an efficient implementation for a particular technology. In each case, the primitive is directly instantiated in the RTL. The RTL instantiation of low-level primitives should be avoided if possible. If such primitives are used, they should be made switchable by the use of compile switches or wrappers. If they are directly instantiated in a design, significant effort will be required to migrate that design.

When porting high-level IP, the designer must take special care. For example, non-RTL-based IP typically isn’t portable between implementation technologies. Usually, replacement IP is needed that has been specifically designed for the new implementation technology. Consider the challenges in porting a DDR interface that has implementation-technology-specific I/Os. These I/Os usually differ among vendors. Frequently, the I/Os are tightly woven into the IP, which makes it difficult to port even if most of the IP is RTL. In some cases, it can be easier to replace the entire subsystem than to migrate the high-level IP. If a migration is anticipated at the start of a design cycle, a wrapper or compile switch in the RTL can be used to select between two IP instantiations.
This approach allows the design to be easily migrated at a later point.

Design migration is an increasingly important issue for design teams to consider. As design complexity and cost pressures increase, the need for easy design migration is growing. By adopting a technology-independent mindset during initial implementation, design teams will save time and reduce risk when they later perform a migration. By paying close attention to hierarchy, clocking, IP, I/Os, and three-state signals, designers can avoid being locked into a single implementation technology. Embracing portable design practices can yield competitive advantages. It also can help to ensure future product success by simplifying a cost-reduction path.

Greg Martin is a Senior Product Applications Engineer for LSI Logic’s RapidChip Technology Marketing Division. For the past eight years, he has worked in a variety of engineering and marketing positions at LSI Logic. He received an MEng in Microelectronic Systems Engineering from UMIST, Manchester, U.K.

REFERENCES:
[1] Michael Keating and Pierre Bricaud, Reuse Methodology Manual for System-on-a-Chip Designs.
[2] Cliff Cummings, “Asynchronous and Synchronous Reset Design Techniques,” www.sunburst-design.com/papers/CummingsSNUG2002SJ_Resets.pdf

Captions:
Figure 1: This illustration shows how I/O signals can be brought out to the functional top, as separated from the device top level of a design.
Figure 2: Memory wrappers allow physical mapping to be changed without RTL modification.