White Paper Convey PDK FPGA Design Practices

Overview

There are many considerations involved in producing an effective FPGA design. Steps can be taken from initial design through coding that can improve the design. This white paper examines some basic AE FPGA design techniques that will help in successfully implementing a PDK design along with tips for timing closure. Examples of the techniques are also provided, using the example PDK where possible.

This paper is intended for FPGA designers implementing custom personalities using the Convey PDK.

Design Partitioning

A well-partitioned design simplifies coding, synthesis, simulation and floorplanning. It can also reduce runtime and help maintain repeatable results when making design changes. Individual partitions can be taken through the entire process on their own, allowing quick turnaround. This section refers to partitioning of RTL/source code and should not be confused with Xilinx partitions.

Partitioning Guidelines:

Register all outputs. This keeps all blocks synchronous and ensures that critical paths are contained inside a single logic group, eliminating possible problems with logic optimization across design boundaries.

Keep related logic in the same block. This allows sharing of common functions and allows optimization of the entire critical path in a single operation.

Separate logic with different optimization goals. This allows you to apply different optimization strategies for each module without the modules being limited by one another.

Keep instantiated code in separate blocks. This facilitates swapping of code.

Minimize interconnect between modules to minimize routing congestion.

For high speed designs, floorplan blocks to promote data flow between blocks, using PlanAhead.

Consider block size. Small blocks allow more control, but may not produce the most efficient design, since the small blocks may not have enough logic to apply resource sharing algorithms. Larger blocks decrease repeatability, since changes could affect more logic.

Create blocks that encapsulate distinct functionality, keeping it to a size that is convenient for debugging.
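As an illustration of the first guideline (register all outputs), the following is a minimal VHDL sketch of a block whose only output is driven directly by a flip-flop. The entity name add_block, the port names and the 16-bit widths are hypothetical and are not taken from the Convey PDK.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical block (not part of the PDK): adds two operands and
-- registers the result, so the block boundary is driven by a flip-flop.
entity add_block is
  port (
    clk   : in  std_logic;
    a     : in  unsigned(15 downto 0);
    b     : in  unsigned(15 downto 0);
    sum_q : out unsigned(16 downto 0)   -- registered output
  );
end entity add_block;

architecture rtl of add_block is
begin
  process(clk)
  begin
    if rising_edge(clk) then
      -- Both operands are widened to 17 bits so the carry is kept.
      -- The adder is the only combinatorial logic, and it ends at
      -- the output register inside this block.
      sum_q <= resize(a, 17) + resize(b, 17);
    end if;
  end process;
end architecture rtl;
```

Because the adder terminates at a register inside the block, the path from the inputs to sum_q can be optimized within the block, and downstream blocks see a clean register-to-register start point.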

Partitioning Example:

Figure 1 shows the design partitioning of the Application Engine FPGA (AE) in the Convey coprocessor, with the sample PDK personality.

[Figure 1 block diagram. Blocks shown in the AE FPGA: Dispatch I/F, Memory I/F, AE-AE I/F, Mgmt I/F, CSR, and the CAE (PDK Sample Personality). Additional labels in the diagram: Cae Clock, Inst Dec, Vadd (x16).]

Figure 1 – Design Partitioning of the AE FPGA with the Sample PDK Personality

In the AE FPGA, all partitions with the exception of the CAE are supplied in the Convey PDK, and the interfaces are documented in the Convey PDK Reference Manual. The custom personality in the CAE may be further partitioned to gain the benefits of design partitioning. Each module instance can have independent constraints. Convey provides all constraints for the Convey-supplied partitions. Constraints for the custom personality may be added. More detailed information on constraints is available in the Xilinx Constraints Guide (UG625).

Pipelining

Pipelining can improve design performance by breaking a long data path up over multiple clock cycles. This relaxes the clock to output and setup time requirements between registers, allowing for faster data paths. Care should be taken in the rest of the design, when adding pipelining stages, since latency is added to the path in which registers are added.

Pipelining Example:

[Figure 2 diagram. Before pipelining: Reg -> Comb Func 1 -> Comb Func 2 -> Comb Func 3 -> Reg. After pipelining: Reg -> Comb Func 1 -> Reg -> Comb Func 2 -> Reg -> Comb Func 3 -> Reg.]

Figure 2 - Pipelining Example

Figure 2 illustrates the addition of registers to a design that has multiple levels of combinatorial logic between synchronous elements.
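The "after pipelining" arrangement can be sketched in VHDL as follows. This is only an illustration: the entity name pipe3, the three stand-in functions (an add, an XOR and a second add) and the 32-bit widths are assumptions, and the inputs are assumed to come from registers in the upstream block. Note how the side inputs c and d are delayed so they stay aligned with the data they are combined with, which is the kind of care the added latency requires.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical three-stage pipeline; the functions are placeholders
-- for "Comb Func 1..3" and are not taken from the PDK.
entity pipe3 is
  port (
    clk    : in  std_logic;
    a, b   : in  unsigned(31 downto 0);
    c, d   : in  unsigned(31 downto 0);
    result : out unsigned(31 downto 0)
  );
end entity pipe3;

architecture rtl of pipe3 is
  signal s1, s2     : unsigned(31 downto 0);  -- stage results
  signal c_d1       : unsigned(31 downto 0);  -- c delayed by one clock
  signal d_d1, d_d2 : unsigned(31 downto 0);  -- d delayed by one and two clocks
begin
  process(clk)
  begin
    if rising_edge(clk) then
      -- Stage 1: function 1, plus delay registers for later operands
      s1   <= a + b;
      c_d1 <= c;
      d_d1 <= d;

      -- Stage 2: function 2 uses c delayed to match s1
      s2   <= s1 xor c_d1;
      d_d2 <= d_d1;

      -- Stage 3: function 3 uses d delayed to match s2
      result <= s2 + d_d2;
    end if;
  end process;
end architecture rtl;
```

Each register-to-register path now contains one function instead of three, at the cost of a three-clock latency from the inputs to result.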

Reset

Always use synchronous set/resets.

Two synchronous reset signals are available to the custom personality.

Signal Name      Type    Description
i_reset          input   Synchronous, active high power up reset
i_csr_reset_n    input   Synchronous, active low CSR reset

The i_reset signal is on a global clock network (connected through a BUFG global clock buffer).

This signal can be connected directly to the reset input of flip-flops, or it can be locally fanned out through registers.

If it is registered locally, it should be connected to the set or reset input of the local flip-flop (not the D input). The Xilinx tools sometimes have routing problems when global wires are attached to D inputs, so the D input is generated locally instead.

The recommended reset implementation is used in the sample PDK application and is shown below.

FDSE rst (.C(clk),.S(reset),.CE(r_reset),.D(!r_reset),.Q(r_reset));

[Figure 3 schematic: a flip-flop clocked by clk, with S driven by reset, CE driven by r_reset, D driven by !r_reset, and Q driving r_reset.]

Figure 3 – Local Reset

The local reset flip flop should be included in the block in which the local reset is used and placed with the block.
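For designs written in VHDL, the behavior of the local reset flip-flop can be approximated with the sketch below. This is a behavioral description, not the PDK's implementation: the entity name local_reset_user, the example counter and the power-up value of r_reset are assumptions, and whether a given synthesis tool maps the first process onto the set input exactly as the FDSE instantiation does should be checked in the synthesis report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical block that keeps its local reset flop next to the
-- logic that uses it.
entity local_reset_user is
  port (
    clk   : in  std_logic;
    reset : in  std_logic;              -- global synchronous reset, active high
    count : out unsigned(7 downto 0)
  );
end entity local_reset_user;

architecture rtl of local_reset_user is
  -- Assumption: the local reset starts asserted at power up.
  signal r_reset : std_logic := '1';
  signal cnt     : unsigned(7 downto 0) := (others => '0');
begin
  -- Local reset flop: the global reset acts as a synchronous set,
  -- and the flop clears itself on the first clock after reset falls.
  -- The global net never drives a D input.
  process(clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        r_reset <= '1';
      elsif r_reset = '1' then
        r_reset <= '0';
      end if;
    end if;
  end process;

  -- Example user logic, reset synchronously by the local copy.
  process(clk)
  begin
    if rising_edge(clk) then
      if r_reset = '1' then
        cnt <= (others => '0');
      else
        cnt <= cnt + 1;
      end if;
    end if;
  end process;

  count <= cnt;
end architecture rtl;
```

Keeping both processes in the same block follows the guidance above: the local reset flop is placed with the logic it resets.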

Coding Style

Coding style has a considerable impact on how an FPGA design is implemented and ultimately how it performs. Code should describe the hardware structures to be implemented on the chip, rather than a list of instructions for the chip to follow. Code that makes sense in a procedural language often does not produce the expected hardware. This is especially true of for-loops, if statements and case statements, since these are unrolled into logic.

For Loops

The following is an example of logic resulting from a for loop.

Task: Count to 200 by 2’s

VHDL code:

    process(input)
        variable A : integer;
    begin
        A := input;
        -- The loop is unrolled at synthesis: each iteration becomes an adder
        for i in 1 to 100 loop
            A := A + 2;
        end loop;
        output <= A;
    end process;

Results: This code results in serial logic which counts by two each time i is incremented, resulting in a chain of one hundred adders as shown in Figure 4.

[Figure 4 diagram: a chain of adders, each adding 2 to the previous result, producing A+2, A+4, A+6, ..., A+198, A+200.]

Figure 4 – For Loop Implementation

The For Loop results in a large amount of combinatorial logic, with a very long critical path, but it executes in one clock.

Another way of implementing this task is shown below:

VHDL code:

    process(clk, rst)
    begin
        if rst = '1' then
            A <= 0;
        elsif clk'event and clk = '1' then
            -- Count by two on each rising edge until 200 is reached
            if (A < 200) then
                A <= A + 2;
            end if;
        end if;
    end process;

Results: This design uses a register to store the value of A. When a rising edge of the clock occurs, the output of the register is incremented by two and fed back to the input of the register. The register's output is compared against 200 and the comparison is used to enable the register. This design is illustrated in Figure 5.

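[Figure 5 diagram: a register A clocked by clk, with its output fed through a +2 adder back to its input and compared against 200.]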

Figure 5 – Alternate For-Loop Implementation

This implementation results in a much smaller amount of logic, all of it synchronous, and a much shorter critical path. Because A advances by two on each clock, counting from 0 to 200 takes 100 clock cycles.

Both implementations could work. The FPGA designer should be mindful of the type of logic that could result from the code that is written.

If-Then-Else Statements and Case Statements:

Case and if-then-else statements are commonly used in HDL designs. The if-then-else statement generates priority encoded logic, while the case statement generates balanced logic.

The following is an example of a function implemented using an if-then-else statement and a case statement.

The following if-then-else statement

    if SEL = "00" then
        Y <= A;
    elsif SEL = "01" then
        Y <= B;
    elsif SEL = "10" then
        Y <= C;
    else
        Y <= D;
    end if;

results in a string of 2:1 multiplexers as shown in Figure 6.

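[Figure 6 diagram: a chain of 2:1 multiplexers selecting among inputs A, B, C and D to drive Y, with the select of each stage driven by a comparison on SEL.]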

Figure 6 - Synthesized Nested if-then-else Statement

Below is a case statement implementation of the same function:

    case SEL is
        when "00"   => Y <= A;
        when "01"   => Y <= B;
        when "10"   => Y <= C;
        when others => Y <= D;
    end case;

The case statement code results in a 4:1 multiplexer as shown in Figure 7.

[Figure 7 diagram: a single 4:1 multiplexer with inputs A, B, C and D, select input SEL and output Y.]

Figure 7 - Synthesized case Statement

Unintentional Latches

Synthesis tools infer latches from incomplete conditional expressions such as an if-then-else statement without an else clause or case statements that are not fully defined. Unintentional latches can be avoided by specifying all conditions or specifying a default assignment.

Most synthesis tools provide warning messages whenever these latches are detected, so checking the warning list after synthesis can save time later in the process.
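The difference between an incomplete and a complete combinatorial process can be sketched as follows. The entity name latch_demo, the signal names and the default value of '0' are hypothetical and chosen only for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example contrasting a latch-inferring process with a
-- fully specified one.
entity latch_demo is
  port (
    sel     : in  std_logic_vector(1 downto 0);
    a, b, c : in  std_logic;
    y_latch : out std_logic;   -- will be implemented with a latch
    y_comb  : out std_logic    -- purely combinatorial
  );
end entity latch_demo;

architecture rtl of latch_demo is
begin
  -- Incomplete: no final else, so y_latch must hold its previous
  -- value when sel matches none of the branches. Synthesis infers
  -- a latch.
  incomplete : process(sel, a, b, c)
  begin
    if sel = "00" then
      y_latch <= a;
    elsif sel = "01" then
      y_latch <= b;
    elsif sel = "10" then
      y_latch <= c;
    end if;
  end process incomplete;

  -- Complete: the default assignment at the top guarantees y_comb
  -- is driven on every pass through the process, so no storage
  -- element is needed.
  complete : process(sel, a, b, c)
  begin
    y_comb <= '0';               -- default (assumed value)
    case sel is
      when "00"   => y_comb <= a;
      when "01"   => y_comb <= b;
      when "10"   => y_comb <= c;
      when others => null;       -- default assignment applies
    end case;
  end process complete;
end architecture rtl;
```

Placing a default assignment at the top of the process is often easier to maintain than repeating the assignment in every branch, since new branches cannot accidentally leave the output undriven.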

Timing Closure

Using the design methods described above will help avoid timing problems, but they sometimes still occur. There are many methods to eliminate timing problems. Sometimes a problem can be eliminated by changing a setting in the tools, such as setting the synthesizer to optimize more for speed than area, or increasing the place and route effort, but these do not actually fix the problem and it may come back later. It is best to fix the problem in the source code or by adding a constraint, since these permanently eliminate the problem.

The following are some techniques used to fix timing problems.

Add a pipeline stage. Break up a long combinatorial path with the addition of a pipeline stage. If the synthesizer has the ability to move registers during timing optimization, pipeline stages can be added back to back and the synthesizer will register balance, moving the registers into the combinatorial path. See the Pipelining section of this paper for more information.

Add multi-cycle constraints. If the offending timing path actually has multiple cycles in which to execute, a multi-cycle path constraint can be used to constrain the path to the timing that is actually required. For example, if an output is only needed every two cycles of a 100 MHz clock (10 ns period), the place and route tools can be instructed, through a multi-cycle constraint, to optimize the path for two clock cycles, or 20 ns. Note: Be sure to understand the multi-cycle constraint requested. Poorly specified multi-cycle paths can create problems that are difficult to debug.

Use duplication for overloaded nets. For nets that have a high fanout, duplicating the source of the net both reduces the delay and helps in routing to different areas of the chip. When using duplication, the keep constraint should be used to keep the tools from removing duplicated registers.

Implement repeaters, using registers for long runs.

Floorplan the design, using Xilinx PlanAhead. Using PlanAhead, groups of logic (usually defined in the design partitioning, but they don’t have to be) are placed within regions on the die.
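The duplication technique listed above can be sketched in VHDL as follows. This is an illustration under stated assumptions, not a PDK recipe: the entity name dup_enable, the signal names and the 64-bit widths are hypothetical, and the string-valued keep attribute is written in the style described for Xilinx constraints. The exact attribute needed to stop a particular tool version from merging equivalent registers should be confirmed in the Xilinx Constraints Guide (UG625).

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: one high-fanout enable is registered twice so
-- that each copy drives only half of the loads.
entity dup_enable is
  port (
    clk      : in  std_logic;
    en_in    : in  std_logic;
    d_a, d_b : in  std_logic_vector(63 downto 0);
    q_a, q_b : out std_logic_vector(63 downto 0)
  );
end entity dup_enable;

architecture rtl of dup_enable is
  signal en_a, en_b : std_logic;

  -- Assumption: the tool honors a string-valued keep attribute on
  -- signals and therefore does not collapse the two copies.
  attribute keep : string;
  attribute keep of en_a : signal is "true";
  attribute keep of en_b : signal is "true";
begin
  process(clk)
  begin
    if rising_edge(clk) then
      -- Two identical registers fed from the same source
      en_a <= en_in;
      en_b <= en_in;

      -- Each register bank uses its own copy of the enable
      if en_a = '1' then
        q_a <= d_a;
      end if;
      if en_b = '1' then
        q_b <= d_b;
      end if;
    end if;
  end process;
end architecture rtl;
```

With the copies preserved, the placer is free to put en_a near the q_a registers and en_b near the q_b registers, which shortens both nets.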

Other Resources

The Xilinx web page is a good source for information concerning FPGA design. The following documents provide additional information on the topics covered in this paper.

Xilinx WP231 HDL Coding Practices to Accelerate Design Performance

Xilinx WP272 Get Smart About Reset

Xilinx WP361 Maintaining Repeatable Results

Xilinx UG625 Constraints Guide

Xilinx UG626 Synthesis and Simulation Design Guide

The Sunburst Design web page also contains papers on a variety of topics. http://www.sunburst-design.com/
