Field Programmable Gate Array (FPGA) Layout

advertisement
Field Programmable Gate Array
(FPGA) Layout
•
An FPGA consists of a large array
of Configurable Logic Blocks
(CLBs) - typically 1,000 to 8,000
CLBs per chip depending on the
size of the FPGA
•
Each CLB represents almost any
4-input section of combinational or
sequential logic
•
By programming the CLBs and
interconnections large circuits can
be represented in the FPGA
•
One Xilinx XC2V4000 FPGA can
represent a circuit up to 1M gates
DINI DN3000k10 Board
• DINI DN3000k10 is an
FPGA based PCI card
• Contains five Xilinx
XC2V4000 FPGAs
connected by a 226 bit
wide bus
• One of the FPGA has a
separate connection for
communicating to a PC
via the PCI bus
• FPGAs can be configured
through the PCI bus or
configurations can be
stored on board
Multiplier Options
Array Multiplier (Speed)
Array Multiplier (Usage)
160
0.07
140
0.06
120
100
0.04
80
0.03
60
0.02
40
0.01
20
0
0
4
8
16
32
Precision (Bits)
*Percentage of CLBs used in a XC2V4000, the XC2C4000 contains 5760 CLBs
Usage (% CLBs)*
Max Freq. (MHz)
0.05
Cellular Automata Model
Multiple Cells per Processing Element
Number of Processing Elements vs.
Million Operations Per Second
1600
Cell Updates Per Second (Million) t
1400
1200
1000
8 Bit Model
16 Bit Model
800
600
400
200
0
9
10
11
12
13
14
15
16
17
Number of Processing Elements
18
19
20
Comparison to C-Simulation
• Based on timing of optimized C simulation running on
CCM1 (1.4GHz, 1GB RAM)
• Simulator ran 1M iterations of a 5448 cell beam model in
111.4s - Million Cell Updates per Second: 48.9
• For a 5448 cell model at 16 bit precision the FPGA
implementation could perform 7497M Cell Updates per
Second
• Based on these results the FPGA was ~150 times faster
than the optimized C-simulation
Other FPGA Applications
•
HokieGene – Genome Matching Project (2003)
–
–
–
•
Matching engine executed on one FPGA (XC2V1000)
Performed 200 billion cell updates per second
1,200 billion operations per second (1.2 GOPS)
BYU - Network Intrusion Detection Systems (2002)
–
–
Hardware implementation used one FPGA (XC2V1000)
Outperformed software version running on P3 – 750MHz:
•
•
•
Xilinx – High Performance DES Encryption (2000)
–
–
–
•
Up to 400 times more throughput than software version
Up to 1000 times less latency than software version
Implemented on one FPGA (XCV150)
Maximum throughput 10.75 GB/sec
Outperformed best ASIC implementation
Texas University – Target Recognition System (2000)
–
–
–
System built using one FPGA (ORCA 40k) and Myrinet interfacing
Capable of processing 900 templates per second
2,800 billion operations per second (2.8 GOPS)
Download