Field Programmable Gate Array (FPGA) Layout • An FPGA consists of a large array of Configurable Logic Blocks (CLBs) - typically 1,000 to 8,000 CLBs per chip depending on the size of the FPGA • Each CLB represents almost any 4-input section of combinational or sequential logic • By programming the CLBs and interconnections large circuits can be represented in the FPGA • One Xilinx XC2V4000 FPGA can represent a circuit up to 1M gates DINI DN3000k10 Board • DINI DN3000k10 is an FPGA based PCI card • Contains five Xilinx XC2V4000 FPGAs connected by a 226 bit wide bus • One of the FPGA has a separate connection for communicating to a PC via the PCI bus • FPGAs can be configured through the PCI bus or configurations can be stored on board Multiplier Options Array Multiplier (Speed) Array Multiplier (Usage) 160 0.07 140 0.06 120 100 0.04 80 0.03 60 0.02 40 0.01 20 0 0 4 8 16 32 Precision (Bits) *Percentage of CLBs used in a XC2V4000, the XC2C4000 contains 5760 CLBs Usage (% CLBs)* Max Freq. (MHz) 0.05 Cellular Automata Model Multiple Cells per Processing Element Number of Processing Elements vs. Million Operations Per Second 1600 Cell Updates Per Second (Million) t 1400 1200 1000 8 Bit Model 16 Bit Model 800 600 400 200 0 9 10 11 12 13 14 15 16 17 Number of Processing Elements 18 19 20 Comparison to C-Simulation • Based on timing of optimized C simulation running on CCM1 (1.4GHz, 1GB RAM) • Simulator ran 1M iterations of a 5448 cell beam model in 111.4s - Million Cell Updates per Second: 48.9 • For a 5448 cell model at 16 bit precision the FPGA implementation could perform 7497M Cell Updates per Second • Based on these results the FPGA was ~150 times faster than the optimized C-simulation Other FPGA Applications • HokieGene – Genome Matching Project (2003) – – – • Matching engine executed on one FPGA (XC2V1000) Performed 200 billion cell updates per second 1,200 billion operations per second (1.2 GOPS) BYU - Network Intrusion Detection Systems (2002) – – Hardware implementation used one FPGA (XC2V1000) Outperformed software version running on P3 – 750MHz: • • • Xilinx – High Performance DES Encryption (2000) – – – • Up to 400 times more throughput than software version Up to 1000 times less latency than software version Implemented on one FPGA (XCV150) Maximum throughput 10.75 GB/sec Outperformed best ASIC implementation Texas University – Target Recognition System (2000) – – – System built using one FPGA (ORCA 40k) and Myrinet interfacing Capable of processing 900 templates per second 2,800 billion operations per second (2.8 GOPS)