Designing for 100+ MHz

1999 Designs Demand...
- Higher system speed
- Higher integration
  - smaller size, less power, better reliability
- Lower cost
- Shorter development time
- Better product differentiation

Traditional Multi-Chip Boards
- Discrete design components
  - CPU, memory
  - bus transceivers, PCI controller, FIFOs
  - Ethernet controller, graphics accelerator, MPEG, DSP, etc.
  - programmable logic as glue and custom function
- Advantages:
  - well-documented sophisticated functions
  - readily available as IP in silicon

Multi-Chip Board Problems
- Physical size
- Power consumption and reliability
- PC board signal integrity
- Limited flexibility
  - prevents design modifications and upgrades
  - prevents product diversification
  - prevents product customization
- Poor product differentiation
  - standard parts = standard architecture

FPGA Advantages
- Smaller size
- Lower power consumption
- Better signal integrity
  - fewer PC-board issues
- Enhanced flexibility
  - easy modifications, upgrades, etc.
- Enhanced product differentiation
  - proprietary architectures

FPGA Users Want...
- System clock rate of 100+ MHz
- >100,000 gates
- Efficient design methodologies
- Availability of well-documented cores
- Reasonable cost

The FPGA Solution
- 4th-generation FPGA: logic + memory + routing
- Multi-standard SelectI/O
- Temperature sensing
- Delay-locked loop for fast clock and I/O
- 3.3 ns synchronous dual-port SRAM
- 500 Mbps SelectMAP configuration

Now the Challenge...
- Design a 100+ MHz system
- Together, we can do it...
  - we’ll supply the ingredients...
  - you use them intelligently
- But don’t forget...
  - the clock period is less than 10 ns!

Designing for 100+ MHz
- Volts, Amps, and Watts
  - PCB signal distribution
  - chip inputs and outputs
  - power and thermal considerations
- Ones and zeros
  - logic emulation
- Bits and bytes
  - memory hierarchy

Moore Meets Einstein
[Figure: clock frequency (MHz) and trace length (inches per 1/4 clock period) versus year, 1965 to 2010, on a scale of 1 to 2048]
- Speed doubles every 5 years…
- ...but the speed of light never changes

Volts, Amps, and Watts
- PCB design issues
  - capacitive loading
  - transmission lines and termination
- Chip inputs and outputs
  - clock distribution and DLLs
  - I/O standards
- Power and thermal considerations
  - temperature sensing diode
  - power supply decoupling
- Configuration
  - new SelectMAP mode

Capacitive Loading
- Capacitance slows outputs and increases power
  - output delay increase:
    - ~25 ps per pF of additional loading
  - output power dissipation increase:
    - 11 µW per MHz per pF with 3.3-V swing
- Sources of capacitance
  - 10 pF max for each device pin
  - 2 pF per inch for narrow traces (0.8 pF/cm)
  - 130 pF per inch² for copper areas (20 pF/cm²)
- IBIS files provide output impedance details

Transmission Lines
- Some traces must be treated as transmission lines to minimize ringing
  - transmission line if round trip > transition time
  - lumped capacitance if round trip < transition time
- Signal delay on a PCB:
  - 140 to 180 ps per inch (50 to 70 ps/cm)
- Lumped-capacitance trace length:
  - 3 inches max for a 1-ns transition time (7.5 cm)
  - 6 inches max for a 2-ns transition time (15 cm)

Terminated Transmission Lines
- Reflections and ringing
- Traditional Thevenin termination at the end [figure labels: VCC, 100 Ω, 50 Ω, 100 Ω]
- Dynamic termination at the end is better and saves power [figure labels: 50 Ω, 50 Ω, 100 pF]
- Series termination at the source is best
  - single source and destination only!
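As a rough illustration of the capacitive-loading and transmission-line rules of thumb above, the following Python sketch turns them into small helper functions. The function names and the 160 ps/inch default (the midpoint of the quoted 140 to 180 ps range) are assumptions made for this example and do not come from the slides.

```python
# Rules of thumb from the slides, expressed as helper functions.
# All names are illustrative; defaults marked "assumed" are not from the slides.

DELAY_PS_PER_PF = 25.0     # ~25 ps of extra output delay per pF
TRACE_PF_PER_INCH = 2.0    # narrow PCB trace
PIN_PF_MAX = 10.0          # max capacitance per device pin


def switching_power_uw(c_pf, f_mhz, v_swing=3.3):
    """Dynamic power C * V^2 * f; pF times MHz gives microwatts directly."""
    return c_pf * v_swing ** 2 * f_mhz


def load_capacitance_pf(trace_inches, n_pins):
    """Worst-case lumped load: trace capacitance plus device pins."""
    return trace_inches * TRACE_PF_PER_INCH + n_pins * PIN_PF_MAX


def added_delay_ps(c_pf):
    """Extra output delay caused by the given load capacitance."""
    return c_pf * DELAY_PS_PER_PF


def is_transmission_line(trace_inches, transition_ns, delay_ps_per_inch=160.0):
    """True when the round-trip delay exceeds the transition time.

    160 ps/inch is an assumed midpoint of the 140 to 180 ps/inch range.
    """
    round_trip_ns = 2.0 * trace_inches * delay_ps_per_inch / 1000.0
    return round_trip_ns > transition_ns


if __name__ == "__main__":
    c = load_capacitance_pf(trace_inches=3, n_pins=2)
    print(f"load: {c} pF, extra delay: {added_delay_ps(c)} ps")
    print(f"power at 100 MHz: {switching_power_uw(c, 100) / 1000:.1f} mW")
    print("3 in, 1 ns edge:", is_transmission_line(3, 1.0))
    print("4 in, 1 ns edge:", is_transmission_line(4, 1.0))
```

With the assumed 160 ps/inch, a 3-inch trace stays in the lumped-capacitance regime for a 1-ns edge, which agrees with the limit quoted on the Transmission Lines slide; a 4-inch trace does not.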
  [figure labels: 22 Ω, 27 Ω, 50 Ω (50 Ω total)]

On-Chip Clock Distribution
[Figure labels: Clock, CLB, Data, IOB]
- Clock distribution introduces delay
  - larger chips suffer more clock delay

Clock Delay Problems
- Clock delay increases clock-to-output times
- Clock delay leads to unacceptable input hold time
  - set-up time is negative
- Additional data delay can eliminate the hold time
  - set-up time becomes positive
  - but tolerance build-up widens the data-valid window
[Figure labels: IOB Flip-Flop (D, Q), Data, Clock, Delay, Clock Distribution Delay, Required Data Valid (without delay), Required Data Valid (with delay)]

DLLs Maximize I/O Speed
- Clock-to-output time plus set-up time determines the I/O speed and data bandwidth
  - min clock period = max clock-to-out + max set-up
- Traditional solution:
  - use highly buffered, balanced clock trees
    - needed to reduce internal clock skew
    - cannot totally eliminate the delay
- The Virtex solution:
  - use a Delay-Locked Loop (DLL)
    - aligns the internal and external clocks
    - effectively eliminates the clock-distribution delay

Virtex Has 4 Independent DLLs
[Figure labels: Clock, Comparator, Error, Delay, CLB, Data, IOB]
- DLLs adjust clock delay to align internal and external clocks
  - digital closed-loop control
  - 25 to 200 MHz range, 35-picosecond resolution

Fast Clock-to-Out With DLL
- 160 MHz inter-chip data rate
  - 16-mA LVTTL
  - IOB register to IOB register
[Figure labels: two Virtex FPGAs, each with a DLL; D, Q; Clock; 0.5 ns, 3.8 ns, 1.9 ns]

LVTTL Data Rate with DLL
- 1.4 ns measured clock-to-output delay
- Output standard = LVTTL Fast 16 mA (OBUF_F_16)
- Temp = 100 C, Vdd = 2.375 V, Vcco = 3.3 V
- Waveforms: 1: CLKIN, 2: DATA OUT (no DLL), 3: DATA OUT (DLL deskewed)

Timing     r->r     r->f
w/o DLL    3.9 ns   3.9 ns
w/ DLL     1.4 ns   1.4 ns

Other DLL Functions
- Double the incoming clock frequency
  - fast internal operation, slow external clock
- Clock mirroring to the PCB
- Divide clock by 1.5, 2, 2.5, 3, 4,
  5, 8, or 16
- Adjust clock duty cycle to 50-50
- Create four quadrature clock phases
  - input four sequential bits per clock period

Duty Cycle Correction
- ~25% duty cycle in, 50% duty cycle out
[Figure labels: Virtex FPGA, DLL (1X); 25 MHz, 25% duty cycle in; 25 MHz, 50% duty cycle out]

Clock Doubling and Mirroring
- Clock mirror with less than 100 ps skew
  - simplifies PCB clock distribution
[Figure: actual HDTV customer example. Labels: Virtex, SDRAM, 37 MHz system clock, 1 input load, DLL 1, DLL 2, zero-delay internal clock buffer, 74 MHz #1, 74 MHz #2. Waveforms shown exactly aligned: 74 MHz internal and 37 MHz internal (inside FPGA), system clock, SDRAM]

Precise Clock Mirroring
- 2x system clock for board use
[Figure labels: Virtex FPGA, DLL (2X); 66 MHz clock in; 132 MHz clock out]

Clock Division
- Divide clock by 1.5, 2, 2.5, 3, 4, 5, 8, or 16
  - maintain synchronous edges
[Figure labels: CLKIn 200 MHz, CLKout 200 MHz, CLKDV 12.5 MHz]

Multi-Standard SelectI/O
[Figure labels: GTL+ microprocessor; 2.5 V SSTL SRAM; 1.8 V SDRAM; 5 V tolerant FLASH; mixed signal 5 V; 3.3 V LVTTL; busses/backplanes (3/5 V PCI, ISA, GTL…); DSP]

Mix & Match Output Standards
- User-supplied voltages determine output swing
  - 3.3 V, 2.5 V, 1.5 V
  - one voltage per bank
  - a bank is half of a chip edge
- Output characteristics are programmable on a per-pin basis
  - push-pull or open-drain
  - LVTTL drive strength
    - 2-mA to 24-mA sink and source current
  - LVTTL slew rate

Mix & Match Input Standards
- Internal or user-supplied threshold voltage
  - selectable on a per-pin basis
  - one user-supplied threshold voltage per bank
- Programmable over-voltage protection
  - 5-V tolerant or diode clamp to VCCO
  - selectable on a per-pin basis
[Figure labels: Internal Reference, VREF, six Inputs, VREF]

SSTL Clock-to-Out With DLL
- 200 MHz inter-chip data rate
  - SSTL 3, Class II
  - IOB register to IOB register
[Figure labels: two Virtex FPGAs, each with a DLL; D, Q; Clock; 0.3 ns, 2.8 ns, 1.9 ns]
(Stub
Series Terminated Logic)

SSTL Data Rate with DLL
- 1.3 ns measured clock-to-output delay
  - much lower noise than LVTTL
- Output standard = SSTL 3 Class 2 (OBUF_SSTL3_II)
- Temp = 100 C, Vdd = 2.375 V, Vcco = 3.3 V, Vtt = 1.5 V
- Waveforms: 1: CLKIN, 2: DATA OUT (no DLL), 3: DATA OUT (DLL deskewed)

Timing     r->r     r->f
w/o DLL    3.5 ns   3.8 ns
w/ DLL     1.1 ns   1.3 ns

From FPGA to System Component
- ‘Redefining the FPGA’
[Figure labels: x1 CLK, x2 CLK; Chip 1; cache SRAM (Mbytes); low-voltage CPU; SDRAM (133 MHz); high-speed system backplane; LVCMOS, SSTL3, LVTTL, GTL+]
- "Virtex moves FPGAs from glue to system component” - Ron Neale, EE

Power and Thermal Issues
- Power and heat are serious concerns
- All CMOS power consumption is dynamic
  - proportional to VCC²
  - proportional to capacitance
  - proportional to frequency
- Virtex conserves power
  - 2.5-V supply voltage
  - small geometries and short interconnects reduce capacitance

Virtex Power Consumption
- Virtex is designed to conserve power
  - 100 MHz 16-bit counters
    - 12.5 MHz average transition rate
    - 6.5 mW per counter including clock distribution
  - 100 MHz 8-bit counters
    - 25 MHz average transition rate
    - 5 mW per counter including clock distribution

Device    16-bit counters       8-bit counters
XCV300    384, 2.5 W total      768, 3.7 W total
XCV1000   1536, 9.8 W total     3072, 14.7 W total

Thermal Management
- Temperature-sensing diode
  - matched to the Maxim MAX1617 A/D
  - programmable alarms
  - similar to the Pentium II solution
[Figure labels: Virtex FPGA (DXP, DXN) connected to Maxim MAX1617 (SMBCLK, SMBDATA, ALERT)]

Power Supply Decoupling
- CMOS power-supply current is dynamic
  - current pulse every active clock edge
- Peak current can be 5x the average current
  - instantaneous current peaks can only be supplied by decoupling capacitors
- Use one 0.1 µF ceramic chip capacitor for each power-supply pin
  - low L and R are more important than high C
  - double up for lower L and
    R if necessary
  - use direct vias to the supply planes, close to the power-supply pins

Virtex Configuration
- New byte-wide SelectMAP mode
  - up to 528 Mbps at 66 MHz
    - simple handshake protocol
  - up to 400 Mbps at 50 MHz
    - no handshake required
- Configuration bit-stream length
  - 0.5 Mbits to 6.1 Mbits
[Figure labels: Control Logic (EPLD), Configuration EPROM, Virtex FPGA; signals Busy, CS, Address, Data, WE, CS]

Volts, Amps, and Watts: Recap
- PCB design issues
  - minimize capacitance for higher speed
  - terminate transmission lines to reduce ringing
- Chip inputs and outputs
  - use DLLs to maximize I/O bandwidth
  - use SelectI/O to interface with different standards
- Power and thermal considerations
  - use the sensing diode to manage chip temperature
  - decouple the power supply well
- Configuration
  - configure faster with the SelectMAP mode

Designing for 100+ MHz
- Volts, Amps, and Watts
  - PCB signal distribution
  - chip inputs and outputs
  - power and thermal considerations
- Ones and zeros
  - logic emulation
- Bits and bytes
  - memory hierarchy

Spending the 10 ns Budget
- Fast logic requires fast function generators
  - signals often pass through several function generators
- Routing delays must also be kept short
  - there are routing delays between every function generator
- Arithmetic delays are important
  - carry chains often create critical paths

You Don’t Have To Be An Expert
- You don’t have to be an FPGA architecture expert to implement high-performance designs
  - the benefits of a good architecture are automatic
    - all the logic goes faster
    - software provides easy access to the features
- You can achieve high performance only with a good FPGA architecture
  - a good FPGA empowers its users
- You’ll design better if you know the architecture
  - matching your design style to the available features increases performance and/or lowers cost

Virtex CLB
- Logic and arithmetic delay reduction
  demands improvements in the CLB
- Virtex CLB is divided into two slices, each with:
  - 2 function generators
  - 2 flip-flops
  - 2 bits of carry logic
[Figure: two slices, each with two function generators and two carry blocks]

Fast Function Generators
- Each function generator emulates 2 to 3 levels of logic
  - a 10-level logic path typically requires 3 to 5 function generators in series
  - at 100 MHz, they must be less than 2 ns each including the routing
- Virtex has 0.6-ns function generators
  - leaves 1.4 ns for each route

Connecting Function Generators
- Some functions need several function generators
  - F5 MUXs connect pairs of function generators
    - functions with 5 to 9 inputs
  - F6 MUXs connect all 4 function generators
    - functions with 6 to 17 inputs
[Figure: four function generators combined by two F5 MUXs and one F6 MUX]

Fast Local Routing
- Local routing provides fast interconnects
  - in a CLB, function generators connect with minimal routing delays
  - fast paths between adjacent CLBs increase flexibility
[Figure: adjacent CLBs, each with four function generators and four carry blocks]

Use Pipelining for Speed
- Shorter clock periods mean doing less each period
  - create a pipeline structure
  - pipeline stages operate concurrently
  - more functions are done at the same time
  - throughput increases
- All function generators have output flip-flops
  - most pipeline support is “free”

16-Bit Pipeline in One LUT
- In directly cascaded pipelines the flip-flops are not free
- One SRLUT can implement up to 16 bits of delay
  - shift data in and select the appropriate tap
[Figure labels: Input, 16-Bit Shift Register, Delay Select, Output]

Fast Logic Needs Fast Routing
- Our typical design with 3 to 5 CLBs needed an average routing delay of 1.4 ns or less
- Vector-based interconnect
  - the Virtex routing architecture delivers this performance
- Delay is independent of direction
  - dependably short delays
- The circles show 1.4-ns routing regions

Go Farther, Faster
- Virtex achieves its speed through a hierarchy of highly buffered routing resources
  - wires span 1, 2, or 6 CLBs
- The Virtex routing architecture is designed for large arrays
  - today’s FPGAs are big… but tomorrow’s will be even bigger
- Virtex is designed to maintain its performance even in very large arrays

No Routing Congestion
- For high-speed applications, routing must be dependably fast
  - not just capable of being fast
- In the past, high device utilization has caused routing congestion
  - critical nets might be forced to meander
- Virtex minimizes these problems
  - abundant resources prevent congestion
- If it needs to be fast, it will be fast, automatically!

Built-in Tri-State Busses
- Bi-directional busses are supported directly by tri-state buffers built into each CLB
  - two drivers per CLB
  - segmentable every four CLB columns
[Figure: a row of five CLBs on a tri-state bus]

Arithmetic: A Special Case
- Adders, accumulators, counters, and comparators all depend on carry chains
- Carry-chain logic is usually much deeper than the rest of the design
  - 32 levels for a 16-bit ripple adder
  - too deep to use function generators at 100 MHz
  - arithmetic delays would limit performance
- Dedicated carry logic provides the desired speed
  - 16-bit adders can operate at up to 200 MHz register-to-register

Wide Arithmetic
- 64-bit adders would require 128 levels of logic
  - expensive complex carry schemes would be needed to preserve performance
- Virtex minimizes the carry propagation delay
  - 100 ps per bit pair
  - zero routing delay between CLBs
- Minimal performance loss for each extra bit
- 16-bit adders operate at up to 200 MHz
- 64-bit adders operate at up to 135 MHz

Efficient Virtex Multipliers
- Cascade vs.
  tree structure
  - cascade is simpler and smaller
  - tree is faster
- Virtex gives the best of both worlds
  - as fast as a tree
  - smaller than a cascade
- 160 MHz clock rate for pipelined 16 x 16 multiplier
[Figure: two charts, delay and number of CLBs, for 4x4, 8x8, and 16 x 16 multipliers built as cascade, tree, and Virtex tree]

Fast Address Decoders
- Wide address decoders could slow operation
  - wide AND gates with invertible inputs
- Virtex carry-chain MUXs can act as AND gates
  - combine function generator ANDs
- 64-bit decoders operate at up to 155 MHz
[Figure: decoder diagram with an example 0/1 input pattern]

Speed Is Never Wasted
- You can never have too much performance
  - excess performance can always be traded for size and cost reduction
- Replace single-cycle functions with smaller multi-cycle versions
  - a 2-cycle multiplier is half the cost of a single-cycle multiplier
- Reduce costs by designing down to the performance you need

Creating a High-Speed Clock
- Logic sometimes needs to operate faster than the available clock
  - multiple RAM accesses in a single cycle
  - low-speed PCB clock distribution for power or noise reduction
- Virtex DLLs can double and redouble incoming clocks
[Figure labels: 45 MHz in, DLL1 (2X) giving 90 MHz, DLL2 (2X) giving 180 MHz]

Optimized for the Future
- Deep sub-micron technology permits larger and larger array sizes
  - poses new circuit-design challenges
  - changes the rules of FPGA architecture
- Across-chip routing is the most vulnerable
  - could easily limit design performance
- Virtex is designed for long-term growth
  - even long, across-chip routes will remain fast
- Virtex is tomorrow’s FPGA … today!
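To make the 10 ns budget from this section concrete, here is a small Python sketch that adds up function-generator and routing delays and scales the adder figures with the quoted carry delay. It is only a back-of-the-envelope model: the function names are invented for illustration, and flip-flop clock-to-out and set-up times are assumed to be folded into the quoted numbers.

```python
T_FG_NS = 0.6                # function generator delay quoted in the slides
T_ROUTE_NS = 1.4             # average routing delay per function generator
T_CARRY_NS_PER_PAIR = 0.1    # 100 ps per bit pair of dedicated carry logic


def path_delay_ns(n_function_generators):
    """Delay of a path through n function generators, routing included."""
    return n_function_generators * (T_FG_NS + T_ROUTE_NS)


def max_clock_mhz(delay_ns):
    """Highest clock rate whose period still covers the given delay."""
    return 1000.0 / delay_ns


def widened_adder_mhz(base_bits, base_mhz, new_bits):
    """Estimate adder speed after widening, adding only extra carry delay."""
    extra_pairs = (new_bits - base_bits) / 2
    period_ns = 1000.0 / base_mhz + extra_pairs * T_CARRY_NS_PER_PAIR
    return 1000.0 / period_ns


if __name__ == "__main__":
    for n in (3, 4, 5):
        d = path_delay_ns(n)
        print(f"{n} function generators: {d:.1f} ns, {max_clock_mhz(d):.0f} MHz")
    print(f"64-bit adder estimate: {widened_adder_mhz(16, 200, 64):.0f} MHz")
```

With the slide figures, five function generators in series use the full 10 ns, and extending the 200 MHz 16-bit adder by 24 bit pairs gives roughly 135 MHz, in line with the quoted 64-bit figure.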
10 ns is Long Enough
- Virtex CLBs can implement relatively complex functions in 10 ns
  - 0.6 ns per 4-input function generator
- Virtex offers fast interconnections
  - even across-chip when fully utilized
  - fast tri-state buses
- Support for very fast arithmetic operations
  - 16-bit adders at 200 MHz

Implement Designs Automatically
- You don’t have to be an FPGA wizard to use Virtex
- Virtex is optimized for automated implementation
  - uniform structure: efficient mapping/synthesis
  - ample routing: simple placement and no congestion
  - predictable performance: effective synthesis
- IP cores speed design even more
  - validated functionality with guaranteed performance

Designing for 100+ MHz
- Volts, Amps, and Watts
  - PCB signal distribution
  - chip inputs and outputs
  - power and thermal considerations
- Ones and zeros
  - logic emulation
- Bits and bytes
  - memory hierarchy

100+ MHz Memory
- Virtex memory operates up to 200 MHz
- High-speed memory has two benefits
  - data storage
    - “work-in-progress”
    - input/output buffers, FIFOs
  - accelerating complex functions
    - store pre-computed values in look-up tables

Data Storage Hierarchy
- Virtex supports 3 levels of memory hierarchy
- On-chip SelectRAM+
  - small-to-medium memories
  - 0.6-ns read access time
- On-chip Block SelectRAM+
  - larger memories
  - true dual-ported operation
  - 3.3-ns read access time
- Fast SelectI/O interfaces to external RAM
  - DLL boosts memory bandwidth

SelectRAM+
- SelectRAM+ uses CLB LUTs as user memory
  - 16-deep RAMs
  - 32-deep RAMs
  - 16-deep dual-ported RAMs
  - 16-deep shift registers
- Cascadable for larger memories
  - 128 or more words deep
  - uses logic resources for expansion

Block SelectRAM+
- Up to 32 dual-ported 4096-bit RAM blocks
  - synchronous read and write
- True dual-port memory
  - each port has full read and write capability
  - different clocks for each port
- Configurable aspect ratio
  - trade width for depth
    - 4096 x 1 bit to 256 x 16 bits
  - separate configurations for each port
- Dedicated routing for memory expansion

High-Speed Memory Interfaces
- SelectI/O and DLLs together provide fast access to many types of external memory
- Xilinx currently offers two reference designs
  - fully synthesized
  - automatic placement and routing
- SDRAM … up to 125 MHz
- ZBTRAM … up to 143 MHz (Zero Bus-Turn-around)

Input/Output Data Buffers
- High-performance systems need data buffers to decouple internal operation from I/O activity
  - I/O may be sporadic (burst-mode busses)
  - I/O may be faster or slower
  - I/O may be wider or narrower
- I/O buffers can take several forms
  - dual-ported RAMs
  - ping-pong buffers
  - FIFOs

Dual-ported I/O Buffers
- Block SelectRAM+ is ideal for I/O buffers
  - dual-ported operation
    - independent clocks and controls
    - bridges between clock domains
    - simultaneous read and write
  - port-specific aspect-ratio control
    - built-in rate/width conversions
- SelectRAM+ provides similar benefits on a smaller scale

Ping Pong Buffers
- Ping-pong buffers are pairs of blocks that alternate between input and processing
- SRLUT for small buffers
  - self-addressing input
  - 0.6-ns read access
- Larger buffers can use the dual-ported Block RAM
  - one address bit alternates read/write areas
  - 3.3-ns read access
[Figure labels: Input, two 16-Bit Shift Registers, Read Address, Select, Output]

Small FIFOs in SRLUTs
- Small FIFOs can be implemented in SRLUTs
  - word count addresses the output data
  - increment and enable SRLUT to push
  - decrement to pop
  - enable only for both (push and pop together)
- 16-byte FIFO in 4 CLBs
  - 16 x 16 in 6 CLBs
  - 200+ MHz
- Expandable for deeper FIFOs
[Figure labels: Input, 16-Bit Shift Register, Output; Word Counter with Up (Push) and Down (Pop)]

Large FIFOs in Block RAM
- Large FIFOs can use the dual-ported block RAM
  - add read and write address counters
- Asynchronous push and pop
- Different port sizes give rate-for-width conversion
- Block RAM FIFOs can operate at up to 170 MHz including flag logic
[Figure labels: Block SelectRAM+ with input Data, Addrs, and WE on one port and output Data and Addrs on the other; two Counters with En; Control Logic; Push, Pop, Full, Empty]

Pre-computing for Speed
- Some functions are too complex for 10-ns logic implementation
  - pipelining is not always possible
- An alternative is to pre-compute all the possible results and store them in memory
  - select a result according to the inputs
- Function time is independent of complexity
  - 0.6 ns SelectRAM+ access time
  - 3.3 ns Block SelectRAM+ access time
- The function table can be smaller than the logic

Multiplication By A Constant
- Sometimes, data has to be “scaled”
  - multiplied by a constant value
- A full multiplier is too expensive
  - it can multiply by a variable
  - unnecessarily general and too complex
- Storing all multiples of the constant is a better alternative
  - smaller and much faster
[Figure: a multiplier array (Constant and Input in, Scaled Data out) compared with a product table (Input in, Scaled Data out)]

16-bit Scaler
- A 2^16-word product table is impractical
  - partition the input into nibbles
    - use 16-word LUTs for nibble products
    - combine the partial products in adders
- Roughly half the CLBs of a full multiplier
  - for a 16-bit coefficient: 36 CLBs vs.
    62 CLBs
- Pipeline the adders for extra speed
[Figure labels: Input feeding four LUTs, with outputs weighted x4096, x256, x16, and unweighted, summed into Scaled Data]

Changing the Constant
- The SRLUT mode can be used to update the table
  - “push-only” stack
  - last 16 bits loaded define the table
- A simple accumulator computes all products of a new constant
[Figure labels: Input, Constant Register, 16-Bit Shift Register, Output Register, Clear, Load, Change Constant]

Large Function Tables
- Larger functions can be implemented in the Block SelectRAM+
  - 12-input functions
  - micro-coded state machines
- Data tables can also be implemented
  - sine/cosine tables for DSP, for example
  - dual-ported access gives the sine and cosine simultaneously
  - a simple address offset gives 90º phase shift for accessing sine and cosine from a single table

Block RAM/ROM Creation
- CORE Generator software creates RAMs and ROMs
  - simple GUI interface
- Initialization file is loaded into RAMs and ROMs at configuration time

Memory Summary
- Virtex has two kinds of internal memory
  - distributed SelectRAM+ for small RAMs
  - Block SelectRAM+ for larger RAMs
- SelectRAM+
  - 0.6 ns read access time
  - 16- and 32-word RAMs
  - 16-word dual-ported RAMs
  - 16-word shift registers
    - sequential write/random-access read
    - FIFOs, pipelining, LUT functions, etc...
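The nibble-partitioned scaler and the accumulator-based table update described on the 16-bit Scaler and Changing the Constant slides can be modeled in a few lines of Python. This is a software sketch of the idea only, not a description of the actual CLB implementation; the function names are hypothetical, and the order in which products are shifted into a real SRLUT is not modeled.

```python
def build_product_table(constant):
    """Fill a 16-entry table with 0*c, 1*c, ..., 15*c using an accumulator."""
    table = []
    acc = 0
    for _ in range(16):
        table.append(acc)
        acc += constant          # one addition per entry, no multiplier needed
    return table


def scale16(x, table):
    """Multiply a 16-bit input by the table's constant, one nibble at a time."""
    if not 0 <= x < 1 << 16:
        raise ValueError("input must fit in 16 bits")
    product = 0
    for i in range(4):
        nibble = (x >> (4 * i)) & 0xF
        # Partial products are weighted x1, x16, x256, x4096.
        product += table[nibble] << (4 * i)
    return product


if __name__ == "__main__":
    table = build_product_table(12345)
    # Spot-check the table-based result against ordinary multiplication.
    assert all(scale16(x, table) == x * 12345 for x in range(0, 1 << 16, 257))
    print(scale16(40000, table))
```

Changing the constant only requires rebuilding the 16-entry table, which is why sixteen accumulator steps are enough to retarget the scaler.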
Memory Summary
- Dual-ported 4096-bit Block SelectRAM+
  - 3.3 ns read access time
  - true dual-ported operation
    - both ports are read/write
    - ports can be clocked asynchronously
  - configurable aspect ratio
    - 4096 x 1 bit to 256 x 16 bits
    - configure ports differently for width/rate conversion
- High-speed SelectI/O access to external RAM

Designing for 100+ MHz
- Volts, Amps, and Watts
  - DLLs and flexible I/O standards
  - fast inter-chip communication
  - simple rules for good signal integrity
- Ones and zeros
  - fast logic and fast interconnect
  - dependable high performance
- Bits and bytes
  - distributed SelectRAM+
  - dual-ported Block SelectRAM+

The Virtex Family

Device    System Gates   Logic Cells   Block RAM
XCV50     57,906         1,728         32 Kb
XCV100    108,904        2,700         40 Kb
XCV150    164,674        3,888         48 Kb
XCV200    236,666        5,292         56 Kb
XCV300    322,970        6,912         64 Kb
XCV400    468,252        10,800        80 Kb
XCV600    661,111        15,552        96 Kb
XCV800    888,439        21,168        112 Kb
XCV1000   1,124,022      27,648        128 Kb

Packages (user I/O counts range from 94 to 514; see the data sheet for per-device values): CS144, TQ144, PQ/HQ240, BG256, BG352, BG432, BG560, FG256, FG456, FG600, FG680

The complete Virtex Data Sheet is on your AppLinx CD-ROM and at www.xilinx.com/partinfo/virtex.pdf
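The Block SelectRAM+ aspect-ratio trade-off from the memory summary can be tabulated with a short Python sketch. The end points (4096 x 1 and 256 x 16) come from the slides; the intermediate power-of-two widths and the helper names are assumptions made for illustration.

```python
BLOCK_BITS = 4096   # size of one Block SelectRAM+ block, from the slides


def aspect_ratios(widths=(1, 2, 4, 8, 16)):
    """Return (depth, width, address_bits) for each assumed port width."""
    rows = []
    for width in widths:
        depth = BLOCK_BITS // width
        address_bits = depth.bit_length() - 1   # depth is a power of two
        rows.append((depth, width, address_bits))
    return rows


def accesses_per_word(wide_port_bits, narrow_port_bits):
    """Narrow-port accesses needed to move one wide-port word."""
    if wide_port_bits % narrow_port_bits:
        raise ValueError("port widths must divide evenly")
    return wide_port_bits // narrow_port_bits


if __name__ == "__main__":
    for depth, width, abits in aspect_ratios():
        print(f"{depth:5d} x {width:2d}  ({abits} address bits)")
    print("16-bit writes, 4-bit reads:", accesses_per_word(16, 4), "reads per write")
```

Configuring the two ports with different widths is what gives the width/rate conversion: a port that is four times narrower must run four accesses for every access on the wide port to keep up.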