2 Altera vs Xilinx (Scekic)

Altera vs. Xilinx
Ognjen Šćekić
ogi@cg.yu
prof. dr Veljko Milutinović
vm@etf.bg.ac.yu
Ognjen Šćekić
1/103
Introduction
FPGA vs. ASIC
FPGA = Field Programmable Gate Array
flexibility of software + speed of hardware
ASIC = Application Specific Integrated Circuits
tailor-made on demand for specific applications
Market Overview
• Key players: Xilinx, Altera, Lattice, Actel
• PLD market estimated at $57 billion and rapidly growing
• The goal is to expand the market:
– by lowering per-unit cost to attack the low-end market
– by increasing speed capabilities to attack the high-end market
Figure 1 - PLD market share
About Xilinx
• Pronounced "zylinks"
• Founded in 1984
• Employs around 2,600 people.
• Claims more than half the world demand for FPGAs.
• Partners with leading semiconductor manufacturers
such as IBM Microelectronics, UMC and Seiko.
• Xilinx is the net market leader at the moment
About Altera
• Founded in 1983.
• Introduced look-up table based architecture in 1992
• Second-largest FPGA manufacturer
• Strategic partner is TSMC
Recent FPGA Design Timeline
• Virtex and Stratix families are direct competitors, as are Spartan and Cyclone
Key Factors For Comparing FPGAs
• Fabrication process
• Logic density
• Clock management
• On-chip memory
• DSP capabilities
• I/O compatibility
• Software support & other design services
Fabrication Process
• More advanced fabrication process brings higher integration
and thus higher density and/or reduced size of chip.
• Currently the most advanced is 90nm process (previously 0.13μm)
• first used in Spartan-3, and later in Virtex-4 FPGA family
• gave Xilinx one year lead over Altera
• Altera introduced it in 2004 with Cyclone II and Stratix II
Figure 2 - Cyclone II 90nm structure
Logic Density
• We need a unit to express the logic capability of FPGA
• Is it possible to define such unit precisely?
• Traditionally:
– Xilinx: LC (Logic Cell)
– Altera: LE (Logic Element)
1 LC = 4-input LUT + D-FF + arithmetic/logic/register circuitry
1 LC = 1 LE
Logic Density (2)
• Improved functionality of "new" architectures introduced new terms:
• ALM – Adaptive Logic Module
for describing Altera's Stratix II family's adaptable structure
• CLB – Configurable Logic Block
for describing Xilinx's FPGA families
• ELC – Equivalent Logic Cell
Xilinx's new unit to better express logic density
1 ELC = 1.125 LC
1 CLB has 8 LCs
Clock Management
• All parts of a digital circuit need to be synchronized to a desired clock signal.
• If a circuit is large, complex, and operating at high frequencies,
the clock propagation delay and clock skew have a great impact on performance.
• Therefore, providing a clock signal with zero delay in all parts of an FPGA
becomes crucial.
• The solution is to divide FPGA into regions that can work at different frequencies,
called clock domains.
Clock management comprises two basic functions:
• remove clock skew and propagation delay
• generate new clock signals with different frequencies and/or phases
Removing Clock Skew
It can be done using:
• DLLs – Delay-Locked Loops (Xilinx)
• PLLs – Phase-Locked Loops (Altera)
Figure 3a - DLL block diagram
Figure 3b - PLL block diagram
They both compensate for the delay
generated on the routing network inside the FPGA,
providing zero-delay clock signal to different parts of FPGA.
Delay-Locked Loop
DLL works by inserting delay between the input clock and the feedback clock
until the two rising edges align, putting the two clocks in phase.
When the two clocks are in phase, the DLL "locks".
Thus, the DLL output clock compensates for the delay
in the clock distribution network.
• Delay line produces a delayed version of the input clock CLKIN.
• Clock distribution network routes the clock to the FPGA interior
and to the feedback CLKFB pin.
• Control logic samples the input clock and the feedback clock
in order to adjust the delay line.
• Delay line consists of an array of delay elements,
typically CMOS voltage-controlled inverters connected in series.
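The locking behaviour of a DLL can be illustrated with a small software model. This is only a sketch under stated assumptions, not how any vendor implements its DLL: it sweeps the delay line one tap at a time (a real DLL uses control logic in a feedback loop), and the function name `dll_lock`, the parameter names, and the picosecond values are all hypothetical.

```python
def dll_lock(period_ps, network_delay_ps, tap_ps, max_taps=10_000):
    """Toy DLL: add delay taps until the feedback clock edge lines up
    with a rising edge of the input clock.

    All values are integers in picoseconds (illustrative assumptions).
    Returns (taps, residual_error_ps).
    """
    for taps in range(max_taps + 1):
        # Total delay seen by the feedback clock:
        # delay line plus clock distribution network.
        total_delay = taps * tap_ps + network_delay_ps
        # Distance from the feedback rising edge to the next input rising edge.
        phase_error = (-total_delay) % period_ps
        if phase_error < tap_ps:
            # Edges align to within one tap: the DLL "locks".
            return taps, phase_error
    raise RuntimeError("delay line too short to lock")


# 100 MHz clock (10,000 ps period), 3,200 ps distribution delay, 50 ps taps.
taps, error = dll_lock(period_ps=10_000, network_delay_ps=3_200, tap_ps=50)
print(taps, error)  # 136 0
```

At lock the total delay equals one full clock period, so the clock arriving inside the FPGA appears to have zero delay relative to CLKIN.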
Phase-Locked Loop
• Instead of a delay line, the PLL uses a voltage controlled oscillator
which generates a clock signal that approximates the input clock CLKIN.
• Control logic, consisting of a phase detector and filter,
adjusts the oscillator frequency and phase
to compensate for the clock distribution delay.
• When the clocks are aligned the PLL "locks".
PLL vs. DLL
PLL
Advantage: frequency synthesis is easier because of the oscillator
Drawback: oscillator accumulates phase error
DLL
Advantage: does not accumulate phase error
Drawback: frequency synthesis is more difficult
Altera uses PLLs and Xilinx uses DLLs.
Clock Generation & Phase Shifting
• Besides clock skew elimination, DLLs (PLLs) are also used for:
• frequency multiplication and division
• duty-cycle regulation
• phase shifting
• Clock managers need to be resistant to temperature/voltage
variations.
Clock manipulation dramatically simplifies the design
and improves performance.
At the same time it provides many design alternatives.
Embedded Memory
• Using LUTs as registers does not provide enough space or versatility.
• Time-dependent applications, performing many computations,
need an entire built-in memory.
• The main advantages of embedded (built-in) memory are:
• short access time
• high bandwidth
• great versatility
It can behave like:
• RAM
• ROM
• Buffer (FIFO, LIFO, etc.)
• Cache
• Shift registers
• etc.
DSP Capabilities
DSP – Digital Signal Processing
• Majority of FPGA applications require some sort of DSP.
• In order to increase efficiency, DSP
computations are executed in parallel and pipelined.
• Special DSP units have been developed
to fully exploit FPGA's adaptable structure.
• These units are designed to optimize execution
of commonly used DSP algorithms:
filtering, encoding/decoding, equalization, modulation, FFT, etc
• They usually contain:
multipliers (in parallel), accumulators, adders and shift registers
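As an illustration of how multipliers, adders and a shift register combine in a typical DSP algorithm, here is a minimal software sketch of a direct-form FIR filter. The function name `fir_filter` and the sample values are assumptions for the example; in an FPGA the per-tap multiplications would run in parallel in dedicated hardware rather than in a loop.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum over k of coeffs[k] * x[n - k]."""
    delay_line = [0] * len(coeffs)  # shift register holding the most recent samples
    outputs = []
    for x in samples:
        # Shift the new sample in and drop the oldest one.
        delay_line = [x] + delay_line[:-1]
        # One multiplication per tap; hardware would do these in parallel.
        products = [c * d for c, d in zip(coeffs, delay_line)]
        # The adder (or accumulator) sums the partial products.
        outputs.append(sum(products))
    return outputs


# Feeding a unit impulse returns the coefficients (the impulse response).
print(fir_filter([1, 0, 0, 0], [1, 2, 3]))  # [1, 2, 3, 0]
```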
I/O Compatibility
• As FPGAs continue to grow in size and capacity,
more complex systems are designed for them,
demanding an increased variety of I/O standards.
• Furthermore, as system-clock speeds continue to increase,
the need for high-performance I/O becomes more important.
• Modern bus applications, pioneered by the most influential companies,
are commonly introduced with a new I/O standard,
tailored specifically to the needs of that application.
The bus I/O standards provide specifications to other vendors
who create products designed to interface with these applications.
Each standard often has its own specifications for:
current, voltage, I/O buffering and termination techniques.
I/O Compatibility (2)
• Interfaces are implemented in I/O blocks.
• I/O blocks are parts of FPGA architecture positioned peripherally,
connected to I/O pins and to internal interconnects.
• I/O blocks are grouped into banks – a group of neighboring pins
which use the same or compatible I/O standard at the same time.
I/O Compatibility (3)
• An I/O block usually contains:
– programmable I/O buffers
Programmable so that they can adapt to different I/O standards.
– D-FFs
Used as optional delay elements or registers.
– pull-up/down resistors
Used to assert or de-assert pins that would otherwise float.
– delay array
Provides a programmable delay of I/O signals.
– keeper circuit
Keeps the last state on a bus if all other drivers are in High-Z state.
Software Support
• Development of an FPGA-based hardware system
can be divided into the following stages:
• system design & synthesis
• design implementation
• on-chip verification
Figure 4a - Altera design flow diagram
Figure 4b - Xilinx design flow diagram
System Design Stage
• Begins with the design entry phase using:
• HDL – Hardware Description Language (like VHDL or Verilog)
• schematic editor
• Software solutions offer complete integrated environments for this stage.
• A wide variety of FPGA-ready component libraries are available
ranging from simple processors, peripheral components, controllers,
down to general logic (gates, counters, decoders, etc).
• The software supports hierarchical design entry.
System Design Stage (2)
• Once the hardware design is complete it is synthesized:
A process that transforms it from HDL form into a low-level gate form,
called RTL – Register Transfer Level description.
• The system design stage is platform independent.
The resulting RTL description of our system can be fitted into any FPGA.
Figure 5 - HDL and schematic representation of a BCD counter
Design Implementation Stage
• Commonly called Place-And-Route stage.
• Place-And-Route tools take the input RTL netlist for the design
and map the logic into the architectural resources of the FPGA.
• Then, the best location for these blocks is found,
based on their interconnections and desired performance.
• Finally, the interconnects are routed, and pins assigned.
Design Implementation Stage (2)
• This stage is platform-dependent,
since our design is implemented in an actual FPGA architecture.
• Therefore, place-and-route tools are developed by the FPGA vendors.
• They are developed to take full advantage of FPGA architecture,
and to provide optimum performance for a given design.
• Many analysis and simulation tools are provided for this stage.
The result of this stage is a configuration file
which is loaded into FPGA at startup
On-Chip Verification Stage
• This stage is executed once the design has been loaded into the FPGA.
• It gives the developer the possibility for real-world debugging.
• Special cables are supplied with FPGA development kits,
for connecting FPGAs to a PC or a workstation.
• This provides means for reading contents of internal registers
and memory.
Software Support (2)
• Both Xilinx and Altera offer complete software development kits
that guide users through all 3 stages of system design.
• Altera offers Quartus II
• Xilinx offers ISE
• Third-party software tools can be used in system design stage as well.
"Intellectual Property" Blocks
• Complete designs of some complex systems,
written in HDL by FPGA manufacturers, optimized to run on their FPGAs.
e.g. microcontrollers, microprocessors, etc.
• CPUs:
Altera: 32-bit Nios II
Xilinx: 32-bit MicroBlaze
Figure 6 - Block diagram of Altera's 16-bit Nios processor
Volume Production Solutions
• When FPGA-based designs move into volume production
the main issue is cost reduction!
• Xilinx and Altera have different approaches:
Xilinx offers a service called EasyPath:
Once the clients have developed their system on FPGA,
they send it to Xilinx.
After 8 weeks they get back the optimized FPGAs
with exactly the same functionality.
These optimized FPGAs are 30%-80% less expensive when mass produced.
They represent replacements for structured ASICs
and take less time to be completed.
Altera offers specialized HardCopy FPGAs:
It is a migration path from the FPGA to structured ASIC.
Altera developed fine-grained cell structure (HCell) ASICs
which perfectly match the logic elements (LEs) of Altera’s FPGAs.
That way Stratix LEs are mapped to equivalent logic elements
in the corresponding HardCopy device.
If a Stratix LE is not used in the FPGA design,
then it is not mapped to the HardCopy device,
yielding more efficient mapping of the prototyped design.
Overviews & Comparisons
Altera Cyclone II: low-end FPGA family
Overview
• Altera's most recent low-end FPGA family
• Introduced in 2004, first shipped in February 2005
• 1.2V core, 90nm process
Packaging
• Commercial grade and industrial grade devices are offered.
Functional Description
• Two-dimensional row/column-based architecture to implement custom logic.
• Column and row interconnects of varying speeds provide signal interconnects
between Logic Array Blocks (LABs), embedded memory, and multipliers.
• Logic array consists of LABs, with 16 logic elements (LEs) in each LAB.
Functional Description (2)
• Density from 4,608 to 68,416 LEs.
• Up to four phase-locked loops (PLLs).
• Global clock network consists of up to 16 global clock lines
that drive throughout the entire device.
Functional Description (3)
• M4K memory blocks are true dual-port memory blocks with 4K bits of memory.
• Works at up to 260 MHz.
• These blocks are arranged in columns across the device
in between certain LABs.
• Cyclone II devices offer from 119 to 1,152 Kbits of embedded memory.
Functional Description (4)
• Each embedded multiplier block can implement either two 9 × 9-bit multipliers,
or one 18 × 18-bit multiplier.
• Embedded multipliers are arranged in columns across the device.
• Up to 250-MHz performance.
Functional Description (5)
• Each I/O pin is fed by an IOE (Input Output Element)
located at the periphery of the device.
• I/O pins support various single-ended and differential I/O standards.
• Each IOE contains a bidirectional I/O buffer and three registers
for registering input, output, and output-enable signals.
LE Unit
Figure labels (LE structure):
– 4-input LUT: acts as a function generator for logic functions with 4 variables,
or a 16-bit register.
– Carry logic
– Programmable register: can be configured like D, T, JK or SR flip-flop.
Used optionally.
Cyclone II LE can operate in 2 modes:
• normal mode
• arithmetic mode
LE – Normal Mode
• Suitable for general logic applications and combinatorial functions.
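To make the "function generator" idea concrete, the sketch below models a 4-input LUT as a 16-bit truth table: configuring the LUT means filling the table, and evaluating it is a plain table lookup. The helper names `make_lut4` and `lut4_eval` and the input-to-bit ordering are assumptions chosen for illustration, not Altera's actual configuration format.

```python
def make_lut4(func):
    """Build the 16-bit truth table for a boolean function of 4 inputs."""
    table = 0
    for index in range(16):
        # Unpack the table index into the four input bits (a is the LSB).
        a = (index >> 0) & 1
        b = (index >> 1) & 1
        c = (index >> 2) & 1
        d = (index >> 3) & 1
        if func(a, b, c, d):
            table |= 1 << index
    return table


def lut4_eval(table, a, b, c, d):
    """Evaluate the LUT: the inputs simply select one bit of the table."""
    index = a | (b << 1) | (c << 2) | (d << 3)
    return (table >> index) & 1


# Configure the LUT as a 4-input XOR (parity) function.
xor4 = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
print(hex(xor4))                     # 0x6996
print(lut4_eval(xor4, 1, 0, 1, 1))   # 1
```

Any combinational function of four variables fits in the same 16 bits, which is why the LUT can serve as a general-purpose logic resource.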
LE – Arithmetic Mode
• Implements a 2-bit full adder and basic carry chain
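The role of the carry chain can be sketched in software as a ripple-carry adder, where each stage is a full adder that passes its carry to the next stage. This is a generic illustration with hypothetical function names (`full_adder`, `ripple_add`); it does not model how a Cyclone II LE packs the adder bits internally.

```python
def full_adder(a, b, carry_in):
    """One full-adder stage: returns (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def ripple_add(x, y, width):
    """Add two unsigned numbers bit by bit, passing the carry along a chain."""
    carry = 0
    result = 0
    for bit in range(width):
        sum_bit, carry = full_adder((x >> bit) & 1, (y >> bit) & 1, carry)
        result |= sum_bit << bit
    return result, carry


# 13 + 7 = 20, which is 4 with a carry out of the 4-bit chain.
print(ripple_add(13, 7, 4))  # (4, 1)
```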
LABs and Interconnects
• LAB - Logic Array Block
Logic Array Block consists of 16 LEs connected with carry and register chains.
Figure labels:
– Local Interconnect: transfers signals between LEs in the same LAB.
– Column Interconnect: connects multiple LABs.
– Row Interconnect: connects multiple LABs.
Clock Management
• Clock network features:
– Up to 16 Global Clock Networks
– Up to 4 PLLs
– Dynamic clock source selection, enable and disable
• Global clock networks spread throughout the entire device.
• They provide clocks for all resources within the device,
such as IOEs, LEs, memory blocks, and embedded multipliers.
• They are driven by external clock sources (via clock pins),
PLL outputs or the logic array signals.
• Global clock lines can also be used for general purpose control signals.
Clock Management (2)
• There is one clock control block for each global clock network.
• They are arranged on the device periphery.
• Clock control blocks are used to select/enable/disable a global clock network.
• Multiplexers are used with these clocks to form 6-bit buses to feed LABs and IOEs.
Clock Management (3)
• PLLs are located at the corners:
Clock Management (4)
• Cyclone II PLLs provide:
– Clock skew elimination
Provides zero-delay clock signal in every part of FPGA.
– Clock multiplication and division
Ranges from x(1/128) up to x32.
– Phase shifting
Programmable phase shifts in increments of at least 45°.
– Programmable duty-cycle
Generates clock outputs with a variable duty cycle.
– Manual clock switchover
Enables you to switch between two reference input clocks
for applications that may require support
for clocks with two different frequencies.
Embedded Memory
• Consists of columns of M4K memory blocks:
Embedded Memory (2)
The M4K blocks support the following features:
– 4,608 RAM bits (4 Kbits + parity bits, one for each byte)
– 250-MHz performance
– True dual-port memory
Supports any combination of two-port operations:
2 reads, 2 writes, or 1 read and 1 write at different clock frequencies.
– Simple dual-port memory
Simultaneous reads and writes are supported.
– Single-port memory
Simultaneous reads and writes are not allowed.
– Shift register
Embedded Memory (3)
The M4K blocks support the following features:
– FIFO buffer
– ROM
When configured as RAM or ROM,
you can use an initialization file to preload the memory contents.
– Byte enable
Allows the input data to be masked so the device can write to specific bytes.
The unwritten bytes retain the previous written value.
– Address clock enable
Used to hold the previous address value for as long as the signal is enabled.
This feature is useful in handling cache misses.
– Content-addressable memory (CAM), i.e. associative memory
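The byte-enable feature can be pictured as a per-byte write mask. The sketch below is a software analogy only: the function name `write_with_byte_enable`, the 32-bit word width and the bit ordering of the enable mask are assumptions for the example, not the M4K interface itself.

```python
def write_with_byte_enable(old_word, new_word, byte_enable, num_bytes=4):
    """Write only the bytes whose enable bit is set; other bytes keep their value.

    byte_enable bit i controls byte i (bits 8*i .. 8*i+7) of the word.
    """
    result = old_word
    for i in range(num_bytes):
        if (byte_enable >> i) & 1:
            mask = 0xFF << (8 * i)
            # Clear the enabled byte, then copy it from the new data.
            result = (result & ~mask) | (new_word & mask)
    return result


# Enable bytes 0 and 2 only: bytes 1 and 3 retain the previously written value.
word = write_with_byte_enable(0x11223344, 0xAABBCCDD, 0b0101)
print(hex(word))  # 0x11bb33dd
```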
Embedded Multipliers
• Located in columns; each block is as high as one LAB row:
Embedded Multipliers (2)
• Multiplier blocks are optimized for intensive Digital Signal Processing functions,
such as:
finite impulse response (FIR) filters, Fast Fourier Transform (FFT),
Discrete Cosine Transform (DCT) functions, etc.
• Embedded multipliers can work in 2 basic operational modes:
– One 18b x 18b multiplier
– Two independent 9b x 9b multipliers
• Operate at up to 250 MHz.
Embedded Multipliers (3)
• The embedded multiplier consists of the following elements:
– Multiplier block
– Input and output registers
– Input and output interfaces
Figure labels:
– Input Register (used optionally)
– Output Register (used optionally)
– Control signals: select operand representation, signed or unsigned
Input/Output Elements
• IOEs (Input Output Elements) are located in I/O blocks at the periphery:
Input/Output Elements (2)
IOEs support many features, including:
– Differential and single-ended I/O standards
– 3-state buffers
– Programmable input and output delays
– Programmable pull-up resistors during device configuration and in User Mode
– Bus-hold circuitry
– Joint Test Action Group (JTAG) boundary-scan test (BST) support
– etc.
Input/Output Elements (3)
Figure labels (IOE structure):
– Programmable pull-up resistor
– Output Enable Register (used optionally)
– Output Register (used optionally)
– Input Register (used optionally)
– Programmable delay chain (for input)
– Bus-hold (keeper) circuit
– I/O pin
– Element that prevents damage from high voltage
Input/Output Elements (4)
IOEs support most conventional and high-speed I/O protocols:
– LVTTL (3.3V, 2.5V, 1.8V)
– LVCMOS (3.3V, 2.5V, 1.8V, 1.5V)
– SSTL (classes I, II) and differential
– HSTL (classes I, II) and differential
– PCI and PCI-X
– etc.
Input/Output Elements (5)
• I/O pins on Cyclone II devices are grouped together into I/O banks.
• Each bank has a separate power bus.
• To accommodate voltage-referenced I/O standards,
each I/O bank has a VREF bus.
• Multiple voltage-referenced standards can be supported in an I/O bank
as long as they use the same VREF and a compatible VCCIO value.
• For example:
When VCCIO is 3.3V, a bank can support LVTTL, LVCMOS,
and 3.3V PCI for inputs and outputs.
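The bank rule above can be expressed as a simple consistency check. The sketch below is a deliberately simplified model: it treats "compatible VCCIO" as "identical VCCIO", and the table `IO_STANDARDS`, its key names and its voltage values are illustrative assumptions that should be checked against the device handbook before any real use.

```python
# (VCCIO, VREF) in volts; VREF is None for standards that are not voltage-referenced.
# Illustrative values only (assumption), not taken from the slides.
IO_STANDARDS = {
    "LVTTL_3V3": (3.3, None),
    "LVCMOS_3V3": (3.3, None),
    "PCI_3V3": (3.3, None),
    "SSTL_2_CLASS_I": (2.5, 1.25),
    "SSTL_2_CLASS_II": (2.5, 1.25),
    "SSTL_18_CLASS_I": (1.8, 0.9),
}


def bank_is_consistent(standards):
    """Return True if all standards can share one bank under the simplified rule:
    a single VCCIO value and at most one VREF value."""
    vccio_values = {IO_STANDARDS[name][0] for name in standards}
    vref_values = {
        IO_STANDARDS[name][1]
        for name in standards
        if IO_STANDARDS[name][1] is not None
    }
    return len(vccio_values) <= 1 and len(vref_values) <= 1


print(bank_is_consistent(["LVTTL_3V3", "LVCMOS_3V3", "PCI_3V3"]))   # True
print(bank_is_consistent(["SSTL_2_CLASS_I", "SSTL_18_CLASS_I"]))    # False
```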
Input/Output Banks
Start-Up Configuration
• Logic, circuitry, and routing switches
are configured with CMOS SRAM elements
that require configuration data to be loaded on each power-up.
• Process of physically loading the SRAM data into the device is called:
configuration.
• During initialization, which occurs immediately after configuration,
the device resets registers, enables I/O pins,
and begins to operate as a logic device.
• Together, configuration and initialization are called: command mode.
• Normal device operation is called: user mode.
Start-Up Configuration (2)
• Configuration data is loaded with one of three configuration schemes:
• Cyclone II can be configured automatically at system power-up
with data stored in a low-cost configuration device
or provided by a system controller (Active Serial scheme).
• Cyclone II can also act as controller for other devices in AS configuration scheme.
Start-Up Configuration (3)
• Configuration data is loaded with one of three configuration schemes:
• Cyclone II devices can also be configured while in user mode,
via a serial data stream, using the Passive Serial (PS) configuration mode.
• The PS mode also enables microprocessors to treat Cyclone II devices as memory
and configure them by writing to a virtual memory location,
simplifying reconfiguration.
Xilinx Spartan-3: low-end FPGA family
Overview
• Spartan-3 was first announced in April 2003.
• Its latest version (2005) is called Spartan-3E family.
• 90nm process
Packaging
• Commercial grade and industrial grade devices are available.
Functional Description
• The Spartan-3 family architecture consists of
five fundamental, programmable functional elements:
• Configurable Logic Blocks (CLBs)
Contain RAM-based Look-Up Tables (LUTs) to implement logic,
and storage elements that can be used as flip-flops or latches.
• Digital Clock Manager (DCM) blocks
Provide fully digital solutions for distributing, delaying, multiplying,
dividing, and phase shifting clock signals.
• Block RAM
Provides data storage in form of 18-Kbit dual-port blocks.
• Multiplier blocks
Accept two 18-bit binary numbers as inputs and calculate the product.
• Input/Output Blocks (IOBs)
Control the flow of data between the I/O pins and the internal logic of the device.
24 I/O standards supported.
Spartan-3 Floorplan
CLB Overview
• CLBs constitute the main logic resource
for implementing synchronous as well as combinatorial circuits.
• Each CLB comprises 4 interconnected slices, as shown below.
• These slices are grouped in pairs.
Each pair is organized as a column with an independent carry chain.
CLB Overview (2)
• All four slices have the following elements in common:
– 2 logic function generators (4-input LUTs)
– 2 storage elements
– wide-function multiplexers
– carry logic
– arithmetic gates
• Both the left-hand and right-hand slice pairs use these elements
to provide logic, arithmetic, and ROM functions.
CLB
Figure labels:
– Top portion: 4-input LUT "G"
– Bottom portion: 4-input LUT "F"
– Blue-dotted elements are used for implementing 16-bit shift registers.
– Carry chain between two logic cells in a CLB
– Found only in left-hand CLBs
CLB upper portion - ENLARGED
Figure labels:
– Flow control multiplexers
– OR gate, used for logic and arithmetic functions
– AND gate, used for logic and arithmetic functions
– Optionally used register, programmable as latch or D-FF
Interconnects
• Interconnects pass signals among various functional elements of Spartan-3 devices.
• There are four kinds of interconnects:
• Long lines
Connect every sixth CLB in a row/column.
Because of their low capacitance,
these lines are well-suited for carrying high-frequency signals with minimal skew.
They can also serve as replacements for global clock lines.
• Hex lines
Connect every third CLB in a row/column.
• Double lines
Connect every other CLB in a row/column.
• Direct lines
Afford any CLB direct access to neighboring CLBs.
Interconnects (2)
Clock Management
• Spartan-3 devices have up to 4 DCM (Digital Clock Manager) blocks.
• DCMs support 3 major functions:
– clock-skew elimination
– frequency synthesis
– phase shifting
• A DCM consists of:
– Delay-Locked Loop (DLL)
– Digital Frequency Synthesizer (DFS)
– Phase Shifter (PS)
– Status Logic
Clock Management - DLL
• 2 clock inputs (input + feedback), 7 clock outputs
• 2 operating modes: Low Frequency and High Frequency (3 outputs enabled)
Figure labels: outputs; programmable delay blocks called taps
Clock Management (3)
• DFS component generates output clock signals,
the frequency of which is a product of the clock frequency at the CLKIN input
and a ratio of two user-defined integers:
$f_{OUT} = f_{IN} \cdot \frac{C_{MUL}}{C_{DIV}}, \quad C_{MUL} \in [2, 32], \quad C_{DIV} \in [1, 32]$
• This gives the following output range: from x(1/16) up to x32
• Besides 90°, 180° and 270° phase-shifted signals from DLL,
the PS component provides a still finer degree of control,
with resolution up to 1/256 of input clock cycle. (Low Frequency mode only)
• Spartan-3 devices have 8 global clock inputs.
These inputs provide access to a low-capacitance, low-skew network
that is well-suited to carrying high-frequency signals.
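The DFS relation (output frequency equals input frequency times a multiplier over a divider, with the multiplier in [2, 32] and the divider in [1, 32]) can be explored with a short search. The sketch below is illustrative only: the function name `dfs_settings` is hypothetical, and it ignores the minimum and maximum frequency limits that a real DCM imposes.

```python
from fractions import Fraction


def dfs_settings(f_in_mhz, f_target_mhz):
    """Find (multiplier, divider, f_out) with multiplier in [2, 32] and
    divider in [1, 32] whose output frequency is closest to the target."""
    best = None
    for mul in range(2, 33):
        for div in range(1, 33):
            # Exact rational arithmetic avoids floating-point rounding issues.
            f_out = Fraction(f_in_mhz) * mul / div
            error = abs(f_out - f_target_mhz)
            if best is None or error < best[0]:
                best = (error, mul, div, f_out)
    _, mul, div, f_out = best
    return mul, div, float(f_out)


# Smallest and largest achievable ratios match the stated x(1/16) .. x32 range.
print(Fraction(2, 32), Fraction(32, 1))   # 1/16 32
# Generate 75 MHz from a 50 MHz input clock.
print(dfs_settings(50, 75))               # (3, 2, 75.0)
```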
Clock Management (4)
Figure labels:
– Global clock inputs
– Clock multiplexers route global clock lines to local clock networks
and to Digital Clock Managers
Figure 7 - Spartan-3 Global Clock Networks (left).
Duty cycle correction (right)
Embedded Memory (Block RAM)
• Organized as configurable, synchronous blocks, in up to 4 columns.
• 200 MHz performance
• Each block contains 18K bits of fast static RAM,
16K bits for data storage + 2K bits for parity bits.
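The 16K + 2K split works out to one parity bit for every eight data bits, which the following small calculation confirms. Treating the ratio as "one parity bit per byte" is an inference from the numbers above, and the helper name `block_layout` is hypothetical.

```python
def block_layout(data_bits, data_bits_per_parity_bit=8):
    """Return (parity_bits, total_bits) for a block with one parity bit
    per group of data bits."""
    parity_bits = data_bits // data_bits_per_parity_bit
    return parity_bits, data_bits + parity_bits


# 16K data bits -> 2K parity bits -> 18K bits in total.
print(block_layout(16 * 1024))  # (2048, 18432)
```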
Embedded Memory (2)
• Physically, the block RAM memory has two independent access ports,
labeled Port A and Port B (dual port memory).
• The structure is fully symmetrical. Both ports are interchangeable
and both ports support data read and write operations.
Each port has its own clock.
Embedded Multipliers
• 4 to 104 dedicated 18x18-bit multipliers.
• Operands are in two's complement form: 18-bit signed or 17-bit unsigned.
• One multiplier is matched to each Block RAM to ensure efficiency.
• Cascading multipliers permits more than three operands,
as well as operands wider than 18 bits.
• Multiplication of inputs more than 18 bits wide is possible
by decomposing the multiplication process into smaller subprocesses.
Figure 8 - 22x16-bit multiplier implementation
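One way to picture the decomposition of a wide multiplication is the schoolbook method: split each operand into chunks that fit a single multiplier, form the partial products, then shift and add. The sketch below handles unsigned operands only and uses 17-bit chunks to match the unsigned operand width; the function names are hypothetical, and the figure's actual 22x16-bit arrangement may differ.

```python
CHUNK_BITS = 17                       # unsigned width one 18x18 multiplier can take
CHUNK_MASK = (1 << CHUNK_BITS) - 1


def split_chunks(value):
    """Split an unsigned integer into 17-bit chunks, least significant first."""
    chunks = []
    while True:
        chunks.append(value & CHUNK_MASK)
        value >>= CHUNK_BITS
        if value == 0:
            return chunks


def wide_multiply(a, b):
    """Multiply two unsigned integers using only chunk-sized partial products.

    Returns (product, number_of_partial_products)."""
    total = 0
    count = 0
    for i, chunk_a in enumerate(split_chunks(a)):
        for j, chunk_b in enumerate(split_chunks(b)):
            # Each partial product fits one multiplier; the shift restores its weight.
            total += (chunk_a * chunk_b) << (CHUNK_BITS * (i + j))
            count += 1
    return total, count


a, b = 0x3FFFFF, 0xFFFF               # a 22-bit and a 16-bit operand
product, used = wide_multiply(a, b)
print(product == a * b, used)         # True 2
```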
Input/Output Blocks
• Input/Output Block (IOB) provides a programmable, bidirectional interface
between an I/O pin and the FPGA’s internal logic.
• There are three main signal paths within an IOB
(each has an optional pair of storage elements, used as latches or D-FFs):
– Input path
Carries data from the I/O pin to the internal logic.
– Output path
Carries data from the FPGA’s internal logic
through a multiplexer and then a 3-state buffer (driver) to the I/O pin.
– 3-state path
Determines when the output buffer (driver) is high impedance.
IOB
Figure labels:
– 3-state Path
– Output Path
– Input Path
– Programmable output buffer
– Optional storage element
– I/O pin
Part of IOB - ENLARGED
Figure labels:
– Programmable Pull-Up and Pull-Down resistors
– Digitally controlled impedance, used to match the impedance of transmission line
– VREF pin
– Circuitry for implementing various I/O standards
– I/O pin from adjacent IOB, used for differential I/O standards
Input/Output Blocks (4)
• Support for 18 single-ended and 6 differential I/O standards.
Differential standards are implemented by using a pair of IOBs.
• IOBs and pins are grouped into banks.
The need to supply VREF and VCCO imposes constraints
on which standards can be used in the same bank.
• Supported I/O standards include:
– LVTTL (3.3V)
– LVCMOS (3.3V, 2.5V, 1.8V, 1.5V)
– SSTL (classes I, II) and differential
– HSTL (classes I, II, III) and differential
– PCI 3.0V
– etc.
Start-Up Configuration
• Spartan-3 devices are configured
by loading configuration data into internal configuration memory.
• Several configuration modes are supported, selectable via mode pins M0, M1, M2.
Start-Up Configuration (2)
• In Slave Serial mode, the FPGA receives configuration data in bit-serial form from
a serial PROM or other serial source of configuration data.
• The CCLK pin on the FPGA is an input in this mode.
• Multiple FPGAs can be daisy-chained for configuration from a single source.
After a particular FPGA has been configured,
the data for the next device is routed internally to the DOUT pin.
Figure: Slave-Serial configuration mode
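The daisy-chain behaviour can be sketched as a single serial stream that each device consumes in turn, forwarding whatever follows its own data to the next device. This is a toy model: the function name `daisy_chain`, the list-of-bits representation and the tiny bitstream sizes are assumptions for illustration, and real configuration streams carry headers and framing that are not modelled here.

```python
def daisy_chain(bitstream, device_sizes):
    """Distribute one serial bitstream among chained devices.

    device_sizes lists how many configuration bits each device needs,
    in chain order. Returns (per_device_configs, leftover_bits).
    """
    configs = []
    remaining = list(bitstream)
    for size in device_sizes:
        if len(remaining) < size:
            raise ValueError("bitstream too short for the chain")
        # The current device fills its own configuration memory first...
        configs.append(remaining[:size])
        # ...then routes the rest of the stream to the next device via DOUT.
        remaining = remaining[size:]
    return configs, remaining


configs, leftover = daisy_chain([1, 0, 1, 1, 0, 0, 1], [3, 4])
print(configs, leftover)  # [[1, 0, 1], [1, 0, 0, 1]] []
```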
Start-Up Configuration (3)
• In Master Serial mode, the master FPGA drives the configuration clock
on the CCLK pin to the Xilinx Serial PROM,
which, in response, provides bit-serial data to the FPGA’s DIN input.
• After the master FPGA has finished configuring,
it passes data on its DOUT pin to the next FPGA device in a daisy-chain.
Figure: Master-Serial configuration mode
Start-Up Configuration (4)
• In Slave Parallel mode,
byte-wide data is written into FPGA,
with a BUSY flag controlling the flow.
• An external source provides data,
CCLK, a Chip Select (CS_B) signal
and a Write signal (RDWR_B).
• In Master Parallel mode,
FPGA configures from byte-wide data,
and the FPGA itself supplies CCLK
(configuration clock).
• CCLK behaves as a bidirectional I/O pin.
Altera Stratix II: high-end FPGA family
Quick Overview
• Launched in February 2004.
• 1.2V core, 90nm process
• Approaching 180,000 LEs
• Up to 9 Mbits of on-chip, TriMatrix memory for memory-demanding applications.
• Up to 96 DSP blocks with up to 384 (18-bit × 18-bit) multipliers
for efficient implementation of high performance filters and other DSP functions.
• Various high-speed external memory interfaces are supported.
• Complete clock management solution with clock frequency of up to 550 MHz
and up to 12 phase-locked loops (PLLs).
Quick Overview (2)
• Designers requiring a low-risk cost-reduction path for high-volume production
can easily migrate their Stratix II FPGA designs to structured-ASIC production
with HardCopy II devices.
• HardCopy II devices significantly minimize migration risk
because they are generated directly from a Stratix II FPGA
and preserve the Stratix II architecture.
Quick Overview (3)
• ALM – Adaptive Logic Module
• One of the greatest improvements is the ALM architecture,
which can be configured in various modes.
Xilinx Virtex-4: high-end FPGA family
Quick Overview
• Introduced in 2004
• 1.2V core, 90nm process
• Three high-performance versions LX/SX/FX
- Virtex-4 LX: Logic applications solution.
- Virtex-4 FX: Full-featured solution for embedded platform applications
- Virtex-4 SX: Solution for Digital Signal Processing (DSP) applications
• Up to 200,000 logic cells
• Xesium Clock Technology
- Up to 20 Digital Clock Manager (DCM) blocks
- Additional Phase-Matched Clock Dividers (PMCD)
- 32 Global Clock networks
• Up to 10Mb of integrated block memory operating at 500MHz
Quick Overview (2)
• XtremeDSP Slice
- 18x18 signed multipliers
- Up to 100% speed improvement over previous generation devices
• Up to 960 user I/Os
• IBM PowerPC RISC Processor Core (FX only)
Quick Overview (3)
• At the heart of the Virtex-4 family is the new ASMBL architecture.
ASMBL – Advanced Silicon Modular Block
• This new, highly modular ASMBL architecture
makes use of advanced packaging technology
and eliminates geometric layout constraints
associated with traditional chip design.
• Thanks to it, Xilinx can vary the number and ratio of different functional parts
to create a family (platform) of different sized devices,
each best suited for a certain domain of applications,
depending on the desired type of functional attributes.
• This approach enables the right feature mix at the lowest cost,
and resulted in 3 platforms of Virtex-4 FPGAs – LX, FX, SX.
Altera vs. Xilinx
Altera vs. Xilinx
• Deciding which of the two is currently better,
on the basis of the described features, is an impossible task:
– Both of them offer a vast range of FPGAs, at different prices,
guaranteed to satisfy any user’s needs.
– If we make a feature-to-feature comparison of same-rank FPGAs
we will find that they offer very similar features at very similar prices:
- 90nm process, 1.2V core
- up to 200,000 LCs (LEs)
- maximum internal frequency around 500 MHz
- embedded 18x18 multipliers and enhanced DSP features
- up to 10Mbits of multi-purpose embedded RAM
- support for leading I/O standards and external memory interfaces
- numerous IP blocks (Nios II, MicroBlaze, etc.)
- complete software systems (ISE and Quartus II)
Altera vs. Xilinx (2)
• Benchmarking also yields controversial results.
All the benchmarks are performed either by Xilinx/Altera, or their partners.
Both companies issue whitepapers
claiming their FPGAs considerably outperform the opponent’s ones:
Quote:
“… Our benchmark results show that for high-density 90-nm FPGAs,
the Altera Stratix II family commands an average of 39% performance lead
over Xilinx Virtex-4 family.
For low-cost FPGAs, the Altera 90-nm Cyclone II family provides an average
60% higher performance than the Xilinx 90-nm Spartan-3 family…”
Altera whitepaper, “FPGA Performance Benchmarking Methodology”
Quote:
“… Cyclone II performance, as demonstrated by a suite of customer designs
using the most cost effective speed grade, has degraded almost a full speed grade
from Quartus II v4.1 to v4.2, and further degradation is indicated for the new v5.0.
Spartan-3 design performance is now slightly faster than Cyclone II when comparing
the most cost effective speed grade in each device…”
Xilinx whitepaper, “Spartan-3 vs. Cyclone II Performance Analysis”
Altera vs. Xilinx (3)
Is there a way to find out who is better?
Let us ask the customers:
Quote:
“… in a survey of more than 350 design teams worldwide, in which respondents were
asked to rate their experience with FPGA and EDA companies' products and services,
FPGA designers ranked Xilinx highest in reader/customer satisfaction for devices,
design tools, service and support, including:
Virtex and Spartan FPGAs - "Xilinx continues to lead the pack in performance and
features, and goes the extra mile in explaining how to use
their devices for particular class of application."
ISE design tools - "Xilinx has made significant improvements to their tool
suite over the past year, particularly in the DSP and
embedded design areas."
Support staff, and documentation - "Xilinx consistently sets the standard for support staff and
resources, particularly with their robust website and
responsible and knowledgeable application engineers."
FPGA Journal
Conclusion
• It seems that Xilinx is the winner.
• But the competition is closing the gaps.
• A careful reader will notice that
the stated reasons for Xilinx winning the readers’ award
have more to do with client relations
than with a great difference in performance.
• One thing, however, is certain:
Altera vs. Xilinx = A satisfied user
Thank you!
The End