1st LHCb Collaboration Upgrade Workshop, Edinburgh 11.-12. January 2007
Enabling technologies for a new
TELL DAQ interface module
Guido Haefeli
EPFL, Lausanne
Outline
• The current TELL1
• 10 Gigabit Ethernet for FPGAs
• FPGA technology and its trends
• TELL10 or TELL5 outline
• Conclusions
The current TELL1
1) Digitization (VELO)
2) Synchronization (TTC)
3) Data compression (factor 10)
4) Buffering
5) Ethernet and IP formatting (framer), physical IF
[Diagram: input 24 x 1.28 Gbit/s = 30 Gbit/s, data compression ~10, output 4 x 1 Gbit/s = 4 Gbit/s]
Interface to the input data “O-Rx”
• The TELL1 input data is received on 24 optical links running at a 1.28 Gbit/s data rate.
• Two 12-way optical receivers are employed.
• De-serialisation (DES) is implemented with the TLK2501 “SERDES”.
• Data transmission to the processor FPGAs runs on parallel data buses at 80 MHz.
• Each PP-FPGA receives 8 Gbit/s of input data (see the sketch below).
[Diagram: 2 x 12 links into 1.6 GHz de-serializers, 8 Gbit/s into each PP-FPGA]
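As a quick cross-check of the figures above, a minimal Python sketch (link and FPGA counts taken from this slide):

```python
# Cross-check of the TELL1 input bandwidth figures (counts from this slide).
N_LINKS = 24        # optical input links
LINK_RATE = 1.28    # Gbit/s data rate per link (1.6 GHz line rate with 8b/10b)
N_PP_FPGA = 4       # processor FPGAs per board

total_in = N_LINKS * LINK_RATE      # 30.7 Gbit/s, quoted as ~30 Gbit/s
per_fpga = total_in / N_PP_FPGA     # 7.68 Gbit/s, quoted as ~8 Gbit/s
print(f"total input: {total_in:.1f} Gbit/s, per PP-FPGA: {per_fpga:.2f} Gbit/s")
```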
Processor FPGA “PP-FPGA”
• Input data receiver FPGA, performs:
– Synchronization
– Zero suppression (common mode suppression, clusterization, hit address generation, …) — see the sketch below
• 4 identical FPGAs per board of the first-generation Altera Stratix type: 25K Logic Elements, 2 Mbit RAM, 780-pin.
– Memory and logic resources 80% used.
– Processing clock frequency 120 MHz; slowest speed grade to minimize FPGA cost.
– IO usage 80% (20% used by the Level 1 buffer interface).
– Embedded DSP blocks (multiply-accumulators) only 30% used.
Summary:
• Total of 100K Logic Elements
• Total of 8 Mbit RAM
• 120 MHz processing frequency
Processing is very logic- and memory-demanding!
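To make the zero-suppression steps named above concrete, a minimal Python sketch of common-mode suppression, thresholding and clusterization with hit addresses. The real firmware is pipelined fixed-point FPGA logic; the threshold and sample values here are invented for illustration:

```python
# Sketch of the zero-suppression steps named above: common-mode suppression,
# hit thresholding and clusterization with hit addresses. Values are invented.

def zero_suppress(samples, threshold=5.0):
    """Return (start address, [amplitudes]) clusters for one channel block."""
    # Common-mode suppression: subtract the block mean (a crude stand-in for
    # the per-event baseline correction done in the firmware).
    common_mode = sum(samples) / len(samples)
    corrected = [s - common_mode for s in samples]

    # Clusterization: group adjacent channels above threshold; keeping the
    # address of the first channel makes the hit position recoverable.
    clusters, current = [], None
    for addr, val in enumerate(corrected):
        if val > threshold:
            if current is None:
                current = (addr, [])
            current[1].append(round(val, 1))
        elif current is not None:
            clusters.append(current)
            current = None
    if current is not None:
        clusters.append(current)
    return clusters

print(zero_suppress([2, 3, 2, 40, 40, 3, 2, 2, 15, 2, 3, 2]))
# -> [(3, [30.3, 30.3]), (8, [5.3])]
```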
Synchronization and Link FPGA “SyncLink-FPGA”
• SyncLink FPGA performs:
– Data linking; provides the interface to the processor FPGAs at 4 Gbit/s bandwidth.
– Data buffering with QDR SRAM, 4 Gbit/s read and write.
– Interconnection to the Gigabit Ethernet card at 4 Gbit/s.
– TTCrx interface.
• One first-generation Altera Stratix FPGA (25K Logic Elements), 1020-pin.
– 50% of logic and memory resources used.
– No DSPs used.
– IO 70% used.
Summary:
• Total of 25K Logic Elements
• Total of 2 Mbit RAM
• 120 MHz processing frequency
Network interface “GBE”
• 4-way Gigabit Ethernet implemented on a mezzanine card.
• 4 x 1 Gbit/s over copper.
• MAC and PHY on the mezzanine card.
• Parallel “FIFO-like” interface between FPGA and MAC.
ECS and TTC interface
• CCPC and GlueCard on board; 10/100 Ethernet connection for ECS.
• I2C, JTAG and the local processor bus are available via the GlueCard.
• TTCrx on board, connected to the SyncLink-FPGA.
10 Gigabit Ethernet, Infiniband, 100 Gigabit Ethernet
• Optical 10 Gigabit Ethernet has been standard since 2002. Equipment is coming into fashion and getting cheaper (5 KCHF per port).
• Copper 10 Gigabit Ethernet is standardized now (2006) as 10GBASE-T (IEEE 802.3an-2006), but there is no final cable specification yet. Maybe this moves faster than optical.
• Infiniband is maybe an alternative to Ethernet; chips with one, and soon two, 10 Gbit/s copper ports are available (PCI-e → Infiniband, Mellanox ConnectX), but it requires new cabling.
• 100 Gigabit Ethernet is the next-generation Ethernet (decided by IEEE in 2006); it is not yet commercialized, but standardization is ongoing and demand exists. First equipment is expected by 2009-2010.
10 Gigabit Ethernet for FPGAs
[Diagram: an FPGA with serial transceivers and an on-chip MAC connects through the XAUI interface to a XENPAK module (10 Gigabit Ethernet PHY and optical transceiver standard module).]
XENPAK, XPAK, XFP module
• Standard modules for different link lengths.
– Optical 10GBASE-SR, 300 m, available (still expensive: 5 KCHF).
• Used in high-end server NIC cards or as uplinks for Gigabit Ethernet switches.
Copper 10 Gigabit Ethernet
• Copper 10 Gigabit Ethernet exists (6 W per XENPAK)!
• Copper may become cheap!
Altera Stratix history
• First generation, Stratix I:
– Introduced 2002 (used for TELL1)
– 80K Logic Elements (LE)
– 8 Mbit memory
– 20 serial high-speed transceivers (3.125 GHz)
– System speed 120 MHz for TELL1 (slowest speed grade)
• Second generation, Stratix II (90 nm):
– Introduced 2004
– 180K Logic Elements (LE)
– 9 Mbit memory
– 20 serial high-speed transceivers (6.375 GHz)
– System speed 50% faster than Stratix I
• Third generation, Stratix III (65 nm):
– Introduced 2006 (mass production 2008); price for 140K LEs: 700 CHF at 1000 pieces
– 340K Logic Elements (LE)
– 21 Mbit memory
– 20? serial high-speed transceivers (6.375 GHz?) (not available yet)
– System speed 25% faster than Stratix II
Xilinx Virtex history
• Virtex 2 Pro (130 nm):
– Introduced 2002
– 100K Logic Cells (LC)
– 8 Mbit memory
– 2 PowerPCs
– 20 serial high-speed transceivers (3.125 GHz)
• Virtex 4 (90 nm):
– Introduced 2004
– 200K Logic Cells (LC)
– 10 Mbit memory
– 2 PowerPCs
– 24 serial high-speed transceivers (6.5 GHz)
– System speed: 500 MHz maximal internal clocking
• Virtex 5 (65 nm):
– Introduced 2006 (mass production 2008); price for 330K LCs is 3100 CHF at 1000 pieces
– 330K Logic Cells (LC)
– 15 Mbit memory
– 24 serial high-speed transceivers (3.125 GHz)
– System speed: 550 MHz maximal internal clocking
FPGA trends
• FPGAs are replacing ASICs and DSPs; the market is growing fast!
• Every 2 years a new generation of FPGA: 20-30% faster and 50% more logic.
• Hard-core functionality and chip variants with DSP blocks, high-speed transceivers, PCI-e, Gigabit Ethernet MAC, microprocessors.
• Serial interfaces are in fashion: PCI-e, XAUI, …
• More and more modular Intellectual Property (IP) cores available.
• Low power consumption is important for industry, but faster and bigger devices consume more power.
• But: a constant number of high-speed transceivers.
• But: a constant number of maximal IO pins.
• But: the cost per high-end chip is constant!
New TELL1 tasks and performance
• Same tasks as the current TELL1:
– Synchronization to TTC
– Data pre-processing to achieve data compression by a factor 10!
– Data buffering for zero suppression and the DAQ interface.
• No trigger functionality, no board-to-board interconnect.
Definitions:
• TELL5: 5 times the current TELL1 data bandwidth
• TELL10: 10 times the current TELL1 data bandwidth
• TELL40: 40 times the current TELL1 data bandwidth
Compare high-end FPGA to the TELL1 processor
• TELL1 PP-FPGA:
– 25K Logic Elements (LE)
– 2 Mbit memory
– System speed 120 MHz
• TELL10 PP-FPGA:
– 340K Logic Elements (LE)
– 21 Mbit memory
– System speed 320 MHz (max on-chip clock speed 600 MHz)
Processing power: x 36
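The "x 36" follows from the logic-element ratio times the clock ratio; a one-line check with the numbers on this slide:

```python
# Where the "x 36" processing-power factor comes from.
le_ratio = 340_000 / 25_000   # 13.6x more Logic Elements
clk_ratio = 320 / 120         # ~2.7x higher system clock
print(f"~{le_ratio * clk_ratio:.0f}x processing power")   # ~36x
```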
Receiver
• Use high-speed optical links:
– 2.56 Gbit/s data rate, 3.2 GHz links
– De-serializer: 4 x 2.56 Gbit/s to 32-bit@320MHz
– E.g. PMC-Sierra PM8358 quad PHY SERDES, Broadcom BCM8011
• Input data bandwidth (see the sketch below):
– TELL1: 24 x 1.28 Gbit/s = 30 Gbit/s
– TELL10: factor 10 → 328 Gbit/s
• Use 32 de-serializers
• 1024-bit@320MHz parallel input distributed to 4 FPGAs
– TELL40: factor 40 → 1311 Gbit/s
• Use 128 de-serializers
• 4096-bit@320MHz parallel input distributed to 4 FPGAs
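A short sketch reproducing the bandwidth figures above (one quad de-serializer carries 4 links at 2.56 Gbit/s; counts from this slide):

```python
# Input bandwidth targets versus what the quad de-serializers deliver.
LINK_RATE = 2.56                 # Gbit/s per optical link
QUAD = 4 * LINK_RATE             # 10.24 Gbit/s per 4-channel de-serializer
TELL1 = 24 * 1.28                # ~30 Gbit/s reference bandwidth

for name, factor, n_serdes in [("TELL10", 10, 32), ("TELL40", 40, 128)]:
    target = factor * TELL1              # required input bandwidth
    delivered = n_serdes * QUAD          # what the SERDES array provides
    bus_width = n_serdes * 32            # 32-bit @ 320 MHz per de-serializer
    print(f"{name}: target {target:.0f} Gbit/s, delivered {delivered:.0f} "
          f"Gbit/s, {bus_width}-bit parallel bus")
# TELL10: target 307, delivered 328 Gbit/s, 1024-bit bus
# TELL40: target 1229, delivered 1311 Gbit/s, 4096-bit bus
```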
Receiver (2)
[Diagram: receiver variants built from quad-receiver (“4 x Rcv”) blocks on 10 Gbit/s inputs. TELL10: possible now, or soon, with 10 Gbit/s links. TELL40: higher integration on chips needed; future (100 Gbit/s)?]
Network interface
• Assumption:
– Use 10 Gigabit Ethernet
– Design for a XENPAK PHY and an optical or copper interconnect
• Required bandwidth (see the check below):
– TELL1: 4 Gbit/s
– TELL10: 40 Gbit/s → possible with the XAUI interface and serial links (4x4@3.2GHz), XGMII 4 x 32-bit@320MHz:
• 4 x XENPAK PHY modules
• 4 x 10 Gbit/s MACs on the FPGA
– TELL40: 160 Gbit/s → needs 16 x 10 Gigabit Ethernet connections; the space and power are too high and the IO is not available yet.
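A quick check of how many 10 Gigabit Ethernet ports the quoted DAQ bandwidths imply:

```python
# 10GbE ports needed for the quoted DAQ output bandwidths.
import math

for name, out_gbps in [("TELL10", 40), ("TELL40", 160)]:
    print(f"{name}: {out_gbps} Gbit/s -> {math.ceil(out_gbps / 10)} x 10GbE")
# TELL10 -> 4 ports (feasible); TELL40 -> 16 ports (space, power, IO issue)
```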
Memory trends
• SRAM and SDRAM memories with sufficient bandwidth and size are already available now:
– QDRII+ SRAM, 72 Mbit, 32-bit@800Mbit/s
– DDR2 SO-DIMM, 4 GByte, 64-bit@800Mbit/s
• If more bandwidth is required, multiple channels can be implemented (see the check below).
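The quoted memory options translate into the following peak bandwidths (per-pin data rate times bus width; QDR SRAM has separate read and write ports):

```python
# Peak bandwidths of the two quoted memory options.
PIN_RATE = 0.8                 # Gbit/s per pin (800 Mbit/s)
qdr2_plus = 32 * PIN_RATE      # 32-bit QDRII+: read and write simultaneously
ddr2_sodimm = 64 * PIN_RATE    # 64-bit DDR2 SO-DIMM: shared read/write bus
print(f"QDRII+ SRAM : {qdr2_plus:.1f} Gbit/s per direction")   # 25.6
print(f"DDR2 SO-DIMM: {ddr2_sodimm:.1f} Gbit/s combined")      # 51.2
# Both comfortably exceed the 4 Gbit/s read + write used on the TELL1.
```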
TELL10 outline with current technology
[Block diagram with these elements:]
• 8 x 4-channel SERDES
• 256-bit parallel interconnection
• 4 x 64-bit parallel interconnection
• Gigabit MAC as an IP core on the FPGA
• 4 x 10 Gigabit Ethernet, using a total of 16 high-speed transceivers
• 4 x XENPAK optical 10 Gigabit transceivers
TELL10 cost estimation today
• 32 x 150 CHF
• 4 x 3 KCHF
• 1 x 3 KCHF
• 4 x 5 KCHF (XENPAK optical 10 Gigabit transceivers)
• 5 KCHF (PCB, ECS, connectors, fabrication)
Total: 45 KCHF (see the check below)
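The items above add up to the quoted total; note that the mapping of the unlabeled items to SERDES chips and FPGAs is an assumption based on the TELL10 outline:

```python
# Sum of the cost items above (labels for unlabeled items are assumed).
items = {
    "4-channel SERDES (32 x 150 CHF, assumed)": 32 * 150,
    "processor FPGAs (4 x 3 KCHF, assumed)":    4 * 3_000,
    "link FPGA (1 x 3 KCHF, assumed)":          1 * 3_000,
    "XENPAK optical transceivers (4 x 5 KCHF)": 4 * 5_000,
    "PCB, ECS, connectors, fabrication":        5_000,
}
print(f"total: {sum(items.values()) / 1000:.1f} KCHF")  # 44.8 -> quoted 45
```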
TELL5 “Single chip board”
• Number of input links adapted to the possible processing power of the FPGA.
• Only one large FPGA with sufficient transceivers and IO pins (size adapted to needs).
• 1-4 XENPAK optical 10 Gigabit transceivers.
• A less dense PCB can ease fabrication and changes.
• A less complex design makes it easier to follow new (FPGA) technology.
Summary
• 10 Gigabit Ethernet (also Infiniband) is standardized and commercialized already today. Standardized PHY modules leave the choice of optical or copper interface open.
• TELL5 (single FPGA) is a very attractive architecture with many advantages:
– Reduces the complexity of the board, which leads to a smaller development cost and a shorter design phase.
– Can gain the maximum from new-generation FPGAs (a less complex design makes it easier to follow the technology).
– A single-FPGA board can reach at least 5 times the processing power and DAQ bandwidth of the current TELL1.
• TELL10 is possible now or soon, but cost, power consumption and complexity are an issue!
• TELL40 needs 100 Gigabit/s equipment and is not in range yet.
• Firmware development for the integration of a System on a Programmable Chip (a microprocessor on the FPGA) has to be addressed for a new development.
Cisco XENPAK modules
• CISCO XENPAK-10GB-CX4 (used for Infiniband): supports link lengths of up to 15 meters on CX4 cable.
• CISCO XENPAK-10GB-LX4: supports link lengths of 300 meters on standard Fiber Distributed Data Interface (FDDI) grade multimode fiber (MMF). To meet the reported specifications, the transmitter output should be coupled through a mode-conditioning patch cord.
• CISCO XENPAK-10GB-SR: supports a link length of 26 meters on standard FDDI-grade MMF. Up to 300-meter link lengths are possible when using 2000 MHz·km MMF (OM3).
• CISCO XENPAK-10GB-LR: supports a link length of 10 kilometers on standard single-mode fiber (SMF, G.652).
• CISCO XENPAK-10GB-ER: supports a link length of up to 40 kilometers on SMF (G.652).
• CISCO XENPAK-10GB-ZR: supports link lengths of up to about 80 kilometers on SMF. This interface is not part of the 10 GbE standard but is built according to Cisco optical specifications.
Different transceiver module specs
• The 300-pin MSA converts between a 10 Gbit/s serial optical signal and 16 parallel 622 Mbit/s electrical signals, and currently accounts for most shipments.
• The XENPAK MSA was co-founded in March 2001 by Agere Systems and has more than 25 member companies. It provides a smaller form factor, since it uses four channels running at 3.125 Gbit/s on the electrical side. Users have limited their development efforts due to the market slump, but demand for smaller modules is now picking up. Although XENPAK is designed for the heat dissipation necessary with high-power, long-reach telecom lasers, it also provides the standard for shorter-reach 10 Gigabit Ethernet (10GbE) transceivers, for which it is larger than desired. MSAs have been proposed that are compatible with XENPAK’s four-wire 10 Gbit attachment unit interface (XAUI) and 70-pin electrical connector, but with smaller form factors for space-constrained applications moving to 10 Gbit/s. The XPAK MSA group (www.xpak.org) was formed in March 2002 by Intel, Infineon Technologies and Picolight. In August XPAK made available Revision 2.0 (a “build-to” specification); in September membership grew to 21.
• The X2 MSA (www.x2msa.org) was formed in July 2002 by component suppliers Agere Systems and Agilent Technologies, and subsequently supported by vendors JDS Uniphase, Mitsubishi Electric, NEC, OpNext, Optillion and Tyco Electronics. Shipping from first-half 2003, XPAK and X2 transceivers are initially focused on shorter-reach 10 km links (comprising 80% of 10GbE port applications) and second-generation applications that do not need XENPAK’s thermal capacity (though the heat sink can be adapted to different 10 Gbit/s applications). So, although OEMs wanting to launch products immediately are going with XENPAK, it is expected that XPAK and X2 will grow faster than XENPAK implementations.
• The XFP (10 Gigabit Small Form-Factor Pluggable) serial interface module group (www.xfpmsa.org) was founded in March 2002 by 10 networking, system, optical module, semiconductor and connector companies, including Broadcom, Brocade, Emulex, Finisar, JDSU, Velio, Maxim Integrated Products, ONI Systems, Sumitomo Electric, ICS, and Tyco Electronics. Unlike XENPAK, X2 and XPAK, XFP has a 10 Gbit/s serial electrical interface (XFI) that converts serial 9.95-10.7 Gbit/s electrical signals into external serial 9.95-10.7 Gbit/s optical or electrical signals. This eliminates mux/demux serial-to-parallel conversion logic chips inside the module and allows the serial 10 Gbit/s physical-layer IC (PHY) to be moved onto the PCB (away from optics-generated heat) and everything up to the XFI serial interface to be integrated into the CMOS media-access controller chip.