10 Gigabit technologies for a 40 MHz readout
“Faster, Fifi, faster”

1st LHCb Collaboration Upgrade Workshop
January 2007, Edinburgh

Niko Neufeld, CERN/PH
Thanks to Artur Barczyk, Beat Jost, Radu Stoica and Sai Suman for many interesting discussions

“Faster, Fifi, Faster” © Gary Larson
LHCb Trigger-DAQ system: Today

• LHC crossing-rate: 40 MHz
• Visible events: 10 MHz
• Two-stage trigger system
  – Level-0: synchronous in hardware; 40 MHz → 1 MHz
  – High Level Trigger (HLT): software on CPU-farm; 1 MHz → 2 kHz
• Front-end Electronics (FE): interface to Readout Network
• Readout network
  – Gigabit Ethernet LAN
  – Full readout at 1 MHz
• Event filter farm
  – ~1800 to 2200 1U servers

[Diagram: FE boards, steered by the L0 trigger and the Timing and Fast Control, feed the Readout network, which feeds the event-filter-farm CPUs and permanent storage.]
Terminology

• channel: elementary sensitive element = 1 ADC = 8 to 10 bits. The entire detector comprises millions of channels.
• event: all data fragments (comprising several channels) created at the same discrete time together form an event. It is an electronic snapshot of the detector response to the original physics reaction.
• zero-suppression: send only the channel numbers of channels with a non-zero value (applying a suitable threshold).
• packing-factor: number of event-fragments (“triggers”) packed into a single packet/message
  – reduces the message rate
  – optimises bandwidth usage
  – is limited by the number of CPU cores in the receiving CPU (to guarantee prompt processing and thus limit latency)
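To make the packing-factor idea concrete, here is a minimal sketch in C. The header layout, field names and sizes are invented for illustration and are not the actual LHCb packet format: n zero-suppressed fragments from consecutive triggers are concatenated into one message, so the message rate drops by a factor n.

```c
/* Illustrative sketch only -- not the real LHCb packet layout.
 * Shows how a packing factor of N turns N event fragments into
 * one message, reducing the message rate by a factor N.         */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PACKING_FACTOR 8          /* hypothetical value             */
#define MAX_FRAG_SIZE  256        /* hypothetical fragment size cap */

struct fragment {                 /* one trigger's worth of data    */
    uint32_t trigger_id;
    uint16_t length;              /* bytes of zero-suppressed data  */
    uint8_t  data[MAX_FRAG_SIZE];
};

/* Pack 'n' fragments into 'buf'; returns the number of bytes used. */
static size_t pack(const struct fragment *frags, int n,
                   uint8_t *buf, size_t bufsize)
{
    size_t used = 0;
    for (int i = 0; i < n; ++i) {
        size_t need = sizeof frags[i].trigger_id
                    + sizeof frags[i].length + frags[i].length;
        if (used + need > bufsize)
            break;                          /* buffer full */
        memcpy(buf + used, &frags[i].trigger_id, sizeof frags[i].trigger_id);
        used += sizeof frags[i].trigger_id;
        memcpy(buf + used, &frags[i].length, sizeof frags[i].length);
        used += sizeof frags[i].length;
        memcpy(buf + used, frags[i].data, frags[i].length);
        used += frags[i].length;
    }
    return used;
}

int main(void)
{
    struct fragment frags[PACKING_FACTOR] = {0};
    uint8_t packet[64 * 1024];
    for (int i = 0; i < PACKING_FACTOR; ++i) {
        frags[i].trigger_id = 1000 + i;
        frags[i].length = 100;              /* pretend payload      */
    }
    size_t bytes = pack(frags, PACKING_FACTOR, packet, sizeof packet);
    /* At a 1 MHz trigger rate, packing 8 triggers per message cuts
       the message rate to 125 kHz.                                  */
    printf("%d fragments packed into one %zu-byte message\n",
           PACKING_FACTOR, bytes);
    return 0;
}
```

A larger packing factor reduces the packet rate further, but hands more work to a single receiving core at once, which is why the slide notes it is limited by the number of CPU cores.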
LHCb DAQ system: features

• Average event (= trigger) rate: 1 MHz
• Data from several triggers (L0-accepted) are concatenated into one IP packet
  ⇒ reduces the message/packet rate
• IP packets are pushed over 1000BASE-T links
  ⇒ large buffering throughout the network required
  ⇒ no traffic shaping
  ⇒ extremely simple protocol
• Destination IP address is synchronously and centrally assigned via a custom optical network (TTC) to all (TEL|UK)L1s
• Large Ethernet/IP network (~3000 Gigabit ports) connects the PC-server farm and the (TEL|UK)L1s
• Load balancing via destination assignment (sketched below)
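The last two points can be pictured with a small sketch. Assumptions: in the real system the assignment is broadcast synchronously over the TTC network, and the server addresses and farm size below are invented. Because the destination is a pure function of the trigger number, every readout board sends the fragments of a given event to the same farm node, and consecutive events are spread round-robin over the farm.

```c
/* Schematic sketch of central, round-robin destination assignment.
 * In LHCb the assignment is broadcast synchronously over the TTC
 * network; here it is just a function call.  Server addresses are
 * made up for illustration.                                        */
#include <stdio.h>
#include <stdint.h>

#define NUM_SERVERS 4   /* hypothetical; the real farm has ~2000 nodes */

static const char *servers[NUM_SERVERS] = {
    "10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"
};

/* The destination depends only on the trigger number, so every
 * readout board arrives at the same answer and all fragments of
 * one event meet at the same server.                              */
static const char *destination_for(uint32_t trigger_block)
{
    return servers[trigger_block % NUM_SERVERS];
}

int main(void)
{
    for (uint32_t block = 0; block < 8; ++block)
        printf("trigger block %u -> %s\n", block, destination_for(block));
    return 0;
}
```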
LHCb DAQ features (2)

• Uses only industry standards: Ethernet, IA32 PCs
  ⇒ there is no scenario for the next 10 years of computing which does not include these two
• Uses commercial components throughout
  ⇒ no in-house electronics to design and maintain
  ⇒ can easily take advantage of newer = better = cheaper hardware
• Very simple interface to the (TEL|UK)L1s
• Scalable
• “Compact” – cable distances of no more than 36 m allow using cheap UTP cabling

How many of these can be promoted to 10 Gigabit?
10 Gigabit for DAQ: where?

1. At the source: data-processing boards (TELL10s) send data on one or several 10 Gigabit links
2. On the way: switches / routers distribute data on 10 Gigabit links
3. At the destination: servers receive data at up to 10 Gigabit speed

A DAQ system in the 500 GB/s range will have 10 Gigabit at least in 1. and 2.
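A rough back-of-the-envelope check of why 1. and 2. follow from the 500 GB/s figure; the 60% usable-bandwidth fraction is an assumption for illustration, not a measured number.

```c
/* Back-of-the-envelope: how many links does ~500 GB/s need?
 * The 60% usable-bandwidth figure is an assumption for illustration. */
#include <stdio.h>

int main(void)
{
    const double total_gbytes_per_s = 500.0;   /* target throughput  */
    const double link_gbit_per_s    = 10.0;    /* 10 Gigabit link    */
    const double usable_fraction    = 0.6;     /* assumed efficiency */

    double usable_gbytes_per_link = link_gbit_per_s / 8.0 * usable_fraction;
    double links = total_gbytes_per_s / usable_gbytes_per_link;

    printf("~%.0f x 10-Gigabit links at %.0f%% utilisation\n",
           links, usable_fraction * 100.0);
    /* The same exercise with Gigabit links gives ten times as many
       ports, which is why 10 Gigabit is assumed at the source (1.)
       and in the core network (2.).                                  */
    return 0;
}
```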
10 Gigabit Technologies
The champion & the contenders

• Ethernet:
  – Well established (various optical standards; short-range copper (CX4) and long-range copper over UTP Cat 6A standardised), widely used as an aggregation technology
  – Begins to conquer the MAN and WAN market (succeeding SONET)
  – Large market share, vendor-independent IEEE standard (802.3x)
  – Very active R&D on 100 Gigabit and 40 Gigabit (will probably die)
• Myrinet:
  – Popular cluster-interconnect technology, low latency
  – 10 Gig standard exists (optical and copper (CX4))
  – Single vendor (Myricom)
• InfiniBand:
  – Cluster-interconnect technology, low latency
  – 10 Gig and 20 Gig standards (optical and copper)
  – Open industry standard, several vendors (OEMs) but very few chip makers (Mellanox)
  – Powerful protocol/software stack (reliable/unreliable datagrams, QoS, out-of-band messages etc.)
The champion: Ethernet

• Ad 1.) the TELL10 card: Ethernet still allows a simple FIFO-like interface (more details on the TELL10 & 10 Gbps Ethernet in Guido’s talk)
  ⇒ however, due to the (ridiculously) small frame size, the use of a higher-level protocol is mandatory
  ⇒ pure Ethernet remains quite primitive and does not provide for any reliable messages; the natural reliable protocol, TCP/IP, is very (too?) heavy for implementation in FPGAs
• Ad 2.) prices per router port are dropping quickly, but are still higher than for InfiniBand; a copper standard exists, but not over existing cabling (Cat 6) and with high power consumption → optical is still *very* expensive (quantity!)
• Ad 3.) various NIC cards exist
  ⇒ emphasis is on TCP/IP offloading for the host → our primitive protocol cannot profit!
  ⇒ not yet on the mainboard, but only a question of time – at least for high-end servers
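To make the frame-size argument concrete: even a jumbo frame cannot carry a full event, so segmentation and reassembly must be handled above plain Ethernet. A minimal sketch using the usual Ethernet payload sizes and the 35 kB event size quoted earlier:

```c
/* Why plain Ethernet is not enough: an event does not fit in a frame.
 * Payload sizes are the usual Ethernet values; the 35 kB event size
 * is the figure quoted for the current system.                       */
#include <stdio.h>

int main(void)
{
    const long event_bytes   = 35000;  /* average event size (35 kB) */
    const long std_payload   = 1500;   /* standard Ethernet MTU      */
    const long jumbo_payload = 9000;   /* typical jumbo-frame MTU    */

    printf("frames per event (1500 B MTU): %ld\n",
           (event_bytes + std_payload - 1) / std_payload);
    printf("frames per event (9000 B MTU): %ld\n",
           (event_bytes + jumbo_payload - 1) / jumbo_payload);
    /* Some higher-level protocol (IP fragmentation or an application
       protocol) must therefore split and reassemble events; full
       TCP/IP in the FPGA would add reliability as well, but is
       considered too heavy.                                          */
    return 0;
}
```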
Ad 1.) A contender: InfiniBand on the source (TELL10)

• Nallatech plug-in card
• On-board Xilinx Virtex-II Pro FPGA
• Up to 20k logic cells of programmable logic per module
• Up to 88 Block RAMs and 88 embedded multipliers per module
• 2x InfiniBand™ I/O links
• 2x RocketIO serial links (like on the UKL1)
Ad 2.) InfiniBand Switches

• 10 Gbps standard
• 20 Gbps with DDR also available
• 30 Gbps coming up

[Pictures: a high-density switch (432 ports), an edge switch (24 ports) and an optical transceiver module.]
Ad 3.) InfiniBand for the servers

• Quite a few InfiniBand adapter cards (“HCAs”) exist
• No mainboards exist yet with an onboard InfiniBand adapter
  – the availability of onboard Gigabit Ethernet NICs makes (copper UTP) Gigabit Ethernet essentially zero-cost on the servers
  – the physical signalling is compatible (Myricom makes dual-personality cards!); there are rumours that Intel will bring out a chipset with both options
• InfiniBand on the server might have performance advantages (see later)
• Dual-personality switches exist: they act as an InfiniBand-to-Ethernet/IP bridge
(Potential) advantages of InfiniBand
(applies partially also to Myrinet and other cluster interconnects)

• Low latency & reliable datagrams
• Cost per switch port is much lower than for Ethernet (it requires much less buffer per port, and very high-speed buffer memory is very expensive)
• Event-building using remote DMA could result in much less CPU time “wasted” on data movement
  ⇒ implement a pull protocol → could result in much more efficient usage of the network bandwidth (currently we can use only ~20% of the theoretically available bandwidth)
  ⇒ implement load balancing and destination assignment → no need for a custom (“TTC”-like) network for this purpose
  ⇒ currently we cannot handle more than ~300 MB/s per server (for illustration, this means that for a 10 MHz readout at the current event size of 35 kB we would need ~1000 servers just for data formatting, checking and moving)
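The last number can be checked with the figures given on this slide (10 MHz readout, 35 kB events, ~300 MB/s per server); the sketch below just reproduces the arithmetic.

```c
/* Reproduces the server-count estimate on this slide: a 10 MHz
 * readout of 35 kB events, handled at ~300 MB/s per server.      */
#include <stdio.h>

int main(void)
{
    const double event_rate_hz      = 10e6;   /* 10 MHz readout       */
    const double event_size_bytes   = 35e3;   /* 35 kB per event      */
    const double per_server_bytes_s = 300e6;  /* ~300 MB/s per server */

    double total_bytes_s = event_rate_hz * event_size_bytes;
    double servers = total_bytes_s / per_server_bytes_s;

    printf("aggregate throughput: %.0f GB/s\n", total_bytes_s / 1e9);
    /* ~1200 servers, i.e. the O(1000) quoted on the slide. */
    printf("servers needed just to move/format data: ~%.0f\n", servers);
    return 0;
}
```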
Open questions: InfiniBand

• Technological:
  – Can an FPGA drive the InfiniBand adapter, or do we need an embedded host processor with an OS?
  – Almost the entire traffic is unidirectional (from the TELL10s to the servers). Can we take advantage of this fact?
• Market:
  – Will InfiniBand ever be standard on PC mainboards?
System design using 10 Gbps

• Using 10 Gbps technology, several upgrade scenarios can be studied:
  1. A full 40 MHz readout (scaling up the current system by a factor of 40; requires zero suppression at 40 MHz)
  2. A two-stage system with part of the detector (VeLo++) read out at 40 MHz and the complete detector at 1 MHz
• A sketch for 2.) is given on the next slide. Going to 1.) means scaling up and simplifying the dataflow (no back-traffic)
• Both could be built today, even if they cannot (yet) be afforded
[Sketch of scenario 2: the LHCb detector is read out by ~400 TELL10 boards, connected via 4x 10 Gbps links to high-density switch fabrics (400 “in” ports per switch); ~50 farm racks, each with one 32-port switch or two 16-port switches, receive ~65 Gbps per rack. The numbered steps in the sketch:
 1. Read out 10 kB events @ 40 MHz, buffer on the TELL10
 2. Send to the farm for the trigger decision
 3. Send the trigger decision to the TELL10
 4. Receive the trigger decision
 5. If the trigger decision is positive, read out 35 kB @ 1 MHz]
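From the TELL10’s point of view, the sequence in the sketch can be summarised by the following schematic simulation; all constants and the accept pattern are illustrative only and do not correspond to an existing interface.

```c
/* Schematic simulation of the two-stage readout sequence sketched
 * above, from the TELL10's point of view.  All sizes and the accept
 * pattern are illustrative only.                                    */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define FIRST_STAGE_BYTES 10000   /* ~10 kB sent to the farm at 40 MHz */
#define FULL_EVENT_BYTES  35000   /* ~35 kB shipped on accept (~1 MHz) */

/* Pretend trigger decision: accept roughly 1 in 40 (40 MHz -> 1 MHz). */
static bool farm_decision(uint32_t trigger_id)
{
    return trigger_id % 40 == 0;
}

int main(void)
{
    unsigned long first_stage_bytes = 0, full_readout_bytes = 0;

    for (uint32_t trigger_id = 0; trigger_id < 400; ++trigger_id) {
        /* 1. Buffer the full event on the TELL10 (not modelled).     */
        /* 2. Send the first-stage (e.g. VeLo++) data to the farm.    */
        first_stage_bytes += FIRST_STAGE_BYTES;
        /* 3.+4. The farm computes and returns the trigger decision.  */
        if (farm_decision(trigger_id)) {
            /* 5. Positive decision: read out the full 35 kB event.   */
            full_readout_bytes += FULL_EVENT_BYTES;
        } /* otherwise the buffered event is dropped                  */
    }

    printf("first-stage traffic : %lu bytes\n", first_stage_bytes);
    printf("full-readout traffic: %lu bytes\n", full_readout_bytes);
    return 0;
}
```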
R&D for a future LHCb DAQ

• We are constantly playing with / evaluating 10 Gbps technologies in our spare time
• Our main current interest is how far the current DAQ architecture can be scaled up
  – fringe benefit: make the current system more efficient
• Once the two fundamental parameters for the new LHCb DAQ are known (event size and rate), a vigorous R&D programme of ~2 years should easily suffice to design and prototype a system