Update on DAQ upgrade R&D with RCE/CIM and ATCA

Update on DAQ Upgrade R&D with
RCE/CIM and ATCA platform
Rainer Bartoldus, Martin Kocian, Andy Haas, Mike Huffer,
Su Dong, Emanuel Strauss, Matthias Wittgen
Prelude
• Generic DAQ R&D at SLAC with the RCE (Reconfigurable Cluster
Element) and CIM (Cluster Interconnect Module) on the ATCA platform
is being adapted to ATLAS DAQ upgrade R&D.
• Many previous communications, e.g.:
– Mike Huffer at ACES Mar/09 (& sessions of last ATUW):
http://indico.cern.ch/materialDisplay.py?contribId=51&sessionId=25&materialId=slides&confId=47853
– Rainer Bartoldus at ROD workshop Jun/09:
http://indico.cern.ch/materialDisplay.py?contribId=16&sessionId=4&materialId=slides&confId=59209
• RCE training workshop at CERN June/09:
http://indico.cern.ch/conferenceOtherViews.py?view=standard&confId=57836
with introductions, instructions and discussions as the current source of
documentation.
• A collaborative R&D open to all:
– Shared RCE test stand at CERN (E-mail Rainer to get an account):
https://twiki.cern.ch/twiki/bin/view/Atlas/RCEDevelopmentLab
– E-Group for communications: atlas-highlumi-RCE-development for open
signup.
– Everyone is welcome to explore !
Essential Features of RCE on ATCA
• Generic DAQ concept with RCE born out of analysis of previous
HEP DAQ systems to establish basic building blocks serving
common needs of broad range of applications.
• Explore the modern System-On-Chip technology with e.g.
Virtex-4 FPGAs with versatile integrated resources.
• High speed I/O capabilities for multi Gb/s transmissions to fully
utilize FPGA processing power and reduce system footprint.
• Implementation over ATCA based crate infrastructure to
benefit from modern telecommunication technology.
• A system consists of RCE processing boards and CIM
interconnect modules to utilize ATCA point-point serial
backplane connections for high bandwidth data movements and
10GE ethernet access.
• Rear Transition Modules (RTM) to facilitate custom user I/O.
• Extensive software infrastructure and utilities are integral part
of the design.
Reconfigurable Cluster Element (RCE)
[Figure: RCE block diagram. Current implementation on Virtex-4 FPGA; next generation with Virtex 5 & 6.]
• Core: boot options (reset & bootstrap options), 450 MHz PPC-405 processor, memory subsystem with cross-bar, 512 MByte RLDRAM-II, configuration with 128 MByte flash, DX ports
• Resources: combinatoric logic, MGTs, DSP tiles (192 MAC units)
• Additional label: RCE memory, 2 Gbytes
• + Extensive associated software infrastructure and utilities
RCE Hardware Resources
• Multi-Gigabit Transceivers (MGTs)
– up to 12 channels of:
• SER/DES
• input/output buffering
• clock recovery
• 8b/10b encoder/decoder
• 64b/66b encoder/decoder
– each channel can operate up to 6.5 Gb/s
– channels may be bound together for greater aggregate speed
• Combinatoric logic
– gates
– flip-flops (block RAM)
– I/O pins
• DSP support
– contains up to 192 Multiply-Accumulate (MAC) units
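As a rough illustration of what the MGT figures above imply, the sketch below computes the aggregate raw rate of 12 bonded channels at 6.5 Gb/s and the payload rate left after 8b/10b or 64b/66b line coding. The helper name and the assumption that all 12 channels run at the maximum rate are illustrative only; real links carry further protocol overhead that is not modelled here.

```python
from fractions import Fraction

# Line-coding efficiency: payload bits per transmitted bit.
CODING_EFFICIENCY = {
    "8b/10b": Fraction(8, 10),
    "64b/66b": Fraction(64, 66),
}

def payload_rate_gbps(channels, line_rate_gbps, coding):
    """Aggregate payload rate in Gb/s for bonded channels (hypothetical helper)."""
    raw = Fraction(channels) * Fraction(line_rate_gbps)
    return float(raw * CODING_EFFICIENCY[coding])

if __name__ == "__main__":
    # Assumption: all 12 MGT channels run at the 6.5 Gb/s maximum.
    for coding in CODING_EFFICIENCY:
        rate = payload_rate_gbps(12, "6.5", coding)
        print(f"{coding}: {rate:.1f} Gb/s payload from 78.0 Gb/s raw")
```

Exact fractions are used so that the coding ratios do not pick up rounding noise before the final conversion to a float.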
RCE Software & Development
• Cross-development…
– GNU cross-development environment (C & C++)
– remote (network) GDB debugger
– network console
• Operating system support…
– Bootstrap loader
– Open Source Real-Time kernel (RTEMS)
• POSIX compliant interfaces
• Standard IP network stack
– Exception handling support
• Object-Oriented emphasis:
– Class libraries (C++)
• Plugin support
• Configuration Interface
RCE board + RTM (Petacache project example)
[Figure: RCE board with RTM. Labels: transceivers, RTM, Zone 1 (power), Zone 2, Zone 3, RCE, Media Slice controller, Media Carrier with flash.]
Cluster Interconnect board + RTM
[Figure: Cluster Interconnect board with RTM. Labels: RTM, XFP, Zone 1, Zone 3, 10 GE switch, 1G Ethernet, CI, RCE.]
RCE Development Lab at CERN
Application of RCE to Pixel Calibration
[Figure: test setup and calibration displays. Setup labels: Existing Pixel Module, HSIO, 3 Gb/s, /CIM, 10-GE Ethernet. Pixel digital calibration demo by Martin Kocian, shown after a few mask stages and at end of calibration.]
Demonstrated at RCE training workshop Jun/15-16/2009 at CERN
Similar setup used to test IBL ½ stave electrical data transmission with 16 channels.
RCE Development Status
• RCE R&D has already moved to production systems for Linac
Coherent Light Source (LCLS) controls and experiments at
SLAC. The same RCE board is used for many R&D projects:
Petacache, LSST and ATLAS upgrade.
• A significantly upgraded generation-2 RCE with Xilinx Virtex 5
is envisioned for the coming year; improvements include
larger memory and more user firmware space.
• Exploring RCE for ATLAS pixel upgrade/IBL: work is underway
to port current pixel calibrations to RCEs, aiming at FE-I4
tests and IBL stave-0.
• A companion I/O board (HSIO) is widely used for ATLAS Si
strip detector upgrade test stands. A compact RCE+HSIO test
stand board is planned for the near future.
RCE+HSIO Test Stand Board
• RCE boards have a strong software base for flexible and fast
development, but are rather bulky with the ATCA crate
infrastructure and carry excess resources not needed for a test stand.
• HSIO has the large variety and multiplicity of I/O channels to
serve a wide range of applications, but its vast FPGA resources
are not easy to exploit when coding only in firmware.
• Dave Nelson is working on a combined test stand board merging
RCE and HSIO:
– A slimmed-down single-FPGA RCE with software support
– A separate Virtex-5 FPGA plays the original HSIO role
– Same variety of I/O channels as HSIO
– Same simple stand-alone bench operation as HSIO with just
an external 48V supply, but can also plug into an ATCA crate
Applications for ATLAS DAQ upgrade
• Original investigation was a possible common ROD for most
subsystems, and a new combined ROD+ROS architecture to
drastically improve bandwidth throughput for phase-2 upgrade.
• The maturity of the R&D already allows serious consideration of
the RCE/CIM concept for Phase-1 upgrade needs:
– RODs for IBL
– RODs for forward muon upgrades
– RODs for AFP (detector very similar to IBL)
– Potential benefit of high throughput ROS ?
Must be able to live within the current TTC/TDAQ
architecture
A possible 48-channel ROM (Readout Module)
sLHC Upgrade Read-Out-Crate (ROC)
[Figure: block diagram of the sLHC upgrade Read-Out Crate. Labels: ROMs; Backplane (x4, 10 Gb/s); two CIMs, each with 10-GE switches, L1 fanout and switch management; Rear Transition Modules on P3; To L2 & Event Building (x12, 10 Gb/s); To monitoring & control (x4, 10 Gb/s); from L1; Shelf Management.]
The upgrade path for IBL ROD
• Changes to ROD/BOC and DAQ needed in any case:
– Data links at 160 MHz need at least a new BOC (Back of
Crate) and an associated ROD firmware change.
– IBL uses FE-I4 with 16 FEs per half stave, so some
code changes are necessary anyway.
– The upgraded detector needs faster & more frequent calibration.
– Obsolete parts make the current design difficult to maintain.
• Is there a forward looking upgrade path with modern
technology for higher performance yet fit into
phase-1 timescale ?
– Generic RCE R&D with ATCA is adoptable on the IBL time
scale for its DAQ and test needs at earlier stages.
IBL ROD VME Baseline
Reproducing existing RODs to live with present
bandwidth limitations by deploying large number
of boards.
IBL ROD Upgrade Scheme
Read Out
Module
Initial mode: pure ROD behavior to output via S-link to ROS
Upgrade Mode: combined ROD+ROS behavior directly output to Ethernet.
IBL Upgrade Hardware Components (I)
• ROM
– Regular ROM assumes all functionalities of present ROD and
with room to host ROS functionalities.
– Each ROM has 6 FPGAs hosting 12 RCEs
• process 40x160Mb/s input FEs with 10 RCEs (each RCE's share
of 640Mb/s is 'trivial' compared to the expected capacity).
• Event building for S-link/ethernet output with 2 RCEs.
– RCE includes all resources for data formatting, DAQ data
flow, calibration + memory in present ROD.
• RTM(ROM)
– Similar front-end communication roles of the present BOC,
while S-links are simpler Snap12 transceivers.
– 40 channel compact optical I/O with TX/RX, same as
current BOC. TX/RX control with FPGA via I2C from ROM.
– No need to deal with 8b/10b encoding as the RCE has
embedded native utilities to encode/decode.
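The per-RCE load quoted for the ROM follows from simple arithmetic, sketched below: 40 front-end links at 160 Mb/s shared evenly among the 10 input RCEs. The function name is hypothetical, and the even split across RCEs is an assumption made for illustration.

```python
def per_rce_share_mbps(n_links, link_rate_mbps, n_input_rces):
    """Input bandwidth per RCE in Mb/s, assuming links are spread evenly."""
    total_mbps = n_links * link_rate_mbps
    return total_mbps / n_input_rces

if __name__ == "__main__":
    # Figures from the ROM description: 40 x 160 Mb/s inputs, 10 input RCEs.
    total = 40 * 160
    share = per_rce_share_mbps(40, 160, 10)
    print(f"Total input: {total} Mb/s, per-RCE share: {share:.0f} Mb/s")
```

With these inputs each RCE handles four links, which is where the 640 Mb/s share comes from.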
IBL Upgrade Hardware Components (II)
• CIM
– Assumes the network interconnect management and external
interface roles to cover present SBC and TIM
functionalities.
– RCE master + 2 Fulcrum FM224s ASICs for 10 GE network
switching.
• RTMc(CIM)
– Ethernet I/O connections.
– Some functionalities of present TIM and drivers for I/O
with the pixel system TTC crate.
TTC Distribution in RCE/CIM crate
Distributed interface with TTCrx ASIC paired with each RCE
Upgrade ROM Benefits for IBL Case
• Allow more frequent/extensive/faster calibration
– Calibration histogram data output path via 10GE ethernet will
completely remove data shipping timing concerns.
– 4x (12x) more memory per pixel than baseline IBL ROD (current
outer layer ROD), and the memories are internal within RCE with
much faster access.
– PowerPC programming environment is much easier than DSPs for
complex algorithms, while the 192 DSP tiles/RCE offer large
processing power for repetitive simple processing.
• Smaller footprint modern hardware for easier production,
installation and maintenance.
• A simpler variation of the ROM with present RCEs offers
prototype and test stand boards for FE-I4 and stave
tests, with the same software carried over into the full system.
• Has built-in architecture evolution flexibility to explore upgrade
schemes such as integrated ROD+ROS and potential services to
trigger with the very high bandwidth.
Backward Compatibility & Commissioning
• Despite the different look of hardware, the user interface will
be no different to the existing pixel detector and interface to
the rest of pixel DAQ and TDAQ will also look like just another
pixel crate (until we try to become ROD+ROS).
• Most existing DAQ/calibration DSP code can be adopted, with
much less development effort than the original
calibration implementation needed.
• The new system can also be made to run on the present b-layer
so that fiber splitting can be done early on with the real
system as parasitic DAQ commissioning (as extensively used in
BaBar/Tevatron).
• Switching between S-link and ROD+ROS mode can potentially be
done without touching hardware.
Summary (I)
• RCE/ATCA R&D already well advanced with prototypes being
used for IBL/Pixel upgrade testing.
• Investigation of the full readout crate for IBL indicates that
the RCE ROM can easily meet the IBL ROD requirements and
offers extra margin for much improved performance.
• The project is very much realizable on the IBL time frame
owing to the well advanced R&D already carried out at SLAC
for other projects.
• The upgrade system has a small hardware footprint and lower
hardware cost than VME systems.
• The application software effort will benefit from integrated
core software utilities, making progress easier.
• There is a full suite of test prototypes, promising the same software
for tests and final DAQ/calibration.
Summary (II)
• The application for other subsystems (e.g. forward muon) may
be simpler if the inputs are also Glinks like S-link. A more
flexible configuration is possible for the symmetric Glink I/O.
We are interested in pure DAQ use cases where this cannot be
easily adopted.
• There is sufficient flexibility to allow reconfiguring the
architecture to very different modes, including the classical
mode fully compatible with current architecture.
• Exploring other possibilities e.g. L1.5 triggers with similar
architecture ?
We believe there is a viable path for ATLAS to evolve smoothly
into a modern DAQ architecture even before phase-2
Backup
Why ATCA as a packaging standard?
• An emerging telecom standard…
• Its attractive features:
– backplane & packaging available as a commercial solution
– generous form factor
• 8U x 1.2” pitch
– hot swap capability
– well-defined environmental monitoring & control
– emphasis on High Availability
– external power input is low voltage DC
• allows for rack aggregation of power
• Its very attractive features:
– the concept of a Rear Transition Module (RTM)
• allows all cabling to be on rear (module removal without interruption of
cable plant)
• allows separation of data interface from the mechanism used to process
that data
– high speed serial backplane
• protocol agnostic
• provision for different interconnect topologies
Three building block concepts
• Computational elements
– must be low-cost
• $$$
• footprint
• power
– must support a variety of computational models
– must have both flexible and performant I/O
• The Reconfigurable Cluster Element (RCE)
– employs System-On-Chip technology (SOC)
• Mechanism to connect together these elements
– must be low-cost
– must provide low-latency/high-bandwidth I/O
– must be based on a commodity (industry) protocol
– must support a variety of interconnect topologies
• hierarchical
• peer-to-peer
• fan-In & fan-Out
• The Cluster Interconnect (CI)
– based on 10-GE Ethernet switching
• Packaging solution for both element & interconnect
– must provide High Availability
– must allow scaling
– must support different physical I/O interfaces
– preferably based on a commercial standard
• ATCA
– Advanced Telecommunication Computing Architecture
– crate based, serial backplane
The Cluster Interconnect (CI)
[Figure: CI block diagram. Labels: two 10-GE L2 switches, RCE, management bus, Q0, Q1, Q2, Q3.]
• Based on two Fulcrum FM224s
– 24 port 10-GE switch
– is an ASIC (packaging in 1433-ball BGA)
– XAUI interface (supports multiple speeds including 100-BaseT,
1-GE & 2.5 Gb/s)
– less than 24 watts at full capacity
– cut-through architecture (packet ingress/egress < 200 ns)
– full Layer-2 functionality (VLAN, multiple spanning tree etc..)
– configuration can be managed or unmanaged
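To put the cut-through latency quoted above in context, the sketch below computes how long it takes just to serialize a whole Ethernet frame at 10 Gb/s, which is the minimum delay a store-and-forward switch would add before it could start forwarding. The frame sizes are illustrative assumptions, not values from the slides, and the comparison ignores queuing and other effects.

```python
CUT_THROUGH_BOUND_NS = 200  # ingress/egress bound quoted for the switch

def serialization_time_ns(frame_bytes, link_gbps):
    """Time to clock one frame through a link, in nanoseconds."""
    # bits divided by (Gb/s) gives nanoseconds directly
    return frame_bytes * 8 / link_gbps

# Assumed example frame sizes in bytes: small, standard-sized, jumbo.
for size in (64, 1500, 9000):
    t = serialization_time_ns(size, 10)
    relation = "above" if t > CUT_THROUGH_BOUND_NS else "below"
    print(f"{size:5d} B frame: {t:7.1f} ns to serialize ({relation} the cut-through bound)")
```

For larger frames the serialization time alone exceeds the quoted cut-through bound several times over, which is why cut-through forwarding matters for low-latency data movement.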
Derived configuration - Cluster Element (CE)
[Figure: Cluster Element (CE) block diagram. Labels: Core; combinatoric logic; eight PGP blocks on MGTs at 3.125 Gb/s; two Ethernet MACs (E0, E1) on MGTs at 1.0/2/5/10.0 Gb/s.]
Cluster Interconnect board + RTM (Block diagram)
[Figure: block diagram of the Cluster Interconnect board and RTM. Labels: Payload, RTM, P2, P3, XFPs, CI, MFD, Q0, Q1, Q2, Q3, 1-GE and 10-GE links on base and fabric.]
Typical (5 slot) ATCA crate
[Figure: front and back views of the crate. Labels: fans; Front: shelf manager, CI board, RCE board; Back: power supplies, RCE RTM, CI RTM.]
IBL Readout Production Cost Estimate
Items    Quantity        Unit cost (K$)   Sum cost (K$)
ROM      12 + 6 spares   8                144
RTM      12 + 6 spares   3?               54
CIM      2 + 3 spares    5                25
RTMc     2 + 3 spares    3                15
Crates   1 + 3 spares    5                20
Total                                     255
• Prototyping is expected to add ~100K$(?)
• No longer needs SBC and TIM
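The line items of the estimate can be cross-checked with a few lines of arithmetic, sketched below using the quantities and unit costs as transcribed from the slide. Each row reproduces its quoted sum, but the rows add up to 258 K$ rather than the quoted 255 K$. The origin of that 3 K$ difference cannot be recovered from the slide, and the RTM unit cost is itself marked as uncertain. The names used here are illustrative.

```python
# (item, installed, spares, unit cost in K$), as transcribed from the slide
ITEMS = [
    ("ROM", 12, 6, 8),
    ("RTM", 12, 6, 3),   # unit cost is marked "3?" on the slide
    ("CIM", 2, 3, 5),
    ("RTMc", 2, 3, 3),
    ("Crates", 1, 3, 5),
]

def line_cost(installed, spares, unit_cost):
    """Cost of one row in K$: installed units plus spares at the unit cost."""
    return (installed + spares) * unit_cost

def total_cost(items):
    """Sum of all row costs in K$."""
    return sum(line_cost(n, s, c) for _, n, s, c in items)

if __name__ == "__main__":
    for name, n, s, c in ITEMS:
        print(f"{name:7s} {n + s:3d} units x {c} K$ = {line_cost(n, s, c)} K$")
    print(f"Sum of rows: {total_cost(ITEMS)} K$")
```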
TTC ROD busy