Update on DAQ Upgrade R&D with RCE/CIM and ATCA platform
Rainer Bartoldus, Martin Kocian, Andy Haas, Mike Huffer, Su Dong, Emanuel Strauss, Matthias Wittgen

Prelude
• Generic DAQ R&D at SLAC with the RCE (Reconfigurable Cluster Element) and CIM (Cluster Interconnect Module) on the ATCA platform is being adapted to ATLAS DAQ upgrade R&D.
• Many previous communications, e.g.:
  – Mike Huffer at ACES Mar/09 (& sessions of last ATUW):
    • http://indico.cern.ch/materialDisplay.py?contribId=51&sessionId=25&materialId=slides&confId=47853
  – Rainer Bartoldus at ROD workshop Jun/09:
    • http://indico.cern.ch/materialDisplay.py?contribId=16&sessionId=4&materialId=slides&confId=59209
• RCE training workshop at CERN June/09:
  – http://indico.cern.ch/conferenceOtherViews.py?view=standard&confId=57836 with introductions, instructions and discussions as the current source of documentation.
• A collaborative R&D open to all:
  – Shared RCE test stand at CERN (e-mail Rainer to get an account): https://twiki.cern.ch/twiki/bin/view/Atlas/RCEDevelopmentLab
  – E-group for communications: atlas-highlumi-RCE-development, open for signup.
  – Everyone is welcome to explore!

Essential Features of RCE on ATCA
• Generic DAQ concept with RCE, born out of an analysis of previous HEP DAQ systems to establish basic building blocks serving the common needs of a broad range of applications.
• Explore modern System-On-Chip technology, e.g. Virtex-4 FPGAs with versatile integrated resources.
• High-speed I/O capabilities for multi-Gb/s transmission to fully utilize FPGA processing power and reduce system footprint.
• Implementation over ATCA-based crate infrastructure to benefit from modern telecommunication technology.
• A system consists of RCE processing boards and CIM interconnect modules that use the ATCA point-to-point serial backplane connections for high-bandwidth data movement and 10-GE Ethernet access.
• Rear Transition Modules (RTM) to facilitate custom user I/O.
• Extensive software infrastructure and utilities are an integral part of the design.

Reconfigurable Cluster Element (RCE)
[Block diagram. Labels: current implementation on Virtex-4 FPGA; next generation with Virtex 5 & 6; RCE memory 2 GBytes; core with 450 MHz PPC-405 processor, memory subsystem, cross-bar, boot options, reset & bootstrap options, DX ports; memory: 512 MByte RLDRAM-II; configuration: 128 MByte flash; resources: combinatoric logic, MGTs, DSP tiles (192 MAC units); plus extensive associated software infrastructure and utilities]

RCE Hardware Resources
• Multi-Gigabit Transceivers (MGTs)
  – up to 12 channels of:
    • SER/DES
    • input/output buffering
    • clock recovery
    • 8b/10b encoder/decoder
    • 64b/66b encoder/decoder
  – each channel can operate at up to 6.5 Gb/s
  – channels may be bound together for greater aggregate speed
• Combinatoric logic
  • gates
  • flip-flops (block RAM)
  • I/O pins
• DSP support
  – contains up to 192 Multiply-Accumulate (MAC) units

RCE Software & Development
• Cross-development…
  – GNU cross-development environment (C & C++)
  – remote (network) GDB debugger
  – network console
• Operating system support…
  – Bootstrap loader
  – Open Source Real-Time kernel (RTEMS)
    • POSIX compliant interfaces
    • Standard IP network stack
  – Exception handling support
• Object-Oriented emphasis:
  – Class libraries (C++)
    • Plugin support
    • Configuration Interface

RCE board + RTM (Petacache project example)
[Figure. Labels: transceivers, RTM, Zone 1 (power), Zone 2, Zone 3, RCE, Media Slice controller, Media Carrier with flash]

Cluster Interconnect board + RTM
[Figure. Labels: RTM, XFP, Zone 1, 10-GE switch, 1G Ethernet, Zone 3, CI, RCE, XFP]

RCE Development Lab at CERN
[Figure]

Application of RCE to Pixel Calibration
[Setup diagram. Labels: RCE, CIM, 10-GE Ethernet, existing pixel module, 3 Gb/s, HSIO]
Pixel Digital Calibration Demo by Martin Kocian
[Screenshots: after a few mask stages; end of calibration]
Demonstrated at the RCE training workshop, Jun/15-16/2009 at CERN.
Similar setup used to test IBL ½ stave electrical
data transmission with 16 channels.

RCE Development Status
• RCE R&D has already moved to production systems for Linac Coherent Light Source (LCLS) controls and experiments at SLAC. The same RCE board is used for many R&D projects: Petacache, LSST and ATLAS upgrade.
• A significantly upgraded generation-2 RCE with Xilinx Virtex-5 is envisioned for the coming year; improvements include larger memory and more user firmware space.
• Exploring RCE for ATLAS pixel upgrade/IBL: work is under way to port the current pixel calibrations to RCEs, aiming at FE-I4 tests and IBL stave-0.
• A companion I/O board (HSIO) is widely used for ATLAS Si strip detector upgrade test stands. A compact RCE+HSIO test stand board is planned for the near future.

RCE+HSIO Test Stand Board
• RCE boards have a strong software base for flexible and fast development, but are rather bulky with the ATCA crate infrastructure and carry excess resources not needed for a test stand.
• HSIO has a large variety and multiplicity of I/O channels to serve a wide range of applications, but its vast FPGA resources are not easy to exploit when coding only in firmware.
• Dave Nelson is working on a combined test stand board merging RCE and HSIO:
  – A slimmed-down single-FPGA RCE and software support
  – A separate Virtex-5 FPGA plays the original HSIO role
  – Same variety of I/O channels as HSIO
  – Same simple stand-alone bench operation as HSIO with just an external 48 V supply, but it can also simply be plugged into an ATCA crate

Applications for ATLAS DAQ upgrade
• The original investigation was of a possible common ROD for most subsystems, and a new combined ROD+ROS architecture to drastically improve bandwidth throughput for the phase-2 upgrade.
• The maturity of the R&D already allows serious consideration of the RCE/CIM concept for Phase-1 upgrade needs:
  – RODs for IBL
  – RODs for forward muon upgrades
  – RODs for AFP (detector very similar to IBL)
  – Potential benefit of high-throughput ROS?
Must be able to live within the current TTC/TDAQ architecture.

A possible 48-channel ROM (Readout Module)
[Figure]

sLHC Upgrade Read-Out-Crate (ROC)
[Block diagram. Labels: Rear Transition Module (x2), P3, CIM (x2), 10-GE switch, L1 fanout, switch management, backplane, ROMs, shelf management, "To L2 & Event Building", "To monitoring & control", "from L1"; link annotations: (x12) 10 Gb/s, (x4) 10 Gb/s]

The upgrade path for IBL ROD
• Changes to ROD/BOC and DAQ are needed in any case:
  – Data links at 160 MHz need at least a new BOC (Back of Crate) and an associated ROD firmware change.
  – IBL uses FE-I4 and 16 FEs per half stave, so some code changes are necessary anyway.
  – The upgraded detector needs faster and more frequent calibration.
  – Obsolete parts make it difficult to maintain the current design.
• Is there a forward-looking upgrade path with modern technology for higher performance that still fits the phase-1 timescale?
  – Generic RCE R&D with ATCA is adoptable on the IBL time scale for its DAQ and test needs at earlier stages.

IBL ROD VME Baseline
Reproducing existing RODs to live with present bandwidth limitations by deploying a large number of boards.

IBL ROD Upgrade Scheme
[Diagram: Read Out Module]
Initial mode: pure ROD behavior, output via S-link to ROS.
Upgrade mode: combined ROD+ROS behavior, output directly to Ethernet.

IBL Upgrade Hardware Components (I)
• ROM
  – Regular ROM assumes all functionalities of the present ROD, with room to host ROS functionalities.
  – Each ROM has 6 FPGAs hosting 12 RCEs
    • process 40 x 160 Mb/s input FEs with 10 RCEs (each RCE's share of 640 Mb/s is 'trivial' compared to the expected capacity).
    • Event building for S-link/Ethernet output with 2 RCEs.
  – RCE includes all resources for data formatting, DAQ data flow, calibration + memory in the present ROD.
• RTM(ROM)
  – Similar front-end communication roles to the present BOC, while S-links are simpler Snap12 transceivers.
  – 40-channel compact optical I/O with TX/RX, same as the current BOC. TX/RX control with FPGA via I2C from the ROM.
  – No need to deal with 8b/10b encoding, as the RCE has embedded native utilities to encode/decode.

IBL Upgrade Hardware Components (II)
• CIM
  – Assumes the network interconnect management and external interface roles to cover present SBC and TIM functionalities.
  – RCE master + 2 Fulcrum FM224 ASICs for 10-GE network switching.
• RTMc(CIM)
  – Ethernet I/O connections.
  – Some functionalities of the present TIM and drivers for I/O with the pixel system TTC crate.

TTC Distribution in RCE/CIM crate
Distributed interface with a TTCrx ASIC paired with each RCE.

Upgrade ROM Benefits for IBL Case
• Allow more frequent/extensive/faster calibration
  – Calibration histogram data output path via 10-GE Ethernet will completely remove data shipping timing concerns.
  – 4x (12x) more memory per pixel than the baseline IBL ROD (current outer layer ROD), and the memories are internal to the RCE with much faster access.
  – The PowerPC programming environment is much easier than DSPs for complex algorithms, while the 192 DSP tiles per RCE offer large processing power for repetitive simple processing.
• Smaller-footprint modern hardware for easier production, installation and maintenance.
• A simpler variation of the ROM with present RCEs offers prototype and test stand boards to meet FE-I4 test and stave test needs, with the same software preserved into the full system.
• Has built-in flexibility for architecture evolution, to explore upgrade schemes such as integrated ROD+ROS and potential services to the trigger with the very high bandwidth.

Backward Compatibility & Commissioning
• Despite the different look of the hardware, the user interface will be no different from that of the existing pixel detector, and the interface to the rest of pixel DAQ and TDAQ will also look like just another pixel crate (until we try to become ROD+ROS).
• Most existing DAQ/calibration DSP code can be adopted, with much less development effort than the original calibration implementation required.
• The new system can also be made to run on the present B-layer, so that fiber splitting can be done early on with the real system as parasitic DAQ commissioning (as used extensively in BaBar/Tevatron).
• Switching between S-link and ROD+ROS mode can potentially be done without touching hardware.

Summary (I)
• RCE/ATCA R&D is already well advanced, with prototypes being used for IBL/Pixel upgrade testing.
• Investigation of the full readout crate for IBL indicates that the RCE ROM can easily meet the IBL ROD requirements and offers extra margin for much improved performance.
• The project is very much realizable on the IBL time frame, owing to the well-advanced R&D already carried out at SLAC for other projects.
• The upgrade system has a small hardware footprint and lower hardware cost than VME systems.
• The application software effort will benefit from the integrated core software utilities, making progress easier.
• A full suite of test prototypes promises that the same software can be used for tests and for the final DAQ/calibration.

Summary (II)
• The application to other subsystems (e.g. forward muon) may be simpler if the inputs are also Glinks, like the S-link. A more flexible configuration is possible with symmetric Glink I/O. We are interested in pure DAQ use cases where this cannot be easily adopted.
• There is sufficient flexibility to allow reconfiguring the architecture to very different modes, including the classical mode fully compatible with the current architecture.
• Exploring other possibilities, e.g. L1.5 triggers with a similar architecture?
We believe there is a viable path for ATLAS to evolve smoothly into a modern DAQ architecture even before phase-2.

Backup

Why ATCA as a packaging standard?
• An emerging telecom standard…
• Its attractive features:
  – backplane & packaging available as a commercial solution
  – generous form factor
    • 8U x 1.2" pitch
  – hot swap capability
  – well-defined environmental monitoring & control
  – emphasis on High Availability
  – external power input is low voltage DC
    • allows for rack aggregation of power
• Its very attractive features:
  – the concept of a Rear Transition Module (RTM)
    • allows all cabling to be on rear (module removal without interruption of cable plant)
    • allows separation of data interface from the mechanism used to process that data
  – high speed serial backplane
    • protocol agnostic
    • provision for different interconnect topologies

Three building block concepts
• Computational elements
  – must be low-cost
    • $$$
    • footprint
    • power
  – must support a variety of computational models
  – must have both flexible and performant I/O
  The Reconfigurable Cluster Element (RCE)
  – employs System-On-Chip technology (SOC)
• Mechanism to connect together these elements
  – must be low-cost
  – must provide low-latency/high-bandwidth I/O
  – must be based on a commodity (industry) protocol
  – must support a variety of interconnect topologies
    • hierarchical switching
    • peer-to-peer
    • fan-in & fan-out
  The Cluster Interconnect (CI)
  – based on 10-GE Ethernet
• Packaging solution for both element & interconnect
  – must provide High Availability
  – must allow scaling
  – must support different physical I/O interfaces
  – preferably based on a commercial standard
  ATCA
  – Advanced Telecommunication Computing Architecture
  – crate based, serial backplane

The Cluster Interconnect (CI)
[Block diagram. Labels: Q0, Q1, Q2, Q3, 10-GE L2 switch (x2), management bus, RCE]
• Based on two Fulcrum FM224s
  – 24-port 10-GE switch
  – is an ASIC (packaged in a 1433-ball BGA)
  – XAUI interface (supports multiple speeds including 100-BaseT, 1-GE & 2.5 Gb/s)
  – less than 24 watts at full capacity
  – cut-through architecture (packet ingress/egress < 200 ns)
  – full Layer-2 functionality
    (VLAN, multiple spanning tree, etc.)
  – configuration can be managed or unmanaged

Derived configuration - Cluster Element (CE)
[Block diagram. Labels: 3.125 Gb/s MGTs, PGP (x8), combinatoric logic, core, Ethernet MAC (x2), MGTs, E0, E1, 1.0/2/5/10.0 Gb/s]

Cluster Interconnect board + RTM (Block diagram)
[Block diagram. Labels: P2, P3, payload, RTM, XFP (x10), CI, MFD, Q0 (fabric) 10-GE, Q1 (base) 10-GE, Q2, Q3, 1-GE (fabric), 10-GE (base), fabric 10-GE, base 1-GE]

Typical (5 slot) ATCA crate
[Figure. Labels: fans; front: shelf manager, CI board, RCE board; back: power supplies, RCE RTM, CI RTM]

IBL Readout Production Cost Estimate
Items    Quantity         Unit cost (K$)   Sum cost (K$)
ROM      12 + 6 spares    8                144
RTM      12 + 6 spares    3?               54
CIM      2 + 3 spares     5                25
RTMc     2 + 3 spares     3                15
Crates   1 + 3 spares     5                20
Total                                      255
• Prototyping is expected to add ~100 K$ (?)
• No longer needs SBC and TIM

TTC ROD busy
[Figure]
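
Backup: arithmetic cross-check
Two numbers in these slides lend themselves to a quick arithmetic cross-check: the per-line sums of the IBL readout cost estimate, and the 640 Mb/s per-RCE input share quoted for a ROM (40 x 160 Mb/s inputs spread over 10 RCEs). The sketch below uses only figures stated on the slides; names such as COST_ITEMS and per_rce_share_mbps are made up for illustration and are not part of any RCE software. As listed, the line items add up to 258 K$, which is 3 K$ above the quoted total of 255 K$; the slides do not say where the difference comes from, and the RTM unit cost is itself marked uncertain ("3?").

```python
# Cross-check of two figures quoted in the slides, using only numbers stated there.
# All names here are illustrative assumptions, not part of any RCE/CIM software.

# (item, boards in use, spares, unit cost in K$) from the cost-estimate slide.
# The RTM unit cost is marked uncertain ("3?") on the slide.
COST_ITEMS = [
    ("ROM", 12, 6, 8),
    ("RTM", 12, 6, 3),
    ("CIM", 2, 3, 5),
    ("RTMc", 2, 3, 3),
    ("Crates", 1, 3, 5),
]
QUOTED_TOTAL_KUSD = 255  # total as printed on the slide


def line_sums(items):
    """Return {item: (in use + spares) * unit cost} in K$."""
    return {name: (used + spares) * unit for name, used, spares, unit in items}


def per_rce_share_mbps(n_links, link_rate_mbps, n_rces):
    """Aggregate input bandwidth divided evenly over the processing RCEs."""
    return n_links * link_rate_mbps / n_rces


sums = line_sums(COST_ITEMS)
total = sum(sums.values())
share = per_rce_share_mbps(n_links=40, link_rate_mbps=160, n_rces=10)

for name, value in sums.items():
    print(f"{name:7s} {value:4d} K$")
print(f"sum of line items: {total} K$ (slide quotes {QUOTED_TOTAL_KUSD} K$)")
print(f"input share per RCE: {share:.0f} Mb/s")
```

The even split over 10 RCEs follows the slide's own description (10 RCEs for input processing, 2 for event building); a real mapping of front-end links to RCEs could be uneven.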