Vertex Triggering and the Associated DAQ for b Meson Decays
Erik Gottschalk, Fermilab
1st LHCb Collaboration Upgrade Workshop
National e-Science Centre, Edinburgh, January 11-12, 2007

Overview
• Introduction and overview of the BTeV trigger and DAQ
• Level-1 trigger algorithm
• Level-1 trigger hardware
  – Original baseline design (digital signal processors – DSPs)
  – Revised baseline design (commodity hardware)
  – Proposed change to the revised baseline design (upstream event builder & blade servers)
• History of BTeV trigger R&D
• Comments on magnetic field
• Summary
Not covered:
• Level-1 muon trigger
• Level-2/3 trigger (high-level trigger – "HLT")

Introduction
• The challenge for the BTeV trigger and data acquisition system was to reconstruct particle tracks and interaction vertices for EVERY proton-antiproton interaction in the BTeV detector, and to select interactions with b-meson decays.
• The trigger system was designed with three levels, referred to as Levels 1, 2, and 3:
  – "L1" – look at every interaction and reject at least 98% of minimum-bias background
  – "L2" – use L1 computed results and perform more refined analyses for data selection
  – "L3" – reject additional background and perform data-quality monitoring
  Together the three levels reject > 99.9% of background and keep > 50% of B events.
• The data acquisition system (DAQ) was designed to save all of the data in memory for as long as was necessary to analyze each interaction, and to move data to the L2/3 processors ("HLT farm") and to archival data storage.

BTeV Trigger and Data Acquisition System
• Input: 2.5 MHz crossing rate, 500 GB/s (200 KB/event)
• L1 rate reduction: ~50x → 50 kHz, 12.5 GB/s (250 KB/event)
• L2/3 rate reduction: ~20x → 2.5 kHz, 200 MB/s (250 KB / 3.125 = 80 KB/event)

BTeV Detector
[Detector layout figure: dipole magnet centered on the interaction region, 30-station pixel detector, straws & Si strips, RICH, EM calorimeter, and muon system.]

Silicon Pixel Detector
[Figure: 50 µm x 400 µm pixel sensors; multichip module with sensor module, readout module, HDI flex circuit, wire bonds, and bump bonds; modules mounted on a TPG substrate to form a pixel detector half-station (dimensions of order 1-10 cm).]

Pixel Data Readout & 1st Part of L1 Trigger
• Collision hall: FPIX2 read-out chips and pixel data combiner boards send data over optical links (12 channels @ 2.5 Gbps) to the counting room.
• Counting room: pixel processors perform time-stamp ordering, pixel clustering, and conversion to xy coordinates; their output feeds the FPGA segment trackers (ST), each connected to its neighboring segment trackers.

Simulated B Event
[Event display of a simulated B event.]

L1 Vertex Trigger Algorithm
Two-stage trigger algorithm:
1. Segment tracking (FPGAs)
2. Track/vertex finding (PCs)
1) Segment tracking stage: use pixel hits from 3 neighboring stations to find the beginning and ending segments of tracks. These segments are referred to as triplets.

Segment Tracker: Inner Triplets
1a) Segment tracking stage, phase 1: start with inner triplets close to the interaction region. An inner triplet represents the start of a track.

Segment Tracker: Outer Triplets
1b) Segment tracking stage, phase 2: next, find the outer triplets close to the boundaries of the pixel detector volume. An outer triplet represents the end of a track.
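To make the two phases concrete, here is a minimal sketch in C of the triplet idea, assuming hypothetical Hit/Triplet structures and tolerance parameters. The real segment tracker was implemented in FPGAs, so this illustrates the logic only, not the BTeV implementation.

```c
/* Minimal sketch of the triplet-finding idea behind the L1 segment tracker.
 * In BTeV this stage ran in FPGAs; the Hit/Triplet structures, station
 * layout, and tolerance values are hypothetical stand-ins. */
#include <math.h>

typedef struct { double x, y, z; } Hit;            /* pixel cluster position */
typedef struct { Hit first, mid, last; } Triplet;  /* hits in 3 neighboring stations */

/* Three hits from neighboring stations form a segment if the middle hit
 * lies on the straight line defined by the outer two (the bend over three
 * closely spaced stations is small). */
static int collinear(Hit h0, Hit h1, Hit h2, double tol)
{
    double sx = (h2.x - h0.x) / (h2.z - h0.z);
    double sy = (h2.y - h0.y) / (h2.z - h0.z);
    double dx = h0.x + sx * (h1.z - h0.z) - h1.x;
    double dy = h0.y + sy * (h1.z - h0.z) - h1.y;
    return fabs(dx) < tol && fabs(dy) < tol;
}

/* Phase 1: an inner triplet marks the start of a track, so its line must
 * point back to the beam region when extrapolated to z = 0. */
static int is_inner_triplet(const Triplet *t, double beam_tol)
{
    double sx = (t->last.x - t->first.x) / (t->last.z - t->first.z);
    double sy = (t->last.y - t->first.y) / (t->last.z - t->first.z);
    double x0 = t->first.x - sx * t->first.z;
    double y0 = t->first.y - sy * t->first.z;
    return fabs(x0) < beam_tol && fabs(y0) < beam_tol;
}

/* Phase 2: an outer triplet marks the end of a track; here it is simply
 * tagged by its last hit lying near the edge of the instrumented volume. */
static int is_outer_triplet(const Triplet *t, double edge_radius)
{
    double r = sqrt(t->last.x * t->last.x + t->last.y * t->last.y);
    return r > edge_radius;
}
```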
Track/Vertex Finding
2a) Track finding phase: match inner triplets with outer triplets to find complete tracks.
2b) Vertex finding phase:
• Use reconstructed tracks to locate interaction vertices
• Search for tracks detached from interaction vertices

L1 Trigger Decision
• Generate a Level-1 accept if there are ≥ 2 "detached" tracks going into the instrumented arm of the BTeV detector with:
  – pT² ≥ 0.25 (GeV/c)²
  – impact parameter b ≥ 6σ_b
  – b < 0.2 cm
[Figure: sketch of a B-meson decay at a p-pbar interaction, illustrating the impact parameter b of a decay track.]
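These cuts amount to a simple counting loop over reconstructed tracks. The sketch below assumes a hypothetical L1Track structure in which the impact parameter b, its uncertainty σ_b, and an instrumented-arm flag are already filled; the numerical thresholds are the ones quoted on the slide.

```c
/* Illustrative sketch of the L1 accept condition: at least two tracks,
 * pointing into the instrumented arm, detached from the primary vertex.
 * The L1Track structure is hypothetical; the cut values are from the slide. */
#include <stddef.h>

typedef struct {
    double pt2;      /* transverse momentum squared, (GeV/c)^2 */
    double b;        /* impact parameter w.r.t. primary vertex, cm */
    double sigma_b;  /* uncertainty on b, cm */
    int    in_arm;   /* 1 if the track points into the instrumented arm */
} L1Track;

int l1_accept(const L1Track *trk, size_t n)
{
    size_t detached = 0;
    for (size_t i = 0; i < n; ++i) {
        if (!trk[i].in_arm)                  continue;
        if (trk[i].pt2 < 0.25)               continue;  /* pT^2 >= 0.25 (GeV/c)^2 */
        if (trk[i].b < 6.0 * trk[i].sigma_b) continue;  /* detached by >= 6 sigma */
        if (trk[i].b >= 0.2)                 continue;  /* but b < 0.2 cm */
        if (++detached >= 2) return 1;                  /* generate Level-1 accept */
    }
    return 0;
}
```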
BTeV Trigger Architecture
[Block diagram: front-end boards on the BTeV detector feed data combiners + optical transmitters; optical receivers feed the Level-1 buffers and the pixel processors + FPGA segment finders; a Gigabit Ethernet switch connects the segment finders to the track/vertex farm; Global Level-1 (GL1) and the Information Transfer Control Hardware (ITCH, 12 x 24-port Fast Ethernet switches) steer event flow; a cross-connect switch feeds the Level 2/3 processor farm and the data logger. The system is divided into 8 data highways.]

Original Baseline: DSP-Based L1 Track/Vertex Hardware
[Block diagram of the prototype L1 track/vertex farm hardware: data from the segment finder enters a 64 KB FIFO and is distributed by a Buffer Manager (BM) to several DSPs, each with its own RAM; a Trigger Results Manager (TM) sends processed results to the L1 buffers and to GL1; two Hitachi H8S microcontrollers with ArcNet controllers connect to an external host computer; support logic includes Compact Flash, On-board Peripherals Glue Logic (OPGL), Host Port Glue Logic (HPGL), an LCD display, McBSP lines (SPI mode), the HPI bus, and JTAG.]

Revised L1 Baseline using Commodity Hardware
[Diagram of one highway: the 30 pixel stations feed the pixel processors and FPGA segment finders (56 per highway); front ends of the other detectors feed the L1 buffers and the L1 muon trigger; an Infiniband Level-1 switch connects 56 inputs at ~45 MB/s each to 33 outputs at ~76 MB/s each, feeding a track/vertex farm of 33 "8 GHz" Apple Xserve G5s with dual IBM 970s (two 4 GHz dual-core 970s); the farm reports to GL1 and the ITCH, and the L2/3 switch feeds the L2/3 farm; a PTSM network monitors the nodes. The Xserves used elsewhere are identical to the track/vertex nodes.]

Number of Processors Needed for L1 Farm
• Assume an "8.0 GHz" IBM 970 (to be purchased in 2008):
  – The L1 trk/vtx code (straight C code without any hardware enhancements such as the hash-sorter or FPGA segment-matcher) takes 379 µs/crossing on a 2.0 GHz IBM 970 (Apple G5) for minimum-bias events with <6> interactions/crossing
  – Assume the following budget per node: L1 code 50%, RTES 10%, spare capacity 40%
    379 µs + 76 µs + 303 µs = 758 µs
  – 758 µs ÷ 4 = 190 µs on an 8.0 GHz IBM 970
  – Include an additional 10% of processing for L1-buffer operations in each node:
    190 µs + 19 µs = 209 µs
• Number of "8.0 GHz" IBM 970s needed for the L1 track/vertex farm:
  – 209 µs / 396 ns = 528 CPUs for all 8 highways, i.e. 66 CPUs per highway
  – 33 dual-CPU Apple Xserve G5s per highway
Note: 64-bit IBM PowerPC 970MP, 4-core 2.5 GHz processors are available today.
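The sizing above is plain arithmetic; the sketch below reproduces it so the assumptions are explicit. The 4x scaling from 2.0 GHz to an assumed "8.0 GHz" part and the 50/10/40% budget come from the slide; the slide rounds intermediate steps, so its 209 µs and 528 CPUs differ slightly from the unrounded numbers printed here.

```c
/* Reproduces the L1 farm sizing arithmetic from the slide. */
#include <stdio.h>

int main(void)
{
    double t_code_2ghz = 379e-6;              /* s/crossing on a 2.0 GHz IBM 970 */
    double budget      = t_code_2ghz / 0.50;  /* L1 code allowed 50% of a node   */
                                              /* (10% RTES, 40% spare) -> 758 us */
    double t_8ghz      = budget / 4.0;        /* assume 4x faster "8 GHz" part   */
    double t_node      = t_8ghz * 1.10;       /* +10% for L1-buffer operations   */
    double crossing    = 396e-9;              /* s between bunch crossings       */

    double cpus_total   = t_node / crossing;  /* ~528 CPUs for all 8 highways    */
    double cpus_highway = cpus_total / 8.0;   /* ~66 CPUs per highway            */
    double nodes        = cpus_highway / 2.0; /* dual-CPU Xserves: ~33 per highway */

    printf("per-node time: %.0f us\n", t_node * 1e6);
    printf("CPUs (8 highways): %.0f, per highway: %.0f, dual-CPU nodes: %.0f\n",
           cpus_total, cpus_highway, nodes);
    return 0;
}
```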
Proposed Hardware for 1 Highway with Blade Servers
Complete L1 segment tracking and L1 track/vertex hardware for one highway housed in 4 blade server crates.

BTeV Trigger R&D
The design of the BTeV trigger and data acquisition system was the result of a decade of R&D. Three stages of R&D occurred during this time:
• Simulations to establish trigger requirements and develop a baseline design.
• Prototyping of algorithms and hardware components to establish performance metrics and determine cost estimates.
• Optimizing the design (changes to the baseline design) to reduce the cost of construction and maintenance.
Note: close collaboration between the trigger group and the pixel group was essential, especially during the early R&D stages.

History of Trigger R&D (beginning in 1998)
• 1998 – Algorithms, simulations
  – Baseline design: UPenn trigger for 3-plane pixel detector (Selove)
  – Alternative trigger algorithms (Gottschalk, Husby, Procario)
• 1999 – Algorithms, simulations
  – Digital signal processor (DSP) timing & optimization studies (Gao)
  – Systolic associative-array track finder (Husby)
  – UPenn trigger for mixed 2-plane & 3-plane pixel detector (Selove)
  – Carnegie Mellon trigger for 2-plane pixel detector (Procario)
  – Megapixel trigger for 2-plane pixel detector (Gottschalk)
  – New baseline: BB33 trigger for 2-plane pixel detector (Gottschalk)
• 2000 – Trigger timing studies
  – DSP timing studies for BB33 (Gao)
  – Reduction in size ("70% solution") of the non-bend-view pixel planes (Pixel Group)

History of Trigger R&D (cont.)
• 2001 – Trigger timing studies, RTES Project
  – DSP timing on the Texas Instruments TMS320C6711 (Wang)
  – BB33 algorithm modified for staggered pixel half-planes
  – Introduction of 8-fold trigger/data-acquisition system "highways"
  – Real Time Embedded Systems (RTES) Project for fault tolerance and fault-adaptive computing ($5M NSF grant)
• 2002 – FPGA and DSP hardware
  – BTeV descoped from a double-arm to a single-arm spectrometer
  – DSP optimization on the Texas Instruments TMS320C6711 (Wang)
  – L1 timing for commercial-off-the-shelf (COTS) processors (Wang)
  – L1 FPGA functions implemented and simulated (Zmuda)
  – L1 prototype system with 4 DSPs (Trigger Group)

History of Trigger R&D (cont.)
• 2003 – FPGA and DSP hardware, L1 & L2 studies
  – Implementation of the BB33 segment tracker in an FPGA (Zmuda)
  – Tests of the L1 4-DSP prototype (Trigger Group)
  – FPGA track-triplet hash sorter to reduce L1 processing time (Wu)
  – L1 trigger studies for 396 ns between bunch crossings (Penny Kasper)
  – L2 trigger timing improved by a factor of 120 (Penny Kasper)
• 2004 – Baseline changes
  – Baseline change to replace the L1 DSPs with COTS processors
  – Baseline change to replace the custom switch with an Infiniband switch
  – Evaluation of blade servers for the L1 trigger (Wang)
  – "Tiny Triplet Finder" proposed to simplify the FPGA algorithm (Wu)
• 2005 – Baseline change
  – Prepare baseline change request to modify the L1 trigger architecture

Magnetic Field
• BTeV spectrometer magnet
  – Dipole magnet with a central field of 1.6 Tesla
  – Integrated dipole field of 5.2 T·m
  – Magnet centered on the interaction region
• How did the L1 trigger depend on the magnetic field?
  – Optimized cuts for p > 5 GeV/c tracks in the segment tracker
  – Change in track slope used to project inner to outer triplets
  – Used the change in track slope to resolve inner/outer ambiguities
  – Rejected matched triplets with incompatible track slopes
  – Required pT² > 0.25 (GeV/c)² to count detached tracks
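As a reminder of why the slope change carries momentum information (this is standard dipole kinematics, not taken from the slides): the bend angle accumulated between the inner and outer segments is approximately Δθ ≈ 0.3·∫B·dl / p, so comparing the two slopes yields a momentum estimate that cuts like the ones above can use. A hypothetical helper:

```c
/* Hypothetical helper, not from the slides: estimate a track's momentum
 * from the change in bend-view slope between its inner and outer triplets,
 * using the small-angle relation dtheta ~ 0.3 * (int B dl) / p
 * (B in Tesla, lengths in meters, p in GeV/c). bdl_between_triplets is the
 * field integral actually traversed between the two segments, which is only
 * part of the full 5.2 T-m quoted above. */
#include <math.h>

double momentum_from_bend(double slope_inner, double slope_outer,
                          double bdl_between_triplets)
{
    double dtheta = fabs(atan(slope_outer) - atan(slope_inner));
    if (dtheta < 1e-6)
        return 1e9;                                /* effectively straight track */
    return 0.3 * bdl_between_triplets / dtheta;    /* momentum in GeV/c */
}
```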
Track Reconstruction Efficiency
Track reconstruction efficiency (p > 5 GeV) at each stage of the BB33 algorithm (BB33 results from Dec. 1999):
• Track doublet finder (precision y hits): interior 99.2%, exterior 99.5%
• Track triplet finder (precision y hits): interior 99.1%, exterior 99.0%
• Hit matcher (precision x hits): interior 99.1%
• Int./ext. matcher: 97.0%
• Fake remover #1: 94.9%
• Fake remover #2: 94.5%
• Fake remover #3: 94.5%
• Fake remover #4: 93.4%
• Fake remover #5: 91.7%
[Figure: BB33 track reconstruction efficiency vs. particle momentum (GeV/c) for primary and B tracks.]

Fake Tracks at Each Stage of the BB33 Algorithm
Percentage of tracks that are fake at each stage of the BB33 algorithm (BB33 results from Dec. 1999):
• Track doublet finder (precision y hits): interior 82.7%, exterior 83.1%
• Track triplet finder (precision y hits): interior 16.6%, exterior 11.7%
• Hit matcher (precision x hits): interior 14.5%
• Int./ext. matcher: 18.3%
• Fake remover #1: 5.8%
• Fake remover #2: 1.3%
• Fake remover #3: 1.3%
• Fake remover #4: 0.84%
• Fake remover #5: 0.67%
Of the fake tracks that remain: 20% have fake interiors, 50% have fake exteriors, and 30% are 3-station tracks (ignored by the trigger).

Summary
• Brief introduction and overview of the BTeV trigger and DAQ
• Brief overview of the L1 trigger algorithm
• L1 trigger hardware
  – DSPs in the original baseline design
  – Commodity network and PC hardware for the revised baseline
  – Proposed blade-server architecture
• History of R&D
• Influence of the magnetic field on the L1 trigger
The trigger evolved from an original design that satisfied the BTeV requirements to a new design that was easier to build, required less labor, had lower cost, and carried lower risk. The proposed blade-server architecture was expected to reduce the cost of the trigger even further while improving on the design.

End

Backup Slides

Five Levels of Fake-Track Removal in BB33
Fake remover #1
• Remove tracks with interior triplets that share y hits
• The triplets must start in the same station and have the same direction
Fake remover #2
• Choose the best exterior triplet (based on the change in track slope) when more than one exterior triplet (in one station) is matched to the same interior triplet
Fake remover #3
• When two exterior triplets in adjacent stations are matched to an interior triplet, select the more "downstream" exterior triplet
• When two exterior triplets in non-adjacent stations are matched to an interior triplet, declare both tracks as fake
Fake remover #4
• When two interior triplets are matched to the same exterior triplet, select the more "upstream" interior triplet
Fake remover #5
• Remove tracks that have interior and exterior triplets with opposite signs for the change in track slope

Level 1 Vertex Trigger
[Block diagram: 30-station pixel detector → FPGA segment trackers → event-building switch (sort by crossing number) → track/vertex farm (~528 processors) → merge → trigger decision to Global Level 1.]

L1 Segment Tracker Prototype
[Photo: FPGA hardware implementation of the 3-station segment tracker.]

Data Rate into L1 Farm
Using the triplet format proposed in BTeV-doc-1555-v2:
• internal triplets: 128 bits = 16 bytes
• external triplets: 74 bits = 10 bytes
Then, assuming 35 internal and 35 external triplets for events with <2> interactions/crossing:
  35 x 26 bytes = 910 bytes
Extrapolating linearly to 9 interactions/crossing and applying a safety factor of 2:
  2 x (4.5 x 910 bytes) = 2 x 4095 bytes = 8190 bytes
Total rate going into the L1 track/vertex farm in all 8 highways:
  2.5 MHz x 8190 bytes ≈ 20 GB/s
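Again plain arithmetic; the sketch below reproduces the estimate with the slide's assumptions spelled out (triplet sizes from BTeV-doc-1555-v2, linear extrapolation from <2> to 9 interactions/crossing, and a safety factor of 2).

```c
/* Reproduces the L1-farm input data-rate estimate from the slide. */
#include <stdio.h>

int main(void)
{
    int internal_bytes = 16;   /* 128-bit internal triplet                */
    int external_bytes = 10;   /* 74-bit external triplet, rounded up     */
    int n_triplets     = 35;   /* internal and external each, at <2>      */
                               /* interactions/crossing                   */

    double per_xing_2  = n_triplets * (internal_bytes + external_bytes); /* 910 B  */
    double per_xing_9  = per_xing_2 * (9.0 / 2.0);  /* linear extrapolation         */
    double with_safety = 2.0 * per_xing_9;          /* safety factor of 2: 8190 B   */

    double rate = 2.5e6 * with_safety;  /* 2.5 MHz crossings, all 8 highways */
    printf("bytes/crossing: %.0f, total rate: %.1f GB/s (~20 GB/s on the slide)\n",
           with_safety, rate / 1e9);
    return 0;
}
```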
Blade Server Platform
In the proposed architecture we replace the Apple Xserves with blade servers.
[Photo: Intel/IBM blade server chassis.]
Each 7U crate can hold up to 14 dual-CPU blades, which are available with Intel Xeons (> 3.0 GHz) or IBM PowerPC 970s. Six of these crates fit into a standard 42U rack, for a total of 168 CPUs (on dual-CPU blades) per rack.

Blade Server Platform Features
[Photos: front and rear views of the chassis.]
Two network interfaces are provided on the mid-plane of the chassis:
• Primary/base network: each slot connects a blade's on-board Gigabit Ethernet ports to a switch module in the rear.
• Secondary high-speed network: each slot also connects ports on the blade's optional I/O expansion card to an optional high-speed switch module (Myrinet, Infiniband, etc.).
• The I/O expansion card might even be a custom card with an FPGA to implement a custom protocol.

L1 Trigger Hardware in a Blade Server Crate
[Diagram: data from the pixel pre-processor enters the segment tracker boards; each ST board sends 156 MB/s to the track/vertex blades (using 2 pairs); each dual-CPU track & vertex blade receives 78 MB/s (1 pair) and sends 0.5 MB/s to GL1 (1 pair); 4 MB/s goes to GL1.]

Level 1 Hardware Rack Count
• The complete L1 trigger hardware for one highway fits in one 42U rack:
  – segment finder & track/vertex hardware in 4 blade server crates
  – upstream event builder / pixel pre-processor
• The entire L1 trigger for all 8 highways fits in eight 42U racks.