BTeV Trigger
(BTeV was terminated in February of 2005.)

BTeV Trigger Overview
• Trigger philosophy: trigger on a characteristic common to all heavy-quark decays, separated production and decay vertices.
• Aim: reject > 99.9% of the background while keeping > 50% of B events.
• The challenge for the BTeV trigger and data acquisition system is to reconstruct particle tracks and interaction vertices in every beam crossing, looking for topological evidence of a B (or D) decay.
• This is feasible for the BTeV detector and trigger system because of:
  – the pixel detector: low occupancy, excellent spatial resolution, fast readout;
  – a heavily pipelined and parallel architecture (~5000 processors);
  – sufficient memory to buffer events while awaiting the trigger decision;
  – rapid developments in technology: FPGAs, processors, networking.
• 3 levels:
  – L1 vertex trigger (pixels only) + L1 muon trigger;
  – L2 vertex trigger: refined tracking and vertexing;
  – L3: full event reconstruction, data compression.

BTeV detector
[Detector sketch: a 30-station Si pixel detector centered on the interaction region of the colliding p and p-bar beams. Each multichip module (~1 cm x 5 cm) carries Si pixel sensors of 128 rows x 22 columns with 50 µm x 400 µm pixels, read out by 5 FPIX ROCs, giving 14,080 pixels (128 rows x 110 columns) per module.]

L1 vertex trigger algorithm
• Segment finder (pattern recognition)
  – Find beginning and ending segments of tracks from hit clusters in 3 adjacent stations (triplets):
    • beginning segments: required to originate from the beam region;
    • ending segments: required to project out of the pixel detector volume.
• Tracking and vertex finding
  – Match beginning and ending segments found by the FPGA segment finder to form complete tracks.
  – Reconstruct primary interaction vertices using complete tracks with pT < 1.2 GeV/c.
  – Find tracks that are "detached" from the reconstructed primaries.
• Trigger decision
  – Generate a Level-1 accept if the crossing has two "detached" tracks going into the instrumented arm of the BTeV detector.
[Sketch: a p p-bar collision producing a b quark and a B meson.]

BTeV trigger overview
[Data-flow diagram:]
• BTeV detector → front-end electronics (> 2 x 10^7 channels).
• Pixel (PIX) and muon (µ) data feed the L1 vertex and L1 muon triggers at 2.5 MHz, 500 GB/s (200 kB/event); crossings are held in the Level-1 buffers.
• Global Level-1 (GL1) combines the L1 vertex and L1 muon results; on a GL1 accept, the Information Transfer Control Hardware (ITCH) requests the data for crossing #N. L1 rate reduction: ~50x.
• Accepted crossings (50 kHz, 12.5 GB/s, 250 kB/event) pass through the crossing switch to the Level-2/3 processor farm (nodes #1 ... #m) with its Level-2/3 buffers. L2/3 rate reduction: ~20x.
• On a Level-3 accept, data logging runs at 2.5 kHz, 200 MB/s (250 kB / 3.125 = 80 kB/event after compression).

Level 1 vertex trigger architecture
[Diagram: 30 pixel stations → pixel pre-processors → FPGA segment finders → switch (sorting by crossing number) → ~2500-node track/vertex farm → merge → to Global Level-1 (GL1).]

Pixel Preprocessor
• In the collision hall, the pixel stations feed data combiner boards (DCBs), which drive optical links to the pixel processors in the counting room; each pixel processor feeds an FPGA segment finder, with connections to the neighboring segment finders.
• Hits from the FPIX2 read-out chip carry sync (1 bit), row (7 bits), column (5 bits), BCO (8 bits), and ADC (3 bits) fields; a minimal unpacking sketch follows below.
• Inside the pixel processor: optical receiver interface → time-stamp expansion → hit cluster finder & x-y coordinate translator → events sorted by time and column → Level-1 buffer interface.
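To make the hit format above concrete, here is a hedged C sketch of unpacking a 24-bit FPIX2-style hit word into its fields. The field widths (sync 1, row 7, column 5, BCO 8, ADC 3 bits) are the ones listed on the slide; the bit ordering, struct, and function names are assumptions for illustration only, not the actual FPIX2 data format.

/* Hedged sketch: unpacking a 24-bit FPIX2-style hit word.
 * Assumed layout, most significant bit first: [sync|row|col|bco|adc]. */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    unsigned sync;   /* 1 bit  : synchronization flag            */
    unsigned row;    /* 7 bits : pixel row within the chip       */
    unsigned col;    /* 5 bits : pixel column within the chip    */
    unsigned bco;    /* 8 bits : beam-crossing (time stamp)      */
    unsigned adc;    /* 3 bits : pulse-height code               */
} fpix_hit_t;

static fpix_hit_t unpack_hit(uint32_t word)
{
    fpix_hit_t h;
    h.adc  =  word        & 0x07;   /* bits  2..0  */
    h.bco  = (word >> 3)  & 0xFF;   /* bits 10..3  */
    h.col  = (word >> 11) & 0x1F;   /* bits 15..11 */
    h.row  = (word >> 16) & 0x7F;   /* bits 22..16 */
    h.sync = (word >> 23) & 0x01;   /* bit  23     */
    return h;
}

int main(void)
{
    /* Example with made-up field values: row 100, col 21, BCO 200, ADC 5. */
    uint32_t w = (1u << 23) | (100u << 16) | (21u << 11) | (200u << 3) | 5u;
    fpix_hit_t h = unpack_hit(w);
    printf("sync=%u row=%u col=%u bco=%u adc=%u\n",
           h.sync, h.row, h.col, h.bco, h.adc);
    return 0;
}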
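The segment finder's "beginning segment" requirement (three hits in adjacent stations lining up and pointing back to the beam region) can also be written down in software form. The C sketch below assumes a simplified straight-line geometry; the struct, tolerances, and units are illustrative only. The production segment finder is VHDL firmware operating on bend- and non-bend-view doublets, as described in the next section.

/* Hedged sketch: the "beginning segment" (interior triplet) condition. */
#include <math.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct { double x, y, z; } Hit;   /* pixel hit coordinates (cm) */

/* Accept a triplet as a beginning segment if the three hits in adjacent
 * stations are collinear within tol_cm and the line through the outer
 * hits extrapolates back into the beam region at z = 0. */
static bool is_beginning_segment(Hit a, Hit b, Hit c,
                                 double tol_cm, double beam_radius_cm)
{
    /* Collinearity: interpolate the middle station from the outer two. */
    double t  = (b.z - a.z) / (c.z - a.z);
    double xi = a.x + t * (c.x - a.x);
    double yi = a.y + t * (c.y - a.y);
    if (fabs(xi - b.x) > tol_cm || fabs(yi - b.y) > tol_cm)
        return false;

    /* Extrapolate to z = 0 and require an origin near the beam line. */
    double s  = (0.0 - a.z) / (c.z - a.z);
    double x0 = a.x + s * (c.x - a.x);
    double y0 = a.y + s * (c.y - a.y);
    return hypot(x0, y0) < beam_radius_cm;
}

int main(void)
{
    /* Made-up hits lying roughly on a line from the beam region. */
    Hit a = {0.11, 0.05, 10.0}, b = {0.22, 0.10, 20.0}, c = {0.33, 0.15, 30.0};
    printf("beginning segment: %s\n",
           is_beginning_segment(a, b, c, 0.01, 0.5) ? "yes" : "no");
    return 0;
}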
The Segment Tracker Architecture
[Diagram: 12 half pixel planes at 12 different z locations. For a given station N, bend-view hits from stations N-1, N, and N+1 form long doublets and long-doublet projections; combined with the non-bend-view hits of the same three stations these yield triplets and triplet projections, and the short doublets from stations N-1, N, and N+1 are multiplexed into the short-doublet outputs.]
• Interior and exterior track segments are found in parallel in FPGAs.
• The segment finder algorithm is implemented in VHDL.

L1 Track and Vertex Farm
• The original baseline of the L1 track and vertex farm used custom-made processor boards based on DSPs or other processors; the total was estimated at 2500 TI TMS320C6711 DSPs. The L1 switch was custom-designed as well.
• After the DOE CD-1 review, BTeV changed the L1 baseline design:
  – L1 switch: commercial off-the-shelf Infiniband switch (or equivalent);
  – L1 farm: an array of commodity general-purpose processors, Apple G5 Xserves (or equivalent).

Level 1 Trigger Architecture (New Baseline)
[Diagram, one highway: 30 pixel stations → pixel processors → FPGA segment finders (56 per highway) → Level-1 switch (Infiniband switch or equivalent), with 56 inputs at ~45 MB/s each and 33 outputs at ~76 MB/s each → track/vertex farm of 33 "8 GHz" Apple Xserve G5s with dual IBM 970s (nodes #1 ... #N), connected by an Ethernet network and the PTSM network; Level-1 buffer; Global Level-1 runs on an Apple Xserve identical to the track/vertex nodes.]

R&D projects
• Software development for the DSP pre-prototype.
• Level-1 trigger algorithm processing-time studies on various processors.
  – Part of the trigger-system R&D for a custom-made Level-1 trigger computing farm.
• StarFabric switch test and bandwidth measurement.
  – R&D for the new Level-1 trigger system baseline design.
  – After the DOE CD-1 review, the BTeV collaboration decided to change the baseline design of the Level-1 trigger system:
    • L1 switch: replace the custom switch with an Infiniband switch (or equivalent);
    • L1 farm: replace the DSP hardware with Apple G5 Xserves (or equivalent).
• Pixel preprocessor of the Level-1 trigger system.
  – Clustering algorithm and firmware development.

DSP Pre-prototype main goals
• Investigate current DSP hardware and software to determine technical choices for the baseline design.
• Study I/O data-flow strategies.
• Study control and monitoring techniques.
• Study FPGA firmware algorithms and simulation tools.
  – Understand the major blocks needed.
  – Estimate logic size and achievable data bandwidths.
• Measure internal data transfer rates, latencies, and software overheads between processing nodes.
• Provide a platform to run DSP fault-tolerant routines.
• Provide a platform to run trigger algorithms.

Features of DSP Pre-prototype Board
• Four DSP mezzanine cards on the board, so that different TI DSPs can be tested for comparison.
• The FPGA data I/O manager provides two-way data buffering and connects the PCI Test Adapter (PTA) card to each DSP.
• Two ArcNet network ports:
  – Port I is the PTSM (Pixel Trigger Supervisor Monitor) port;
  – Port II is the Global Level-1 result port.
  – Each network port is managed by a Hitachi microcontroller.
  – The PTSM microcontroller communicates with the DSPs via the DSP Host Port Interface for initialization and commands.
  – The GL1 microcontroller receives trigger results via the DSP's Buffered Serial Port (BSP).
• Compact Flash card to store DSP software and parameters.
• Multiple JTAG ports for debugging and initial startup.
• Operator LEDs.
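The pre-prototype exists to run trigger code of exactly this kind: one of its stated goals above is to provide a platform for the trigger algorithms, and the processing-time studies later in these slides benchmark the Level-1 algorithm on candidate processors. As a rough illustration of the final step of that algorithm, here is a hedged C sketch of the "two detached tracks" decision described in the Level-1 vertex trigger algorithm section; the impact-parameter-significance variable and the numerical cuts are assumptions for illustration (the slides state only that an accept requires two detached tracks pointing into the instrumented arm, and that primaries are found from tracks with pT < 1.2 GeV/c).

/* Hedged sketch: the Level-1 "two detached tracks" decision. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef struct {
    double b;          /* impact parameter w.r.t. the primary vertex (cm) */
    double sigma_b;    /* its estimated uncertainty (cm)                  */
    double pt;         /* transverse momentum (GeV/c)                     */
    bool   into_arm;   /* track points into the instrumented arm          */
} L1Track;

static bool level1_accept(const L1Track *trk, size_t n,
                          double min_significance, double min_pt)
{
    size_t detached = 0;
    for (size_t i = 0; i < n; ++i) {
        if (trk[i].into_arm &&
            trk[i].pt > min_pt &&
            trk[i].b > min_significance * trk[i].sigma_b)
            ++detached;
    }
    return detached >= 2;   /* two detached tracks -> Level-1 accept */
}

int main(void)
{
    /* Illustrative event: two detached tracks and one prompt track. */
    L1Track evt[] = {
        {0.020, 0.004, 1.1, true},
        {0.030, 0.005, 0.9, true},
        {0.002, 0.004, 0.7, true},
    };
    printf("L1 accept: %s\n",
           level1_accept(evt, 3, 3.0, 0.5) ? "yes" : "no");
    return 0;
}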
L1 trigger 4-DSP prototype board
[Board diagram (DSP prototype board, 9/01): four TI DSPs, each with RAM and ROM; DSP input and output buffers with input- and output-buffer control managers implemented in FPGAs; an LVDS link interface to the PCI Test Adapter; an ArcNet interface FPGA with McBSP connections (for trigger decisions) to Global Level-1; a second ArcNet interface and Host Port Interface FPGA to the Pixel Trigger Supervisor Monitor (PTSM), each managed by a Hitachi H8 microcontroller; Flash RAM; JTAG.]

Level 1 Pixel Trigger Test Stand for the DSP pre-prototype
[Photo: the test stand with Xilinx programming cable, ARCnet card, PTA+PMC card, DSP daughter card, and TI DSP JTAG emulator.]

DSP Pre-prototype Software (1)
• PTSM task on the Hitachi PTSM microcontroller.
  – System initialization.
  – Kernel and DSP application downloading.
  – Command parsing and distribution to subsystems.
  – Error handling and reporting.
  – Hardware and software status reporting.
  – Diagnostics and testing functions.
• GL1 task on the Hitachi GL1 microcontroller.
  – Receives the trigger results from the DSPs and sends them to the GL1 host computer.
• Hitachi microcontroller API: a library of low-level C routines has been developed to support many low-level functions.
  – ArcNet network driver.
  – Compact Flash API: supports the FAT16 file system.
  – LCD API: displays messages on the on-board LCD.
  – Serial Port API.
  – JTAG API.
  – One-Wire API.
  – DSP Interface API: boots and resets the DSPs; accesses memory and registers on the DSPs.

DSP Pre-prototype Software (2)
• Host computer software.
  – PTSM menu-driven interface.
  – GL1 message receiving and display.
• Custom-defined protocol built on the lowest level of the ArcNet network driver: the most efficient option, without standard protocol overhead.

Processor evaluation
• We continued to measure the Level-1 trigger algorithm processing time on various new processors.
• MIPS RM9000x2 processor, Jaguar-ATX evaluation board.
  – Time studies on Linux 2.4.
  – Time studies standalone; compiler MIPS SDE Lite 5.03.06.
  – The system (Linux) overhead in the processing time is about 14%.
• PowerPC 7447 (G4) and PowerPC 8540 (PowerQUICC III).
  – GDA Tech PMC8540 eval card and Motorola Sandpoint eval board with PMC7447A.
  – Green Hills MULTI 2000 IDE with a Green Hills Probe for standalone testing.
[Photos: the 8540 eval board and the Green Hills Probe.]

Candidate processors for Level 1 Farm
Processor (clock, toolchain): L1 algorithm processing time
• TI TMS320C6711 (baseline, provided for comparison): 1,571 µs
• Motorola 8540 PowerQUICC III PPC (660 MHz, GHS MULTI 2000 4.01): 271 µs
• Motorola 7447A G4 PPC (1.4 GHz, GHS MULTI 2000 4.01): 121 µs
• Motorola 74xx G4 PPC (1 GHz 7455, Apple PowerMac G4): 195 µs
• PMC-Sierra MIPS RM9000x2 (600 MHz, MIPS SDE Lite 5.03.06): 341 µs
• Intel Pentium 4/Xeon (2.4 GHz Xeon): 117 µs
• IBM 970 PPC (2.0 GHz, Apple PowerMac G5): 74 µs
These results are well suited to an off-the-shelf solution using desktop PCs (or a G5 server) for the computing farm.

StarFabric Switch Testing and Bandwidth Measurement
• In the new baseline design of the BTeV Level-1 trigger system, a commercial, off-the-shelf switch will be used for the event builder.
• Two commercial switch technologies were tested: Infiniband (by Fermilab) and StarFabric (by the IIT group with Fermilab).
• Hardware setup for the StarFabric switch test:
  – PC with a 32-bit/33 MHz PCI bus;
  – StarFabric adapter: StarGen SG2010;
  – StarFabric switch: StarGen SG1010.
• Software: StarFabric Windows driver.
[Test stand: a P4/Windows 2000 PC and an Athlon/XP PC, each with an SG2010 adapter, connected through an SG1010 switch.]
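The bandwidth tests on this stand follow a simple pattern: push fixed-size packets through the fabric in a loop and divide the bytes moved by the elapsed time. The C sketch below shows only that harness shape; the transport here is a stub, since the actual measurements used the StarFabric (and later Infiniband) vendor drivers, whose APIs are not shown in these slides, and all names are illustrative.

/* Hedged sketch: shape of a point-to-point bandwidth measurement. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Stand-in for a driver call such as a StarFabric or Infiniband send. */
typedef size_t (*send_fn)(const void *buf, size_t len);

static char sink[8192];
static size_t stub_send(const void *buf, size_t len)
{
    memcpy(sink, buf, len < sizeof sink ? len : sizeof sink);
    return len;                      /* pretend the full packet was sent */
}

/* Returns the measured throughput in MB/s for `count` packets of `len` bytes. */
static double measure_bandwidth(send_fn send, size_t len, size_t count)
{
    char *packet = malloc(len);
    memset(packet, 0xA5, len);

    clock_t t0 = clock();
    size_t sent = 0;
    for (size_t i = 0; i < count; ++i)
        sent += send(packet, len);
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    free(packet);
    return secs > 0.0 ? (double)sent / (1024.0 * 1024.0) / secs : 0.0;
}

int main(void)
{
    /* Packet sizes of 1 kB to 8 kB, as in the StarFabric measurement. */
    for (size_t len = 1024; len <= 8192; len *= 2)
        printf("%5zu-byte packets: %8.1f MB/s (stub transport)\n",
               len, measure_bandwidth(stub_send, len, 100000));
    return 0;
}

In the real tests the stub would be replaced by the vendor driver's send call, with the receiving program running on the second PC of the test stand.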
L1 Switch Bandwidth Measurement
• The measured StarFabric bandwidth is 74-84 MB/s for packet sizes of 1 kB to 8 kB. This cannot meet the bandwidth requirement of the event builder.
• A simple way to improve performance is to use PCI-X (32/66 or 64/66). The Infiniband test stand uses PCI-X adapters in the input/output computer nodes.
• Based on this result and other considerations, Infiniband was chosen for the new baseline design of the Level-1 trigger system, but we are still looking at StarFabric and other possible switch fabrics.
[Plot: measured Infiniband and StarFabric throughput versus packet size, compared with the 167 MB/s bandwidth target set by operation at peak luminosity (<6> interactions/BCO) with 50% excess capacity.]

Pixel Preprocessor
[Diagram: the pixel detector front end (30-station pixel detector) feeds the pixel preprocessor & segment tracker (PP&ST) boards: optical receiver interface → time-stamp expansion → hit cluster finder & x-y coordinate translator → events sorted by time and column → Level-1 buffer interface, plus the segment trackers; the segment tracker nodes connect to the Level-1 switch (Infiniband, 56 inputs at ~45 MB/s each, 33 outputs at ~76 MB/s each) and to the DAQ.]

Row and Column Clustering
• A track can hit more than one pixel because of charge sharing. One function of the pixel preprocessor is to find adjacent pixel hits, group them into a cluster, and calculate the x-y coordinates of the cluster.
• Hits in adjacent rows form a row cluster.
• Two row clusters in adjacent columns with overlapping row ranges form a cross-column cluster.

Cluster Finder Block Diagram
[Pipeline: hit input FIFO → hash sorter (column ordering) → row cluster processor (cross-row clusters, columns N-1 and N) → cross-column processor (cross-column clusters) → cluster parameter calculator → clusters.]
• The row order of the incoming hits is defined, but the column order is not; the hash sorter is used to produce a defined column order.
• The row cluster processor identifies hits that are adjacent in row and passes the starting/ending row numbers to the next stage.
• The cross-column processor groups overlapping hits (or clusters) in adjacent columns together.
• Cluster parameters are calculated in the cluster parameter calculator.

Implementation for Cross-Column Cluster Finder
[Diagram: column-A hits are held in FIFO1 and column-B hits in FIFO2, under a state-control block that compares the cross-row and cross-column headers.]
• If the two clusters overlap, they form a cross-column cluster and are popped out together.
• If the cluster in Col. A does not connect to anything in Col. B, it is a single-column cluster and is popped out.
• If Col. B is not next to Col. A, the entire Col. A is popped out.
• A cluster in Col. B that is not connected with Col. A is filled into FIFO2.

Implementation for Cross-Column Cluster Finder (cont'd)
[Flowchart: fill Col. A into FIFO1; test "Col. B = Col. A + 1?"; if not, pop Col. A and fill Col. B; if so, the state control compares the row ranges using the conditions uAN < uB1 and uA1 > uBN (no overlap) to decide between "pop A", "pop A, pop B", and "fill B".]
• The cross-column cluster finder firmware is written in VHDL. (A behavioral software sketch of the clustering appears below.)
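As referenced above, here is a hedged behavioral C model of the clustering just described: row clusters are built per column and then merged across adjacent columns when their row ranges overlap. The actual implementation is VHDL firmware built around FIFOs and the hash sorter; the data structures, the fixed array limit, and the simple group assignment below are assumptions for illustration (a full union-find, which would also handle a cluster bridging two earlier groups, is omitted for brevity).

/* Hedged sketch: behavioral model of row / cross-column clustering. */
#include <stdio.h>

#define MAXC 64

typedef struct { int row, col; } Hit;

typedef struct {            /* a row cluster: one column, a span of rows */
    int col, row_lo, row_hi;
    int group;              /* index of the cross-column cluster it joins */
} RowCluster;

/* Pass 1: hits sorted by (col, row) -> row clusters (consecutive rows,
 * same column), as the hash sorter and read-out order would deliver them. */
static int make_row_clusters(const Hit *h, int n, RowCluster *rc)
{
    int m = 0;
    for (int i = 0; i < n; ++i) {
        if (m > 0 && h[i].col == rc[m-1].col && h[i].row == rc[m-1].row_hi + 1)
            rc[m-1].row_hi = h[i].row;        /* extend the current cluster */
        else
            rc[m++] = (RowCluster){ h[i].col, h[i].row, h[i].row, -1 };
    }
    return m;
}

/* Pass 2: merge row clusters in adjacent columns whose row ranges overlap,
 * i.e. NOT (A.row_hi < B.row_lo or A.row_lo > B.row_hi). */
static int group_clusters(RowCluster *rc, int m)
{
    int ngroups = 0;
    for (int i = 0; i < m; ++i) {
        rc[i].group = ngroups++;              /* start as its own group */
        for (int j = 0; j < i; ++j) {
            int adjacent = rc[i].col == rc[j].col + 1;
            int overlap  = !(rc[j].row_hi < rc[i].row_lo ||
                             rc[j].row_lo > rc[i].row_hi);
            if (adjacent && overlap) {        /* join the earlier group */
                rc[i].group = rc[j].group;
                --ngroups;
                break;
            }
        }
    }
    return ngroups;
}

int main(void)
{
    /* Example: a 3-hit cluster spanning two columns plus an isolated hit,
     * sorted by column, then row. */
    Hit hits[] = { {40, 7}, {41, 7}, {41, 8}, {90, 12} };
    RowCluster rc[MAXC];
    int m = make_row_clusters(hits, 4, rc);
    int g = group_clusters(rc, m);

    printf("%d row clusters, %d cross-column clusters\n", m, g);
    for (int i = 0; i < m; ++i)
        printf("  col %2d rows %d..%d -> cluster %d\n",
               rc[i].col, rc[i].row_lo, rc[i].row_hi, rc[i].group);
    return 0;
}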
BES-II DAQ System
The BES (Beijing Spectrometer) experiment upgraded its detector and DAQ system in 1997.

Performance of BES-II and BES-I
Subsystem, variable: BES-II vs. BES-I
• MDC, σP/P: 1.78% √(1+P²) vs. 1.76% √(1+P²)
• MDC, σxy: 198-224 µm vs. 200-250 µm
• MDC, dE/dx: 8.0% vs. 7.8%
• VC, σxy: 90 µm vs. 220 µm
• TOF, σT: 180 ps vs. 375 ps
• SC, σE/E: 21%/√E vs. 24.4%/√E
• MUON, σZ: 7.9 cm (layer 1), 10.6 cm (layer 2), 13.2 cm (layer 3)
• DAQ dead time: 10 ms vs. 20 ms

BES-II DAQ System
[Diagram: the BES control Alpha 3600 server (OpenVMS) connects over Ethernet to a VME crate containing a VME 167 host, two VME 162 targets, and VME memory, with a SCSI bus to disk and 8 mm tape; VMEbus repeaters link to a second VME crate holding VCBD 0-8 and an 1131 board; the VCBDs read the CAMAC branches (MDC-Q, MDC-T, TRG, TOF, MUON, BSC, ESC, LUM), while the VC is read out through Fastbus (1879 TDCs, 1821) together with its H.V. system.]
• The front-end electronics for all subsystems except the VC consist of CAMAC BADCs (Brilliant ADCs).
• VCBD: VME-CAMAC Branch Driver. Each one reads the data of one detector subsystem and stores it in a local buffer.
• Two VME CPU modules run the real-time OS VMEexec:
  – one for data acquisition and event building;
  – the other for event logging to tape and for sending a fraction of events to the Alpha 3600.
• DEC Alpha 3600 machine:
  – DAQ control console;
  – status/error reporting;
  – online data analysis and display;
  – communication with the BEPC control machines to obtain BEPC status parameters.
• System dead time: 10 ms.
  – BADC conversion: 6 ms.
  – VCBD readout: 3 ms.

Fastbus subsystem for Vertex Chamber
[Diagram: ECL inputs into Fastbus 1879 TDCs; an 1821 SM/I with a Fastbus SIB and a PC; an 1131 logical board distributing common stop, good event, and reset signals; readout to two VME 162 CPUs over the VME bus.]
• One Fastbus crate serves the 640 VC channels.
• Fastbus logical board:
  – distributes all the signals to the TDCs: common stop, reset (fast clear);
  – produces internal start and stop test pulses;
  – the good-event signal tells the 1821 to read data from the 1879s.

Microcode for the 1821
• Initialization of the 1879s:
  – TDC scale: 1 µs;
  – compaction parameter: 10 ns;
  – active time interval: 512 bins.
• Read the 1879 data into the data memory of the 1821:
  – block transfer;
  – sparse data scan: only TDC modules containing data are read out.
• Send a data-ready signal (interrupt) to VME.
• Written in SONIC, a symbolic macro-assembler language, and converted to microcode under LIFT (LeCroy Interactive Fastbus Toolkit), a tool for developing microcode and testing the Fastbus system from a PC.

VC DAQ Software in VME
• A task running in the VME 162, controlled by the BES-II DAQ main task through message queues.
• Downloads the microcode into the 1821.
• Controls the procedure of VC data taking.
• Reads the time data from the 1821 into the 1131 data memory after receiving the interrupt signal.
• Data transfer modes:
  – high 16 bits: DMA;
  – low 16 bits: word by word.
• Measured transfer:
  – 96 (channels) x 7 (modules) x 2 (both edges) + 3 (marks) = 1347 32-bit words;
  – high 16 bits, DMA: 1.1 ms on the VME 162;
  – low 16 bits, word by word: 3.5 ms on the VME 162.

The End

Backup slides

BTeV trigger architecture
[Diagram: BTeV detector → front-end boards → data combiners + optical transmitters → optical receivers → pixel processors → FPGA segment finders → gigabit Ethernet switch → track/vertex farm → Global Level-1 (GL1); Level-1 buffers and the Information Transfer Control Hardware (ITCH); the Level-2/3 processor farm sits behind 12 x 24-port Fast Ethernet switches; 8 data highways, a cross-connect switch, and the data logger.]

L1 Highway Bandwidth Estimates
Bandwidth estimates are for 6 interactions/crossing and include 50% excess capacity.
[Diagram figures: total triplets from the front ends into the segment tracker nodes: 2.5 GB/s; muons: 1.8 MB/s + 0.5 MB/s; ~15 L1 buffers at 167 MB/s each; ~30 worker nodes behind the switching fabric, with 83 MB/s and 15 MB/s links and 300 kB/s to the GL1 + ITCH node; results + triplets at 54 MB/s (1/50 rejection) plus ~1 GB/s of other traffic (raw pixel data) through the bridge to the DAQ highway switch; ~96 L2 nodes behind the DAQ switch, each receiving 583 kB/s of results + triplets and ~10 MB/s of other data (raw pixel data).]
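As a cross-check of the rates quoted in the trigger overview and in the bandwidth estimates above, the short C program below simply redoes the arithmetic. Every input number is taken from the slides; only the recomputation is new, and the variable names are illustrative.

/* Hedged sketch: recomputing the aggregate data-flow figures as a check. */
#include <stdio.h>

int main(void)
{
    /* Front end into Level 1: 2.5 MHz crossings x 200 kB/event. */
    double l1_in_GBs  = 2.5e6 * 200e3 / 1e9;           /* -> 500 GB/s  */

    /* Level-1 accepts into Level 2/3: 50 kHz x 250 kB/event. */
    double l23_in_GBs = 50e3 * 250e3 / 1e9;            /* -> 12.5 GB/s */

    /* Logged after Level 3: 2.5 kHz x 80 kB/event (250 kB / 3.125). */
    double log_MBs    = 2.5e3 * (250e3 / 3.125) / 1e6; /* -> 200 MB/s  */

    /* One highway of the new baseline: 56 segment-finder links in,
     * 33 farm-node links out, ~2.5 GB/s of triplets per highway. */
    double hw_in_MBs  = 56 * 45.0;                     /* ~2520 MB/s   */
    double hw_out_MBs = 33 * 76.0;                     /* ~2508 MB/s   */

    /* Per-L1-buffer share of the 2.5 GB/s triplet flow (~15 buffers). */
    double buf_MBs    = 2500.0 / 15.0;                 /* ~167 MB/s    */

    printf("L1 input        : %6.1f GB/s\n", l1_in_GBs);
    printf("L2/3 input      : %6.1f GB/s\n", l23_in_GBs);
    printf("Data logging    : %6.1f MB/s\n", log_MBs);
    printf("Highway in/out  : %6.0f / %6.0f MB/s\n", hw_in_MBs, hw_out_MBs);
    printf("Per L1 buffer   : %6.0f MB/s\n", buf_MBs);
    return 0;
}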