Track-Finding Processor for the Endcap Muon System
D. Acosta, University of Florida
9/30/98

Muon Track-Finding
• Link trigger primitives (stubs) into tracks
• Assign PT, ϕ, and η
• Send highest-PT candidates to Global L1 trigger
[Figure: coordinate labels θ, ϕ]

Muon Trigger Scheme
[Figure: trigger scheme with timing labels BX 36, BX 42, BX 60, BX 80]

CSC Muon Trigger
[Figure: CSC muon trigger; surviving label: ≤3]

CSC Sector Layout
[Figure]

Trigger Regions in η
[Figure]

Track-Finding in Overlap Region: Vienna/Bologna Proposal
• Data path: ME1/3 → DT T-F
  – special link from ME1/3 motherboard, or from Sector Receiver card
• No modification to DT T-F hardware
  – ME1/3 becomes MB4 neighbor in η
  – inputs already there
• PT measurement determined mostly from MB1/MB2, which are barrel chambers
• Simplifies CSC T-F:
  – Reduced Sector Processor logic and I/O
  – Fewer processor units required
  – Reduced I/O for Sector Receiver cards (no fan-out)
  – Fewer CSC muons to sort
  – No CSC crate interconnections
• Saves some CSC latency

Track-Finding in Overlap Region: U.S. Proposal
• Data path: MB1 + MB2 → CSC T-F
  – special DT Sector Receiver card or direct connection to processor
• CSC Trigger Motherboard design unchanged
• Uses η information for PT determination
  – B-field changes by only ±5% though
• Full 3D Track-Finding in η and ϕ
  – reduces fakes
• Greater redundancy with ME2 in case ME1/3 misses hit
  – Large (~25%) acceptance loss from cracks between ME1/3 chambers

Keep the option to cover the overlap region with the CSC T-F open. Design a Sector Processor which can operate in either the CSC or the overlap crate. This offers a complementary approach to a difficult region.

Sector Receiver Functionality
• Receives 6 stubs via optical links from 2 Port Cards
• Synchronizes the data
  – No additional synchronization performed by the Sector Processor
• Reformats the data
  – LCT bit pattern → η, ϕ, Ψ
• Applies alignment corrections to ϕ coordinate
  – Depends on ϕ and η
• Communicates to Sector Processor via point-to-point backplane or connector
• Fans out signals to neighboring processors if U.S. handles overlap region or ME1/3 data is sent to DT Sector Processor

Required Precision of Data
• Azimuthal angle ϕ:
  – LCT resolution is 0.1 strip
  – 12 bits / 60° ⇒ 1 bit / 0.26 mrad (0.1 strip)
• Bend angle Ψ:
  – Range is ±45° with 0.5 strip minimum bend
  – 6 bits / ±45° ⇒ 1 bit / 60 mrad
  – Projection of Ψmin ⇒ ∆ϕ ~ 10 mrad
• ∆ϕ = ϕ2 - ϕ1:
  – Maximum deflection is <15°
  – 11 bits full precision (12 bits - 2 bits + sign)
  – 6 bits / 15° ⇒ 1 bit / 8 mrad
• Polar angle η:
  – Range is 2.4 - 0.9 = 1.5 units
  – B-field variation < ±4% for 0.05 unit bins
  – 5 bits / 1.5 units ⇒ 1 bit / 0.05
[Figure: definitions of Ψ, ϕ1, ϕ2, and ∆ϕ]

Quantity   DT        CSC       T-F      PT
ϕ          12 bits   12 bits   7 bits   12 bits
∆ϕ         13 bits   13 bits   6 bits   11 bits
ψ          8 bits    6 bits    6 bits   6 bits
η          8 bits*   11 bits   5 bits   5 bits
Quality    3 bits    3 bits    3 bits   -

Track Segments per Sector

Region  Station  Chamber  Segments    No. of ϕ  No. of    Extrapolations
                          per sector  sectors   segments
CSC     1        ME1      3           2         6         81
        2        ME2      3           1         3
        3        ME3      3           1         3
        4        ME4      3           1         3
OVL     1        MB1      2           2         4         106
        2        MB2      2           2         4
        3        ME1      3           2         6
        4        ME2      3           1         3

Inputs to CSC Sector Processor
• 1 CSC stub = 12 ϕ bits + 6 Ψ bits + 5 η bits + 3 Q bits = 26 bits
• 1 Port Card sends 3 stubs
• 1 Sector Receiver accepts 2 Port Cards = 6 stubs
• 1 Sector Processor accepts 6 + 3 + 3 + 3 = 15 stubs (divided between 2.5 Sector Receivers)
• 15 stubs × 26 bits = 390 bits
[Figure: CSC crate layout. Each Sector Processor (SP) is fed by SR CSC cards labeled ME 1/1, ME 2,3, and ME 4; the link widths shown are 156, 156, and 78 bits. The pattern repeats three times across the crate.]
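The bit budget on the Inputs to CSC Sector Processor slide can be checked with a short sketch. Only the field widths come from the slides; the ordering of fields inside the 26-bit word and the names `CSC_FIELDS`, `pack_csc_stub`, and `unpack_csc_stub` are assumptions made for illustration.

```python
# Field widths for one CSC stub, taken from the slide: phi, psi, eta, quality.
# The ordering of fields inside the word is an assumption, not from the slides.
CSC_FIELDS = (("phi", 12), ("psi", 6), ("eta", 5), ("quality", 3))
CSC_STUB_BITS = sum(width for _, width in CSC_FIELDS)  # 12 + 6 + 5 + 3


def pack_csc_stub(**values):
    """Pack field values into one integer, first field in the highest bits."""
    word = 0
    for name, width in CSC_FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_csc_stub(word):
    """Inverse of pack_csc_stub: return a dict of field values."""
    fields = {}
    for name, width in reversed(CSC_FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


# Stubs per station seen by one Sector Processor: ME1, ME2, ME3, ME4.
STUBS_PER_STATION = (6, 3, 3, 3)
TOTAL_INPUT_BITS = sum(STUBS_PER_STATION) * CSC_STUB_BITS
```

With these widths a stub is 26 bits, six stubs from one Sector Receiver are 156 bits, and the 15 stubs reaching one Sector Processor total 390 bits, matching the slide.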

Inputs to OVL Sector Processor
• 1 CSC stub = 26 bits
• 1 DT stub = 11 ϕ bits + 8 Ψ bits + 3 Q bits = 22 bits
• 1 Sector Processor accepts 4+4 DT stubs and 6+3 CSC stubs
• (8 DT stubs × 22 bits) + (9 CSC stubs × 26 bits) = 410 bits
[Figure: OVL crate layout. Each Sector Processor (SP) is fed by SR CSC cards labeled ME 1/1 and ME 2 and by an SR DT card labeled MB 1,2; the link widths shown are 78, 156, and 176 bits. The pattern repeats three times across the crate.]

Track-Finding Algorithm
• Sector Processor must identify muons from ~400 input bits
• Follow approach of Vienna design:
  1. Perform all possible station-to-station extrapolations in parallel
  2. Assemble 3- and 4-station tracks from 2-station extrapolations
  3. Cancel redundant tracks if 3 or 4 stations in length
  4. Select the three best candidates
  5. Calculate PT, ϕ, η and send to CSC muon sorter:

     Quantity     Precision
     η            6 bits
     ϕ            8 bits
     Muon sign    1 bit
     PT           5 bits (nonlinear)
     Quality      2 bits

Vienna Approach to Track Finding
[Figure: Vienna track-finding scheme]
Generalize scheme to include η dependence in endcap and overlap regions for matching and PT assignment
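Step 1 of the algorithm implies one extrapolation for every pair of segments taken from two different stations. A minimal sketch, assuming exactly that counting rule, reproduces the extrapolation totals quoted in the Track Segments per Sector table (81 for CSC, 106 for OVL). The function names are hypothetical; the segment counts are the ones listed in the table.

```python
from itertools import combinations

# Segments delivered to one Sector Processor, per station (from the table).
CSC_SEGMENTS = {"ME1": 6, "ME2": 3, "ME3": 3, "ME4": 3}
OVL_SEGMENTS = {"MB1": 4, "MB2": 4, "ME1": 6, "ME2": 3}


def count_extrapolations(segments_per_station):
    """Count segment pairs taken from two different stations."""
    return sum(
        n_a * n_b
        for (_, n_a), (_, n_b) in combinations(segments_per_station.items(), 2)
    )


def extrapolation_pairs(segments_per_station):
    """List every (station_a, index_a, station_b, index_b) pair to evaluate."""
    pairs = []
    for (st_a, n_a), (st_b, n_b) in combinations(segments_per_station.items(), 2):
        for i in range(n_a):
            for j in range(n_b):
                pairs.append((st_a, i, st_b, j))
    return pairs
```

In hardware all of these pairs are evaluated in parallel; the list form here only makes the bookkeeping explicit.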

Extrapolation Logic
[Schematic: Extrapolation Unit (project EXTRAPX1, University of Florida Department of Physics, sheet dated 09/15/98; slide dated 9/25/98). Blocks: DETA Unit, DPHI Unit, PSI Translation Unit, Quality Comparison Unit, ETA-DPHI Correlation Unit, DPHI-PSI Comparator, Quality Assignment. Inputs: ETA_A_IN[4:0], ETA_B_IN[4:0], PHI_A_IN[6:0], PHI_B_IN[6:0], PSI_A_IN[5:0], PSI_B_IN[5:0], Q_A_IN[2:0], Q_B_IN[2:0]. Output: EXTRAP_OUT[1:0].]
2 × 21 bits in, 2 bits out

∆η Calculation Unit
• Determines if η1 and η2 match
• Outputs η1
[Schematic: DETA Unit (macro DETA). Inputs ETAA[4:0], ETAB[4:0]; outputs ETA1_OUT[4:0], MATCH. Built from ROMs, an ADD_SUB unit, 5-bit data registers, and an AND3 gate.]

∆ϕ Calculation Unit
• Calculates ϕ difference
• Limits difference to <15°
[Schematic: DPHI Unit (macro DPHI). Inputs PHIAIN[6:0], PHIBIN[6:0]; outputs PHIOUT[5:0], MATCH. Built from a 7-bit ADD_SUB unit with 7-bit input and 6-bit output data registers.]
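The behavior of the ∆ϕ Calculation Unit can be sketched in software. The port widths (7-bit inputs, 6-bit output, a MATCH flag) are read from the schematic; the number encoding is an assumption. Here the 6-bit output is treated as two's complement, so the accepted window is -32 to +31 counts. With 60° spread over 7 bits, 32 counts correspond to 15°, which agrees with the stated limit. The function name `dphi_unit` is hypothetical.

```python
PHI_BITS = 7    # width of PHIAIN / PHIBIN on the schematic
DPHI_BITS = 6   # width of PHIOUT on the schematic


def dphi_unit(phi_a, phi_b):
    """Return (match, dphi) for two 7-bit phi values.

    Assumption: dphi = phi_b - phi_a is reported as a two's-complement
    6-bit number, and MATCH is false when the difference does not fit.
    """
    for phi in (phi_a, phi_b):
        if not 0 <= phi < (1 << PHI_BITS):
            raise ValueError("phi must fit in 7 bits")
    diff = phi_b - phi_a
    half_range = 1 << (DPHI_BITS - 1)  # 32 counts, i.e. 15 degrees
    match = -half_range <= diff < half_range
    # Mask to 6 bits so negative differences wrap to their two's-complement code.
    dphi = diff & ((1 << DPHI_BITS) - 1) if match else 0
    return match, dphi
```

For example, `dphi_unit(10, 20)` accepts the pair with a difference of 10 counts, while `dphi_unit(0, 100)` rejects it because the difference exceeds the window.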

Ψ Translation Unit
Translates bend angles Ψ1 and Ψ2 into ranges in ∆ϕ
[Schematic: PSI Translation Unit (macro PSIUNIT). Inputs PSIAIN[5:0], PSIBIN[5:0]; outputs PSIAOUTP[5:0], PSIAOUTM[5:0], PSIBOUTP[5:0], PSIBOUTM[5:0]. Built from four 6-bit ROMs with 6-bit data registers.]

Quality Comparison Unit
Applies any quality cuts and assigns combined quality
[Schematic: Quality Comparison Unit (macro QUALCOMPAR). Inputs Q_IN_A[2:0], Q_IN_B[2:0], combined into a 6-bit ROM address; output Q_OUT[1:0].]

η, ∆ϕ Correlation Unit
• Limits ∆ϕ vs. η
• Assigns 2-bit PT word
[Schematic: ETA-DPHI Correlation Unit (macro ETA-DPHICORR). Inputs DPHI_IN[5:0], ETA_IN[4:0]; output CORR_OUT[1:0]. Three ROMs addressed by η feed three 6-bit comparators (A_LT_B), followed by a priority encoder.]
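The η, ∆ϕ Correlation Unit combines three η-addressed ROMs, three comparators, and a priority encoder. The sketch below shows one way such a chain could produce a 2-bit PT word. The ROM contents, the treatment of ∆ϕ as an unsigned magnitude, and the meaning of each output code are assumptions; the slides give only the structure and the bus widths.

```python
def eta_dphi_correlation(eta, dphi, rom_a, rom_b, rom_c):
    """Return a 2-bit PT word from eta-dependent cuts on dphi.

    rom_a, rom_b, rom_c are 32-entry tables (5-bit eta address) of 6-bit
    thresholds. Assumed convention: rom_a holds the loosest cut and rom_c
    the tightest; code 0 means dphi fails every cut.
    """
    if not 0 <= eta < 32:
        raise ValueError("eta must fit in 5 bits")
    if not 0 <= dphi < 64:
        raise ValueError("dphi must fit in 6 bits")
    below_a = dphi < rom_a[eta]
    below_b = dphi < rom_b[eta]
    below_c = dphi < rom_c[eta]
    # Priority encoder: the tightest cut that passes wins.
    if below_c:
        return 3
    if below_b:
        return 2
    if below_a:
        return 1
    return 0


# Placeholder ROM contents for illustration only (flat in eta, not tuned values).
EXAMPLE_ROM_A = [40] * 32
EXAMPLE_ROM_B = [20] * 32
EXAMPLE_ROM_C = [8] * 32
```

Because the cuts live in ROMs, loading every entry of the loosest table with a value above any reachable ∆ϕ would make the unit accept all inputs, which is how a correlation can be switched off.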

Ψ, ∆ϕ Correlation Unit
Checks for consistency between Ψ1, Ψ2 and ∆ϕ
[Schematic: DPHI-PSI Comparator (macro DPHI-PSICOMPARATOR). Inputs DPHIIN[5:0], PSIAINP[5:0], PSIAINM[5:0], PSIBINP[5:0], PSIBINM[5:0]; output COMPOUT. Four 6-bit comparators (A_LT_B, A_GT_B) combined by an AND4 gate.]

Quality Assignment Unit
Assigns internal 2-bit quality word for extrapolation result (0 = no match)
[Schematic: Quality Assignment (macro QUALASSIGNMENT). Inputs PT_IN[1:0], QUALITY[1:0], MATCH_ETA, MATCH_PHI, MATCH_PSI; output EXTRAP_OUT[1:0]. A ROM with a 5-bit address, with the three match flags combined by an AND3 gate.]

FPGA Logic Size
• One extrapolation unit (correlation of two stubs) takes approximately 200 logic blocks in Xilinx (~5000 gates), including all look-up memories
• Current ideas on partitioning indicate that about 25 extrapolations must be done in one FPGA
  – Need about 250 I/O pins
  – Need 125000 gates ⇒ Xilinx XC40125XV
  – Largest FPGA available, highest cost
  – Performance per cost is improving rapidly, however
  – State-of-the-art FPGAs may run logic at 80 MHz, which would save latency
• Extrapolation logic is density limited, not I/O limited
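The gate estimate on the FPGA Logic Size slide is a simple product, sketched below. The per-unit figures and the 25 units per FPGA come from the slide. The FPGA count per Sector Processor is not stated in the slides; `fpgas_needed` is a hypothetical helper that assumes the extrapolations are spread evenly and that no other logic is counted.

```python
import math

GATES_PER_UNIT = 5000          # "~5000 gates" per extrapolation unit
LOGIC_BLOCKS_PER_UNIT = 200    # "approximately 200 logic blocks"
UNITS_PER_FPGA = 25            # "about 25 extrapolations ... in one FPGA"


def fpga_budget(units=UNITS_PER_FPGA):
    """Gate and logic-block totals for a given number of extrapolation units."""
    return {
        "gates": units * GATES_PER_UNIT,
        "logic_blocks": units * LOGIC_BLOCKS_PER_UNIT,
    }


def fpgas_needed(extrapolations, units_per_fpga=UNITS_PER_FPGA):
    """Assumed estimate: FPGAs required if units are packed evenly."""
    return math.ceil(extrapolations / units_per_fpga)
```

Under these assumptions 25 units need 125000 gates, and the 81 CSC extrapolations would occupy four such devices.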

Track Assembly Logic
• The scheme to assemble 2-station extrapolations into longer tracks is under study.
• Presumably this step is similar to the Vienna design, except that additional cuts based on internal quality bits may be applied (such as PT consistency)

PT Assignment
⇒ Size of single LUT is prohibitive
   [Figure: ϕ1 (12 bits), ϕ2 (12 bits), η (5 bits) → LUT RAM → PT (5 bits)]
   2^(12+12+5) = 0.5 × 10^9 entries (× 5 bits)
⇒ Cascade logic:
   [Figure: ϕ1 (12 bits), ϕ2 (12 bits) → ∆ϕ unit → ∆ϕ12 (12 bits); ∆ϕ12 and η (5 bits) → LUT RAM → PT (5 bits)]
⇒ Arithmetic unit adds 1 b.x. latency, but could use fast cascaded SDRAM instead

Features of CSC Sector Processor
• Trigger logic is tunable
  – Content of memory LUT is programmable
  – Any correlation unit can be set to accept all inputs
  – For example, track-finding in η can be switched off
• FPGA technology allows flexibility in logic design
• No extra latency with respect to DT-SP in the extrapolation units to include η dependencies (just additional correlations in parallel)
• Design should evolve into the same hardware for the overlap and endcap regions (semi-unified design)
  – Similar number of inputs
  – Complementary approach to overlap region

Differences from Vienna Design
• 3D vs. 2D Track-Finder
  – η information incorporated into extrapolation logic, rather than handled separately
  – Stronger rejection of fake tracks
• PT assignment uses η information
• No inputs from neighboring ϕ sectors or η wheels
  – Chambers project in ϕ, and Port Card collects all η
• Less input to processor (400 signals vs. 800) even with η information
• Design is limited by the size of the logic and look-up memories rather than by the I/O count
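The look-up memory sizes behind the PT Assignment slide can be worked out directly. The sketch below compares a single LUT addressed by ϕ1, ϕ2, and η with the cascaded version addressed by ∆ϕ12 and η, using the bit widths from the slide. The address layout and the two's-complement wrap of ∆ϕ in `cascade_address` are assumptions for illustration.

```python
def lut_entries(address_bits):
    """Number of entries in a LUT addressed by the given field widths."""
    return 1 << sum(address_bits)


def lut_size_bytes(address_bits, word_bits=5):
    """Storage for a LUT with word_bits of output per entry, in bytes."""
    return lut_entries(address_bits) * word_bits // 8


SINGLE_LUT = (12, 12, 5)   # phi1, phi2, eta
CASCADE_LUT = (12, 5)      # dphi12, eta


def cascade_address(phi1, phi2, eta):
    """Address into the cascaded LUT: 12-bit dphi in the high bits, eta low.

    Assumption: dphi = phi2 - phi1 wraps to 12 bits (two's complement).
    """
    dphi = (phi2 - phi1) & 0xFFF
    return (dphi << 5) | (eta & 0x1F)
```

The single table has 2^29 = 536870912 entries, about 0.5 × 10^9 as quoted, while the cascaded table has 2^17 = 131072 entries, a reduction by a factor of 4096 at the cost of the subtraction step.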

CSC Sector Processor Limitations
• Ghost capability not implemented
  – Extrapolation logic increases 3-fold provided we predetermine which two stubs come from the same chamber (more latency), otherwise 9-fold increase
  – Alignment corrections will be invalid
  – More internal bits for bookkeeping
• Out-of-time station hits not implemented
  – Extrapolation logic increases by a factor 2N-1 for N b.x. (3-fold for 2 b.x.)
  – Straightforward to implement, no extra latency

Muon Sorting
• Latest Vienna/Bologna proposal requires two stages of muon sorting following the Sector Processor:
  1. "Wedge sorter" reduces muon candidates from Sector Processors to best 4
  2. Global Muon Trigger "sorter" collects 4 candidates each from DT, CSC, and RPC and sends best 4 to Global L1 trigger
• How many muons to sort in (1) for CSC-only?
  – 2 µ / 60° × 360° × 2 endcaps = 24 µ (same as for DT wedge sorter, use same hardware?)
  – 3 µ / 60° × 360° × 2 endcaps = 36 µ
• How many muons to sort in (1) for CSC + OVL?
  – 2 µ / 60° × 360° × 2 endcaps × 2 crates = 48 µ
  – 3 µ / 60° × 360° × 2 endcaps × 2 crates = 72 µ
• Possible to combine stages (1) and (2)?
  – Might save latency
• Bologna will design DT wedge sorter and Global Muon Trigger depending on funding
• Who designs CSC sorter? How to fund?

Future Work
• Validate Track-Finder scheme with CMSIM study
  – Check for holes in acceptance
  – Calculate PT resolution
  – Determine trigger rate
  – Check efficiency of di-muon (and tri-muon) trigger
• Tune parameters of Track-Finder
  – Quality assignment algorithms
  – Number of internal quality bits
  – Forward and backward extrapolation in ϕ
  – Number of η bits for track-finding in η
  – Sorting criteria
• Start prototype design
  – Board layout
  – I/O connections (backplane design, choice of connectors)
  – VME Crate (most likely start with CDF VIPA crate)
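The candidate counts on the Muon Sorting slide, and the reduction to the best 4, can be sketched as follows. The counting formula is the one written on the slide. The ranking key in `best_candidates` (quality first, then PT code) is purely an assumption, since the sorting criteria are listed as a parameter still to be tuned; the dictionary keys are hypothetical.

```python
import heapq


def sorter_inputs(muons_per_sector, crates=1, sector_deg=60, endcaps=2):
    """Number of candidates arriving at the first sorting stage."""
    return muons_per_sector * (360 // sector_deg) * endcaps * crates


def best_candidates(candidates, n=4):
    """Keep the n highest-ranked candidates.

    Assumed rank: larger quality wins, ties broken by larger PT code.
    """
    return heapq.nlargest(n, candidates, key=lambda c: (c["quality"], c["pt"]))
```

For instance, `sorter_inputs(3, crates=2)` gives the 72 candidates quoted for the CSC + OVL case with three muons per sector.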