The Track-Finding Processor for the Level-1 Trigger of the CMS Endcap Muon System D. Acosta, A. Madorsky, B. Scurlock, S.M. Wang University of Florida A. Atamanchuk, V. Golovtsov, B. Razmyslovich, L. Uvarov St. Petersburg Nuclear Physics Institute Geometric Coverage ME4 descoped LHC Electronics Workshop 2001 2 Alex Madorsky Trigger Regions in ϕ ME1/3 Track-Finding performed in independent 60° sectors MB2/2 MB2/1 Illustration of overlap region LHC Electronics Workshop 2001 3 Alex Madorsky Muon Track-Finding Perform 3D track-finding from trigger primitives Measure PT, ϕ, and η Transmit highest PT candidates to Global L1 θ ϕ LHC Electronics Workshop 2001 4 Alex Madorsky Sector Partitioning 20 ° 20° 20 ° Sector Sector Sector ME1 Left 30° or 20° subsectors in ME1 ME1/3 ME1 Center 10 ° 10 ° 10° 60° Sector 10° 10° ME2/210 ° 10° 10° 10 ° 10 ° 10 ° ME3/2 10 ° 10 ° 10° 10 ° 10 ° 10 ° 10° 10 ° 10° 10 ° 10° 10° ME1/2 ME2/1 10° 10° 10° 10° 10° 10° ME1/1 ME1/A 20° 20 ° 20° 20 ° ME3/1 20° 20° 10 ° 10° 10° 10 ° 10°10 ° 16µ 2 or 3 MPC 10° 60 ° Sector ME1 Right Muon Port Card 16µ Muon Port Card 2µ 6 µ / SR 2µ Sector Receiver Barrel LHC Electronics Workshop 2001 16µ 18µ Muon Port Card 18µ Muon Port Card Muon Port Card 2µ 3µ 3µ 6µ Sector Processor Barrel 6µ Sector Receiver Barrel 5 Alex Madorsky 60° sectors in ME2 —ME4 Sector Processor Functionality èIdentify and measure muons from ~ 600 bits every 25ns (3 GB/s) èPerform all possible station-to-station extrapolations in parallel p Simultaneously search roads in ϕ and η èAssemble 3- and 4-station tracks from 2-station extrapolations èCancel redundant short tracks if track is 3 or 4 stations in length èSelect the three best candidates èCalculate PT, ϕ, η and send to CSC muon sorter LHC Electronics Workshop 2001 6 Alex Madorsky Sector Processor Block Diagram Extrapolation Units 2–4 (18 bits) 3-4 3-2 TAU 2 3-1 - Control Line - Downloading/ Readout Line Track 4 2-4 2-3 2-1 3 (3x26) 1(6x26) TAU 1 EU2 1-3 LHC Electronics Workshop 2001 Track 3 Track 2 Track 1 1–3 (36 bits) 2 (3x26) 1(6x26) EU1 1-2 1–2 (36 bits) Assignment Unit Data Extraction LID MUX CLOCKED FIFO U – Extrapolation Unit AU – Track Assembling Unit SU – Final Selection Unit TA –Selected Track Address ID – Look-Up Input Data 2(3x26) 2–3 (18 bits) FSU Track 5 3 (3x26) EU3 2-3 Track 6 VME BUS EU4 2-4 Pt LUT Output Data 3x22=66 7 Alex Madorsky OUTPUT CONNECTOR - Data Line 2 (3x26) Track Assemblers Final Selection Unit STA Control 4 (3x26) Input Data 15x34=510 Data EU5 3 -4 DOWNLOADING/ READOUT INTERFACE 3 (3x26) INPUT DATA & CONTROL INTERFACE 9U CUSTOM BACKPLANE Input 8x3 4 (3x26) 3–4 (9 Extrapolations or 18 bits) Sector Processor Logic èPerform Chamber ME4 41 42 43 Chamber ME3 31 32 all combinations of extrapolations in parallel: 1i ↔ 2k , 1i ↔ 3k, 2i ↔3k 2i ↔ 4k , 3i ↔ 4k But not 1i ↔ 4k 33 èTrack Chamber ME2 21 22 23 Assembler takes best 2 or 3 extrapolations per reference segment Chamber ME1 * 1 * 1 * 1 1 * 1 2 3 1 ** 1 1 ** 2 1 ** 3 ** 1 LHC Electronics Workshop 2001 8 Alex Madorsky Extrapolation Unit η Road Finder: •Check if track segment is in allowed trigger region in η. •Check if ∆η and η bend angle are consistent with a track originating at the collision vertex. Quality Assignment Unit: •Assigns final quality of extrapolation by looking at output from η and φ road finders and the track segment quality. φ Road Finder: •Check if ∆φ is consistent with φ bend angle φ b measured at each station. •Check if ∆φ in allowed range for each η window. LHC Electronics Workshop 2001 9 Alex Madorsky Track Assembler Unit Functions: èDetermines if any track segment pairs belong to the same muon èCombines those segments and assigns a code to denote which muon stations are involved èOutputs a quality word for the best track for each hit in the key stations (2 and 3) LHC Electronics Workshop 2001 10 Alex Madorsky Final Selection Unit Functions: èSelect three best out of nine tracks presented by Track Assemblers èCancel redundant tracks (parts of the longer track) LHC Electronics Workshop 2001 11 Alex Madorsky Pt Assignment Unit PT, ϕ, η of the three best muons selected by Final Selection Unit. èCalculates LHC Electronics Workshop 2001 12 Alex Madorsky First Prototype LHC Electronics Workshop 2001 13 Alex Madorsky Custom LVDS Backplane LHC Electronics Workshop 2001 14 Alex Madorsky Test crate LHC Electronics Workshop 2001 15 Alex Madorsky First Prototype Results èTrack-Finder algorithm implemented in 15 Xilinx Virtex FPGAs, ranging from XCV50 to XCV400 èOne XCV50 for VME interface èOne XCV50 for output FIFO èLatency 15 clocks èOutput compared with C++ model, using simulated input data as well as random numbers, transmitted via custom backplane è100% matching demonstrated LHC Electronics Workshop 2001 16 Alex Madorsky Test software Written: èStandalone version of the C++ model for Windows èModule for the comparison of the C++ model with the board’s output èJTAG configuration routine, controlling the fast VME-to-JTAG module of the board èLookup configuration routine, used to write and check the on-board lookup memory èBoard Configuration Database with a Graphical User Interface (GUI), that keeps track of many configuration variants and provides a one-click selection of any one of them. Each variant contains the complete information for FPGA and lookup memory configuration. Can be used for multiple boards. To simplify possible porting: èAll software is written in portable C or C++ èGUI is written in JAVA LHC Electronics Workshop 2001 17 Alex Madorsky Test Software Screenshot LHC Electronics Workshop 2001 18 Alex Madorsky Single Crate Solution Muon Sorter SR SR SR SR SR SR / / / / / / SP SP SP SP SP SP CCB BIT3 Controller Clock and Control Board MS Track-Finder crate (1.6 Gbits/s optical links) SR SR SR SR SR SR / / / / / / SP SP SP SP SP SP SR/SP Card (3 Sector Receivers + Sector Processor) (60° sector) From MPC (chamber 4) From MPC (chamber 3) From MPC (chamber 2) From Trigger Timing Control From MPC (chamber 1B) From MPC (chamber 1A) To Global Trigger To DAQ • Power consumption: ~ 500W per crate • 15 optical connections per SR/SP card • Custom backplane for SR/ SPs < > CCB/MS connection LHC Electronics Workshop 2001 19 Alex Madorsky Merged SR/SP 15 trigger links, 1 DAQ Small Form Factor Deserializer Front Transceivers Chips FPGAs From trigger primitives 45 SRAM Memory Look-up Tables From MPC (chamber 4) ~45 SRAM chips 80 MHz speed VME Interface Sector Processor FPGA chip From MPC (chamber 3) From MPC (chamber 2) To Muon Sorter From MPC (chamber 1C) Ptassignment LUTs From MPC (chamber 1B) To/From Barrel From MPC (chamber 1A) To CSC DAQ 1 April 2001 Input: 528 bits/sector/b.x. LHC Electronics Workshop 2001 From Clock and Control Board Output: 60 bits/sector/b.x. 20 Alex Madorsky Salient Features Bi-Directional Optical Links èSince we have both transmitters and receivers in the optical connection, this allows the option to send data as well as receive è Makes testing much easier p One board sends data to other, or even itself through links SP chip on a mezzanine board è è Decouples board development from FPGA technology Makes upgrades easier DT fan-out on transition board Deliver all possibly needed signals to backplane connector è Settle transmission technology, connector type, connector count on transition board at a later date è LHC Electronics Workshop 2001 21 Alex Madorsky SR/SP Inputs è Muon Port Cards deliver 15 track stubs each BX via optical Signal Sent on first frame èDT Bits / stub ½ Strip * CLCT pattern * L/R bend * Quality * Wire group Accelerator µ CSC i.d. 8 4 Bits / 3 stubs (1 MPC) 24 12 Bits / 15 stubs (ME1–ME4) 120 75 1 3 7 1 4 3 9 21 3 12 15 45 105 15 60 BXN Valid pattern 2 1 6 3 30 15 Spare Total: 1 32 3 96 15 480 Description ½ strip label Pattern number without 4/6, 5/6, 6/6 Sign bit for pattern Computed by TMB Wire group label Straight wire pattern Chamber label in subsector 2 LSB of BXN Must be set for above to apply (240 bits at 80 MHz) Reduced from 5 Needed for frame info Track-Finder delivers 2 track stubs each BX via LVDS Signal Bits / stub φ φb Quality BXN Synch/Calib Muon Flag Total: 12 5 3 2 1 1 24 LHC Electronics Workshop 2001 Bits / 2 stubs (MB1: 60° ) 24 10 6 4 2 2 48 22 Description Azimuth coordinate φ bend angle Computed by TMB 2 LSB of BXN DT Special Mode 2nd muon of previous BX Alex Madorsky SR/SP Outputs 6 track stubs are delivered to DT Track-Finder each BX (delivered at 40 MHz to transition board) è è Signal Bits / stub 12 Bits / 2 stubs (ME1: 20° ) 24 Bits / 6 stubs (ME1: 60° ) 72 Bits / 6 stubs @ 80 MHz 36 φ η 1 2 6 3 Quality 3 6 18 9 BXN – 2 6 3 16 34 102 51 Description Azimuth coordinate DT/CSC region flag Computed by TMB 2 LSB of BXN Total 3 muons per SP are delivered to Muon Sorter via GTLP backplane Sent on first frame Signal Bits / µ φ η Rank * Sign * BXN * Error * Spare * Total: LHC Electronics Workshop 2001 5 5 7 Bits / 3 µ (1 SP) 15 15 21 Bits / 36 µ (12 SP) 180 180 252 1 – – 1 19 3 2 1 3 60 36 24 12 12 720 23 Description Azimuth coordinate Pseudorapidity 5 bits pT + 2 bits quality 2 LSB of BXN (360 bits at 80 MHz) Alex Madorsky SR/SP Internal Dataflow è ME1 only Data delivered from SR to SP Signal Bits / stub 12 5 6 1 1 Bits / 6 stubs (ME1) 72 30 36 6 6 Bits / 15 stubs (ME1–ME4) 180 75 90 15 6 φ φb η Accelerator µ Front/Rear * CSC ghost * – 4 4 Quality Total: 3 28 18 172 45 415 LHC Electronics Workshop 2001 24 Description Azimuth coordinate φ bend angle Pseudorapidity η bend angle ME1 chamber stagger 2 stubs in same ME1 CSC Computed by TMB (sent at 80 MHz) Alex Madorsky Old SR Memory Scheme Now in TMB LHC Electronics Workshop 2001 25 Alex Madorsky New SR Memory Scheme 0.5 BX ½ Strip 8 Pattern 4 Quality 3 L/R Bend 1 Muon # Local φ LUT 128K× 16 0.5 BX φlocal 10 10 φlocal ALCTWG 5 6 φb, local CSCID Muon # 1 4 1 φlocal 10 Frame 1 Frame 2 ALCTWG 5 CSCID Muon # 4 1 ALCTWG 7 CSCID 4 φb, local 6 LHC Electronics Workshop 2001 26 φlocal 2 Muon # 1 Global φ LUT (CSC) 1M×12 12 φCSC Global φ LUT (DT) 1M×12 12 φDT Global η LUT (CSC) 1M×11 6 ηCSC 5 φb,CSC Alex Madorsky Discussion of SR Memory (1) Data Frames: Data arrives off links @ 80 MHz è The proposed framing scheme implies no latency loss p First LUT operates on first frame only (but must reduce CLCT pattern label by 1 bit) è This frame definition must be applied in the TMB where the data is generated and first serialized (i.e. before MPC) to avoid latency penalty è Memories: è è è LUT for DT only applies to ME1 Chip count is 3 per stub, down from 6 originally Increased memory size to 0.5M (okay for synch. SRAM) LHC Electronics Workshop 2001 27 Alex Madorsky Discussion of SR Memory (2) One memory set per stub: è è è 15×3 = 45 chips 1 BX latency 415 signals to SP First SR had 36 chips and 1 BX latency Memory choices: è è synch SRAM clocked at 160 MHz (tested to this frequency) Flow Through SRAM clocked at 80 MHz LHC Electronics Workshop 2001 28 Alex Madorsky Pt Assignment Memory Scheme SP FPGA 8 4 Sign 1 η 4 F/R 1 Mode 4 Muon 1 Rank LUT 4M×8 Rank1 7 OE ∆φ12 ∆φ23 8 4 Sign η F/R Mode 1 4 1 4 Muon 2 Rank LUT 4M×8 Rank2 7 OE ∆φ12 ∆φ23 Must serialize to fit data on backplane è No latency penalty for serialization if Rank sent first to Muon Sorter è SP FPGA contains tri-state buffers and generates enable for memories è Frame 1 8 4 Sign 1 η 4 F/R 1 Mode 4 80 MHz Muon 3 Rank LUT 4M×8 Rank3 7 OE 80 MHz Sign1 , Sign2, Sign 3, BXN, Error φ1 , φ2 , φ3 η1 , η2 , η3 LHC Electronics Workshop 2001 9 30 15 Register/GTLP converter ∆φ12 ∆φ23 To GTLP Backplane GTLP backplane Frame 2 15 29 Alex Madorsky CSC TF DAQ Plans Send input of SR and output of SP to DAQ stream for diagnostics Track-Finder DAQ acts as additional DDU for EMU system è Plan to use existing DDU design by OSU p OSU has decided to use T.I. Chip for serialization p 12 SP fiber connections fits well into 15 planned for DDU LHC Electronics Workshop 2001 30 Alex Madorsky Mirror CSC DAQ Path OSU now plans 20° slices to equalize bandwidth CSC DDU designed by Ohio State Univ. 15 optical fibers × 36 Sector Processors send L1 data LHC Electronics Workshop 2001 12 optical fibers DDU 31 SLINK Alex Madorsky + 1 (2?) First prototype data flow Pre-production prototype data flow From Muon Port Cards From Muon Port Cards 1 Lookup tables Sector Receiver st.4 Front FPGAs Sector Receiver st.2,3 1 Optical receivers Sector Receiver st.1 Optical receivers 1 Front FPGAs 1 Lookup tables SR/SP board Channel link transmitters 0 Bunch crossing analyzer (not implemented) 4 1 2 Bunch crossing analyzer (not implemented) 3 Extrapolation units Latency Latency Channel link receivers 1 3 Final selection unit 3 best out of 9 3 2 Total: 21 Pt precalculation for best 3 muons Pt assignment (memory) Sector Processor 9 Track Assembler units (memory) 1 1 Total: 7 To Muon Sorter LHC Electronics Workshop 2001 32 9 Track Assembler units Sector Processor FPGA 1 2 Extrapolation units Pt precalculation for 9 muons Final selection unit 3 best out of 9 Output multiplexor Pt assignment (memory) To Muon Sorter Alex Madorsky Pre-Production Prototype Done: èSP algorithm is rewritten in Verilog HDL èNew C++ model line-by-line corresponds to Verilog HDL code èIf Verilog HDL code is written properly, C++ conversion is very simple and straightforward èNew C++ model matches the functionality of the original C++ model èSP-Verilog has been extensively simulated and is working in exact correspondence to functionality of the first prototype and both C++ models èSimulated latency for Sector Processor Algorithm: 5 clocks (15 in the first prototype) èBoard development underway Plans: èReplacing the original C++ model with the new one in ORCA èAdding new features into the SP-Verilog code and C++ model simultaneously, keeping them in strict line-by-line correspondence LHC Electronics Workshop 2001 33 Alex Madorsky SP Algorithm Modifications èExtrapolation and Final Selection units are reworked, and now each of them is completed in only one clock. èThe Track Assembler Units in the first prototype were implemented as external lookup tables (static memory). For the second prototype, they are implemented as FPGA logic. This saved I/O pins on the FPGA and one clock of the latency. èThe preliminary calculations for the PT assignment are done in parallel with final selection for all 9 muons, so when three best out of nine muons are selected, the pre-calculated values are immediately sent to the external PT assignment lookup tables. LHC Electronics Workshop 2001 34 Alex Madorsky Acknowledgements This work was supported by grants from the US Department of Energy. We also would like to acknowledge the efforts of R. Cousins, J. Hauser, J. Mumford, V. Sedov, B. Tannenbaum, who developed the first Sector Receiver prototype, and the efforts of M. Matveev and P. Padley, who developed the Muon Port Card and Clock and Control Board, which were used in the tests. LHC Electronics Workshop 2001 35 Alex Madorsky