http://www.staff.uni-mainz.de/uschaefe/browsable/Meeting/2013/neu/

jFEX (inc. specific software/firmware & Tilecal input signal, options) 30'
Mainz logos?
Uli Schäfer

jFEX (inc. specific software/firmware & Tilecal input signal, options) 30'
Assume we have to cover everything that is in the document, with more detail where we feel it helps. Did I forget to cover any important section? No physics justification.
• Jet processing general
• L1Calo Phase-1 scheme
• Algorithms
• Data replication
• jFEX description #4
• Density, fibre count etc.
• Input options #4
• Firmware (also software to be mentioned?)
• Demonstrators/current designs (GOLD, Topo, MiniPOD/V7) #4
• Schedule, manpower requirements
jFEX input: HD + MZ

Jet processing
Phase-0 jet system consisting of:
• Pre-Processor
  • Analogue signal conditioning
  • Digitization
  • Digital signal processing
  • Jet element pre-summation to 0.2 × 0.2 (η×φ)
• Jet processor
  • Sliding window processor for jet finding
  • Jet multiplicity determination
  • Jet feature extraction into L1Topo (pre-Phase-1)
At Phase 1: complement with the jet feature extractor, jFEX
• LAr signals optically from the Digital Processing System
• Tilecal signals from the analogue Pre-Processor / JEP …
• … eventually Tilecal optical data off detector, and possible retirement of the current L1Calo system

L1Calo Phase-1 System
[Figure: L1Calo Phase-1 system diagram. Labels: from Digital Processing System, optical plant, eFEX and jFEX (each with ROD, RTM, Hub), L1Topo, PPR, CPM, JEM, JMM, CMX; blocks new at Phase 1 are marked.]

• (Note: show the 3 options here)

Algorithms, now and then
[Figure: Phase 0 vs Phase 1 jet windows. Labels: tower 0.2 × 0.2 (Phase 0) and 0.1 × 0.1 (Phase 1); ROI 0.4 × 0.4; three jet windows up to 0.8 × 0.8; 0.9 × 0.9, limited by data duplication.]
Sliding window algorithm: find and disambiguate ROIs; calculate jet energy in differently sized windows (programmable)
• Improve granularity by a factor of four, to 0.1 × 0.1 (η×φ)
• Slightly increase the environment
• Allow for flexibility in jet definition (non-square jet shape, Gaussian filter, …)
• Fat jets to be calculated from high-granularity small jets
• Optionally increase the jet environment (baseline 0.9 × 0.9)

Data replication
The sliding window algorithm requires large-scale replication of data
• Forward duplication only (fan-out), no re-transmission
• Baseline: no replication of any source into more than two sinks
• Fan-out in eta (or phi) handled at source only (DPS)
  • Duplication at the parallel end (on-FPGA), using additional Multi-Gigabit Transceivers
  • Allowing for differently composed streams
  • Minimizing latency
• Fan-out in phi (or eta) handled at destination only
  • Baseline "far end PMA loopback"
  • Looking into details and alternatives

Initial baseline
8+ modules, each covering full phi and a limited eta range
• Environment of 0.9 in eta (core bin ± 4 neighbours)
• Each module receives fully duplicated data in eta: 1.6 in eta worth of data is required for a core of 0.8
• 16 eta bins including environment
8 FPGAs per module, each with:
• Environment 0.9 × 0.9
• Each FPGA receives fully duplicated data in eta and phi: 1.6 × 1.6 worth of data is required for a core of 0.8 × 0.8
• 256 bins @ 0.1 × 0.1 in η×φ; e/m + had: 512 numbers
• With the 6.4 Gb/s baseline: 64 multi-Gb/s receivers
• (Note: fibre count here)

Fibre count / density
• (Note: first divide by 2, then multiply by 8; move up to slide 7)
• 64 × 8 = 512 channels per module (64 per FPGA, 8 FPGAs)
• Due to full duplication in the phi direction, exactly half of all 512 signals are routed into the modules optically on fibres
  • 256 fibres
  • 22 × 12-channel opto receivers
  • 4 × 72-way fibre bundles / MTP connectors

Option
• For a larger jet window we require larger FPGAs, some more fibres and a replication factor > ×2
• Aim at higher line rates (currently FPGAs support 13 Gb/s, microPOD 10 Gb/s)
• Allow for even finer granularity / larger jets / smaller FPGA devices:
  • If the digital processor baseline allows for full duplication of 6.4 Gb/s signals, the spare capacity, when run at a higher rate, can be used to achieve more than 2-fold replication, so as to support a larger jet environment.

How to fit on a module?
• ATCA
• 8 processors (~XC7VX690T)
  • 4 microPODs each
  • Fan-out passive or "far end PMA loopback"
• Small amount of control logic / non-real-time (ROD)
• (Note: no!) Might add a 9th processor for consolidation of results
• Opto connectors in Zone 3
• Module control!!!
• Maximise module payload with the help of a small-footprint ATCA power brick and a tiny IPMC mini-DIMM

…and in 3-D
[Figure: 3-D view of the module]

jFEX system
• Need to handle both fine granularity and a large jet environment (minimum 0.9 × 0.9)
• Require high density / high bandwidth per module. Need that density to keep the input replication factor at an acceptable level
• ~8 modules (+FCAL?)
(Figure note: picture of the new part, e and j, with e greyed out)
Single crate: go for ATCA shelf / blades
• Sharing infrastructure with eFEX
  • Handling / splitting of fibre bundles
  • ROD design
  • Hub design
  • RTM
Input signals:
• Granularity 0.1 × 0.1 (η×φ)
• One electromagnetic, one hadronic tower (note: move up)
• (note: move down)
• Unlike eFEX, no "BCMUX" scheme, due to consecutive non-zero data
• 6.4 Gb/s line rate, 8b/10b encoding, 128 bit per BC
• For now, assume 16 bit per tower, 8 towers per fibre

Tilecal input options
• List 3 options and mention the DPS approach
• For the following slides/drawings: merge Sam and HCSC slides
• Do we attach preferences, do we talk about advantages/disadvantages?
• My preference: NO!!!
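The sliding window algorithm from the "Algorithms, now and then" slide (find and disambiguate ROIs, then sum jet energy in programmable windows up to 0.9 × 0.9) can be sketched in Python as below. This is a minimal illustration, not the jFEX firmware: the 3 × 3 local-maximum seed, the asymmetric `>`/`>=` tie-break, square windows and the pre-summed EM + hadronic tower ET are all assumptions, and the function names are hypothetical. Non-square shapes and Gaussian filters mentioned on the slide are not modelled.

```python
from typing import Dict, List, Tuple

# Hypothetical input: grid[ieta][iphi] = tower ET in GeV (EM + had summed), 0.1 x 0.1 bins
Grid = List[List[int]]


def window_sum(grid: Grid, ieta: int, iphi: int, half: int) -> int:
    """Sum a (2*half+1) x (2*half+1) square window centred on (ieta, iphi).

    Phi wraps around (full azimuth); bins outside the eta range count as zero.
    """
    n_eta, n_phi = len(grid), len(grid[0])
    total = 0
    for de in range(-half, half + 1):
        e = ieta + de
        if not 0 <= e < n_eta:
            continue
        for dp in range(-half, half + 1):
            total += grid[e][(iphi + dp) % n_phi]
    return total


def is_seed(grid: Grid, ieta: int, iphi: int) -> bool:
    """3x3 local-maximum test with asymmetric comparisons (ROI disambiguation).

    Neighbours at offsets lexicographically after (0, 0) must be strictly
    smaller; those before it only smaller or equal. Two equal adjacent
    maxima therefore never both fire.
    """
    n_eta, n_phi = len(grid), len(grid[0])
    centre = grid[ieta][iphi]
    for de in (-1, 0, 1):
        for dp in (-1, 0, 1):
            e = ieta + de
            if (de, dp) == (0, 0) or not 0 <= e < n_eta:
                continue
            neighbour = grid[e][(iphi + dp) % n_phi]
            if (de, dp) > (0, 0):
                if centre <= neighbour:
                    return False
            elif centre < neighbour:
                return False
    return True


def find_jets(grid: Grid, threshold: int,
              halves: Tuple[int, ...] = (1, 2, 4)) -> List[Tuple[int, int, Dict[int, int]]]:
    """Return (ieta, iphi, {half_width: window ET}) for each seed at or above threshold.

    half = 4 corresponds to the 0.9 x 0.9 baseline environment (core +- 4 bins).
    """
    jets = []
    for ieta, row in enumerate(grid):
        for iphi, et in enumerate(row):
            if et >= threshold and is_seed(grid, ieta, iphi):
                jets.append((ieta, iphi,
                             {h: window_sum(grid, ieta, iphi, h) for h in halves}))
    return jets


if __name__ == "__main__":
    grid = [[0] * 10 for _ in range(10)]
    grid[5][5] = grid[5][6] = 10   # two equal neighbouring towers
    grid[4][5] = 3
    print(find_jets(grid, threshold=5, halves=(0, 1, 4)))
    # [(5, 6, {0: 10, 1: 23, 4: 23})]
```

The asymmetric comparison is one common way to make sure a plateau of equal towers yields exactly one ROI; the several window sizes per seed mirror the "differently sized windows (programmable)" of the slide.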
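The "Data replication" and "Initial baseline" slides claim that a core of 0.8 plus ±0.4 of environment (a 0.9 × 0.9 window) can be served without replicating any source into more than two sinks. A short Python check of that claim follows, assuming 64 bins in each direction (8 units × 0.8 in eta, ignoring FCAL; 64 bins in phi) and the hypothetical helper name `fanout_per_bin`. Cyclic wrapping models phi (FPGAs within a module); no wrapping models eta (modules).

```python
from typing import List


def fanout_per_bin(n_bins: int, n_units: int, core: int, halo: int,
                   wrap: bool) -> List[int]:
    """Count how many processing units receive each input bin.

    Unit u owns the core bins [u*core, (u+1)*core) and additionally receives
    `halo` bins on each side. wrap=True models phi (cyclic), wrap=False eta.
    """
    counts = [0] * n_bins
    for u in range(n_units):
        for offset in range(-halo, core + halo):
            b = u * core + offset
            if wrap:
                b %= n_bins
            elif not 0 <= b < n_bins:
                continue  # outside the eta coverage: nothing to send
            counts[b] += 1
    return counts


if __name__ == "__main__":
    # Baseline: 8 units with a 0.8 core and +-0.4 halo -> 1.6 received per unit
    phi = fanout_per_bin(n_bins=64, n_units=8, core=8, halo=4, wrap=True)
    eta = fanout_per_bin(n_bins=64, n_units=8, core=8, halo=4, wrap=False)
    print(max(phi), min(phi))  # 2 2: every phi bin goes to exactly two FPGAs
    print(max(eta))            # 2: no eta bin goes to more than two modules
    # A 1.1 x 1.1 window (halo of 5 bins) breaks the two-sink limit
    print(max(fanout_per_bin(64, 8, 8, 5, wrap=True)))  # 3
```

The result is exact because the halo (4 bins) is half the core (8 bins): each bin then falls into its own unit and exactly one neighbour. Any larger environment pushes some bins into three sinks, which is why the "Option" slide asks for a replication factor above ×2 for larger jets.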
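The channel and fibre numbers on the "Initial baseline", "fibre count / density" and "jFEX system" slides follow from a few multiplications, reproduced in the Python sketch below. It assumes a nominal 40 MHz bunch-crossing rate (the LHC value is 40.08 MHz), 8b/10b encoding, 16 bit per tower, and that exactly half of each FPGA's inputs are duplicated at the destination; the function name `jfex_link_budget` is hypothetical.

```python
import math


def jfex_link_budget(line_rate_mbps: int = 6400, bc_mhz: int = 40,
                     bits_per_tower: int = 16, core_bins: int = 8,
                     halo_bins: int = 4, layers: int = 2,
                     fpgas_per_module: int = 8) -> dict:
    """Reproduce the per-FPGA and per-module channel counts of the baseline."""
    payload_mbps = line_rate_mbps * 8 // 10           # 8b/10b: 80 % payload
    bits_per_bc = payload_mbps // bc_mhz              # 5120 / 40 = 128 bit per BC
    towers_per_fibre = bits_per_bc // bits_per_tower  # 128 / 16 = 8
    side = core_bins + 2 * halo_bins                  # 16 bins = 1.6 in eta and phi
    numbers_per_fpga = side * side * layers           # 256 bins x (EM + had) = 512
    rx_per_fpga = math.ceil(numbers_per_fpga / towers_per_fibre)
    channels_per_module = rx_per_fpga * fpgas_per_module
    # Phi duplication happens at the destination ("far end PMA loopback"),
    # so only half of the channels have to arrive on fibres.
    fibres_per_module = channels_per_module // 2
    return {
        "bits_per_bc": bits_per_bc,
        "towers_per_fibre": towers_per_fibre,
        "numbers_per_fpga": numbers_per_fpga,
        "rx_per_fpga": rx_per_fpga,
        "channels_per_module": channels_per_module,
        "fibres_per_module": fibres_per_module,
        "rx_12ch": math.ceil(fibres_per_module / 12),    # 12-channel opto receivers
        "mtp_72way": math.ceil(fibres_per_module / 72),  # 72-way fibre bundles
    }


if __name__ == "__main__":
    print(jfex_link_budget())
    # {'bits_per_bc': 128, 'towers_per_fibre': 8, 'numbers_per_fpga': 512,
    #  'rx_per_fpga': 64, 'channels_per_module': 512, 'fibres_per_module': 256,
    #  'rx_12ch': 22, 'mtp_72way': 4}
```

Calling `jfex_link_budget(line_rate_mbps=10000)` (the microPOD limit quoted on the "Option" slide) gives 200 bits per BC, i.e. 12 towers per fibre, but only under the same 8b/10b assumption, which may not be the encoding chosen at higher rates.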
Background
• Phase-1 eFEX and jFEX receive digital EM layer data from the LAr DPS
• But an equivalent Tile data path is not available before Phase 2
• So: need to produce digital hadronic tower sums from the current analogue sums sent to L1Calo
• Three points where this can be done (see next slide)

Alternatives
Can extract Tile tower sums from:
1. Tile receiver stations
2. PreProcessor modules
3. JEM modules in JEP
[Figure: trigger system overview with Tile tower sum extraction points 1 (receiver stations), 2 (PreProcessor, nMCM) and 3 (JEM in JEP) feeding a Tile tower "DPS" that sends hadronic data to the FEXes. Other blocks: analogue sums from Tile/LAr, LAr DPS (EM calorimeter digital readout, EM data to FEX), eFEX, jFEX, L1Topo (topological info), CP, JEP, CMX, CTP, muon detector, barrel and endcap sector logic, MuCTPi; new/upgraded hardware is marked.]

Considerations
• Latency
• Dynamic range
  • Current L1Calo towers have an 8-bit dynamic range with 1 GeV/LSB
  • Would like 9 or 10 bits, if possible
• Cost to implement
• Risk of disruption to the existing system

Option 1: Tile Rx stations
• Signals extracted at their arrival point in USA15, so the latency cost is minimal
• Must build a new system to digitize and process analogue signals
• No constraints on dynamic range
• Cost is high: essentially need to build new receiver and PreProcessor systems
• High risk of disruption to the current L1Calo:
  • Analogue data path ahead of L1Calo rearranged
  • Where do we fit the new systems that do this?

Option 2: PreProcessor
New MCM (Phase 0)
• FPGA-based tower processing
• Can drive higher-speed data to the LVDS link driver card (blue arrows)
Replacement link card (Phase 1)
• Send tower data electrically to CP and JEP (same as now)
• An FPGA and a parallel-optic transmitter (e.g. miniPOD) produce hadronic output to the FEX
• A fibre ribbon takes data from the link card to an MTP/MPO output port (probably on the front panel)
[Figure: nMCM prototype]

Option 2: PreProcessor
• Minimal latency: essentially equal to Option 1
• Can extend dynamic range:
  • The nMCM can drive outputs at higher rates, so more bits per tower are possible
  • 'Easy' to get 9 bits, 10 bits probably possible
• Relatively low cost:
  • The nMCM will already exist
  • A few (small) LVDS link boards
  • Possibly need to replace some PreProcessor mother boards (8 layers, low component count)
• Low disruption: only upgrading existing boards

Option 3: JEM upgrade
• Upgraded input cards
• Double-rate tower data from the upgraded PPM (960 Mbit/s)
• High-speed links to the FEX (hadronic tower sums), from the input cards to the front panel (lowest latency)

Option 3: JEM upgrade
• Higher latency: serial transmission from PPr to JEP adds multiple BCs to the latency
• Limited dynamic range: the BCMUX protocol consumes some bandwidth
  • 9 bits possible (by removing parity), 10 bits probably not possible
• Similar cost to Option 2:
  • PreProcessor nMCM and link cards still get replaced (but not PPr mother boards?)
  • Plans to upgrade JEM daughter boards anyway
• Low disruption: again, similar to Option 2

[Figure (V. Andrei, KIP; L1Calo Weekly Meeting, 10/01/2013): current system. PPM with MCM #1 … #16; LCD (fan-out & routing) with BCMUX LVDS-Tx links to CP1/CP2 (32×, 480 Mb/s) and a SUM FPGA (Spartan-6) with LVDS-Tx links to JEP (16×, 480 Mb/s); LVDS cables to the CP and JEP FPGAs (Virtex-II); RGTM, power, readout data to DAQ.]

[Figure (V. Andrei): Phase-I, first solution. nLCD with MUX + LVDS-Tx driving the JEP links at 960 Mb/s (16×); CP links unchanged (BCMUX, 480 Mb/s); JEP input FPGAs Spartan-6/Artix-7.]

[Figure (V. Andrei): Phase-I, second solution (A). nLCD as in the first solution (960 Mb/s to JEP), plus a rear extension (Xilinx 7 Series, SNAP12) sending data to jFEX over optical fibres.]

[Figure (V. Andrei): Phase-I, second solution (B). Current LCD with SUM FPGA (480 Mb/s to JEP, Virtex-II), plus a rear extension (Xilinx 7 Series, SNAP12) sending data to jFEX over optical fibres.]

Firmware
• jFEX
  • Sliding window algorithm
  • Infrastructure for high-speed links
  • Module control
  • DAQ (buffers and embedded ROD functionality, common effort)
• Example: JEM-based Tilecal inputs
  • Serialization nMCM: 960 Mb/s
  • Serialization JMM: 6.4 Gb/s
  • Re-target existing JEM input firmware to new FPGA

• Development line

Demonstrator projects: "GOLD"
Try out technologies and schemes for L1Topo
• Fibre input from the backplane (MTP-CPI connectors)
• Up to 10 Gb/s o/e data paths via industry-standard converters on a mezzanine
• Using mid-range FPGAs (XC6VLX240T) up to 6.4 Gb/s, 24 channels per device
• Typical sort/dφ algorithm (successfully implemented) takes ~13% of logic resources
• Real-time output via opto links on the front panel (currently used as data source for latency measurements etc.)
• Will continue to be used as source/sink for L1Topo tests

Recent GOLD results (Virtex-6)
• Jitter analysis on cleaned TTC clock (σ = 2.9 ps)
• Signal integrity: sampled at several positions along the chain
• MGT and o/e converter settings optimization
• Bit Error Rate (BER) < 10⁻¹⁶ at 6.4 Gb/s / 12 channels
• Eye widths above 60 ps (out of 156 ps)
• Crosstalk among channels measured in some cases, but with negligible effect
[Figure: signal sampled after the fan-out chip]

GOLD latency (Virtex-6)
Latency measured at various points along the real-time path at 6.4 Gb/s, 16-bit data width and 8b/10b encoding:
• Far End PMA loopback: 34 ns
• Electrical (LVDS) output: 63 ns
• Far End PCS loopback: 78 ns
• Parallel loopback in fabric: 86 ns
Firmware modification for latency measurement including the algorithm, and electrical output towards the CTP, under way (Volker almost got there…)

Topo processor details (post PDR)
• Real-time path:
  • 14 fibre-optical 12-way inputs (miniPOD)
  • Via four 48-way backplane connectors
  • 4-fold segmented reference clock tree, 3 Xtal clocks each, plus jitter-cleaned LHC bunch clock multiple
  • Two processors XC7V690T (prototype XC7V485T)
  • Interlinked by a 238-way LVDS path
  • 12-way (+) optical output to CTP
  • 32-way electrical (LVDS) output to CTP via mezzanine
• Full ATCA compliance / respective circuitry on mezzanine
• Module control via Kintex and Zynq processors
  • Initially via VMEbus extension
  • Eventually via Kintex or Zynq processor (Ethernet / base interface)

Floor plan, so far…
• Processor FPGA configuration via SystemACE and through the module controller
• Module controller configuration via SPI and SD card
• DAQ and ROI interface
  • Two SFPs, L1Calo style
  • Up to 12 opto fibres (miniPOD)
  • Hardware to support both the L1Calo-style ROD interface and an embedded ROD / S-Link interface on these fibres

… and in 3-D
[Figure: 3-D view]

Tests: miniPOD on VC707
• From Eduard
• Note: microPOD uses the same o/e engine as miniPOD
• (No pictures of microPOD, probably confidential…)

Schedule
• Extract from Ian's Gantt chart, and that's it?
• Check with Ian whether anything needs updating before the session
• Also manpower estimates?
• If so, anything to be said in addition to what's in the document?

Conclusion
• The 8-module jFEX seems possible with 2013 technology
• Key technologies explored already (GOLD, L1Topo, …)
• Use of microPODs is challenging for thermal and mechanical reasons, but the o/e engine is the same as in the popular miniPODs
• The scheme allows for both fine granularity and a large environment at 6.4 Gb/s line rate and a limit of 100% duplication of input channels
• Rather dense circuitry, but comparable to the GOLD demonstrator
• For even finer granularity and/or larger jets, things get more complicated
  • Need to explore higher transmission rates
• The DPS needs to handle the required duplication (in eta)
  • Details of fibre organization and content cannot be presented now
  • Started work on detailed specifications, in parallel exploring higher data rates…
• Tilecal signals are required in the FEXes in fibre-optical format
  • Three options for generating them
  • All seem viable, but probably at different cost
  • Need to agree on a baseline before the TDR
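For the Tilecal options, the "Considerations" slide quotes 8-bit towers at 1 GeV/LSB and asks for 9 or 10 bits. The Python sketch below illustrates what that means for full scale (255, 511 or 1023 GeV). The helper name `quantise_et`, truncation to whole LSBs and hard saturation at the top code are assumptions for illustration; the actual PreProcessor calibration is not modelled.

```python
def quantise_et(et_gev: float, n_bits: int, lsb_gev: float = 1.0) -> int:
    """Map a tower ET onto an n-bit code.

    Truncates to whole LSBs and saturates at the largest code (2**n_bits - 1);
    non-positive energies map to 0.
    """
    if et_gev <= 0:
        return 0
    max_code = (1 << n_bits) - 1
    return min(int(et_gev // lsb_gev), max_code)


if __name__ == "__main__":
    for bits in (8, 9, 10):
        # bits, full scale in GeV at 1 GeV/LSB, code for a 300 GeV tower
        print(bits, (1 << bits) - 1, quantise_et(300.0, bits))
    # 8 255 255
    # 9 511 300
    # 10 1023 300
```

A 300 GeV tower saturates with today's 8 bits but is represented without saturation with 9 or 10 bits, which is the gain that Options 1 and 2 offer and that Option 3 (9 bits only by dropping parity) partly gives up.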