Slide 1

http://www.staff.uni-mainz.de/uschaefe/browsable/Meeting/2013/neu/

jFEX
jFEX (inc. specific software/firmware & Tilecal input signal, options), 30'
Mainz logos?
Uli Schäfer

jFEX (inc. specific software/firmware & Tilecal input signal, options), 30'
Assume we have to cover all that’s in the document, but with more detail where we feel it helps. Did I forget to cover any important section?
• No physics justification
• Jet processing general
• L1Calo phase 1 scheme
• Algorithms
• Data replication
• jFEX description #4
• Density, fibre count etc.
• Input options #4
• Firmware (also software to be mentioned?)
• Demonstrators/current designs (GOLD, Topo, MiniPOD/V7) #4
• Schedule, manpower req.
jfex input HD+MZ

Jet processing
Phase-0 jet system consisting of
• Pre-Processor
  • Analogue signal conditioning
  • Digitization
  • Digital signal processing
  • Jet element pre-summation to 0.2 × 0.2 (η×φ)
• Jet processor
  • Sliding window processor for jet finding
  • Jet multiplicity determination
  • Jet feature extraction into L1Topo (pre-phase 1)
At phase 1: complement with jet feature extractor jFEX
• LAr signals optically from digital processor system
• Tilecal signals from analogue Pre-Processor / JEP …
• … eventually Tilecal optical data off detector, and possible retirement of current L1Calo system

L1Calo Phase-1 System
Block diagram labels: From Digital Processing System, ROD, RTM, Opt.
Hub, eFEX, Plant, ROD, RTM, Hub, jFEX, L1Topo, JEM, JMM, CMX, PPR, CPM, CMX; legend: New at Phase 1

• 3 options here

Algorithms, now and then
Tower: 0.2 × 0.2 (Phase 0), 0.1 × 0.1 (Phase 1)
ROI: 0.4 × 0.4
Jet windows: three, up to 0.8 × 0.8 (Phase 0); 0.9 × 0.9, limited by data duplication (Phase 1)
Sliding window algorithm:
• Find and disambiguate ROIs
• Calculate jet energy in differently sized windows (programmable)
→
• Improve granularity by factor of four, to 0.1×0.1 (η×φ)
• Slightly increase environment
• Allow for flexibility in jet definition (non-square jet shape, Gaussian filter, …)
• Fat jets to be calculated from high granularity small jets
• Optionally increase jet environment (baseline 0.9 × 0.9)

Data replication
Sliding window algorithm requiring large scale replication of data
• Forward duplication only (fan-out), no re-transmission
• Baseline: no replication of any source into more than two sinks
• Fan-out in eta (or phi) handled at source only (DPS)
  • Duplication at the parallel end (on-FPGA), using additional Multi-Gigabit Transceivers
  • Allowing for differently composed streams
  • Minimizing latency
• Fan-out in phi (or eta) handled at destination only
  • Baseline “far end PMA loopback”
  • Looking into details and alternatives

Initial baseline
8+ modules, each covering full phi, limited eta range
• Environment of 0.9 in eta (core bin ± 4 neighbours)
• Each module receives fully duplicated data in eta: 1.6 in eta worth of data required for a core of 0.8
• 16 eta bins including environment
8 FPGAs per module, each:
• Environment 0.9×0.9
• Each FPGA receives fully duplicated data in eta and phi: 1.6×1.6 worth of data required for a core of 0.8×0.8
• 256 bins @ 0.1×0.1 in η×φ, e/m + had → 512 numbers
• With baseline 6.4 Gb/s, 64 multi-Gb/s receivers
• Fibre count here

fibre count / density
• First divide by 2, then multiply by 8; move up to slide 7
• 64 channels per FPGA × 8 FPGAs = 512 signals per module
• Due to full duplication in phi direction, exactly half of all 512 signals
are routed into the modules optically on fibres
• 256 fibres
• 22 × 12-channel opto receivers
• 4 × 72-way fibre bundles / MTP connectors
• Option
  • For larger jet windows we require larger FPGAs, some more fibres and a replication factor > ×2
  • Aim at higher line rates (currently FPGAs support 13 Gb/s, microPOD 10 Gb/s)
  • Allow for even finer granularity / larger jets / smaller FPGA devices:
  • If the digital processor baseline allows for full duplication of 6.4 Gb/s signals, the spare capacity, when run at a higher rate, can be used to achieve a replication of more than 2-fold, so as to support a larger jet environment.

How to fit on a module?
• ATCA
• 8 processors (~XC7VX690T)
• 4 microPODs each
• fan-out passive or “far end PMA loopback”
• Small amount of control logic / non-realtime (ROD)
• No! Might add 9th processor for consolidation of results
• Opto connectors in Zone 3
• Module control !!!
• Maximise module payload with help of small-footprint ATCA power brick and tiny IPMC mini-DIMM
Diagram label: Z3

… and 3-d

jFEX system
• Need to handle both fine granularity and large jet environment (minimum 0.9×0.9)
→ Require high density / high bandwidth per module
Need that density to keep input replication factor at acceptable level
~ 8 modules (+FCAL ?)
Picture of the new part: e, j, with e greyed out
Single crate: go for ATCA shelf / blades
• Sharing infrastructure with eFEX
• Handling / splitting of fibre bundles
• ROD design
• Hub design
• RTM
Input signals:
• Granularity 0.1×0.1 (η×φ)
• One electromagnetic, one hadronic tower (move up)
• --- (move down)
• Unlike eFEX, no “BCMUX” scheme due to consecutive non-zero data
• 6.4 Gb/s line rate, 8b/10b encoding → 128 bits per BC
• For now, assume 16 bits per tower, 8 towers per fibre

Tilecal input options
• List 3 options and mention DPS approach
• For following slides/drawings: merge Sam and HCSC slides
• Do we attach preferences, do we talk about advantages/disadvantages?
• My preference: NO!!!

Background
• Phase-1 eFEX and jFEX receive digital EM layer data from LAr DPS
• But equivalent Tile data path not available before Phase 2
• So: need to extract digital hadronic tower sums produced from the current analog sums sent to L1Calo
• Three points where this can be done
• See next slide

Alternatives
Can extract Tile tower sums from:
1. Tile receiver stations
2. PreProcessor modules
3. JEM modules in JEP
Diagram labels: Muon detector, Barrel sector logic, Endcap sector logic, Muon Trigger, MuCTPi, EM calorimeter digital readout, DPS, EM data to FEX, eFEX, jFEX, Hadronic data to FEX, Tile tower “DPS”, extraction points 1, 2, 3, Analog sums from Tile/LAr, Receiver stations, PreProcessor, nMCM, JEP, JEM, CP, CMX, L1Calo Trigger, L1Topo, Topological info, Central Trigger, CTP CORE, CTP output; legend: New/upgraded Hardware

Considerations
• Latency
• Dynamic range
  • Current L1Calo towers have 8-bit dynamic range with 1 GeV/LSB
  • Would like 9 or 10 bits, if possible
• Cost to implement
• Risk of disruption to existing system

Option 1: Tile Rx stations
• Signals extracted at arrival point in USA15, so latency cost is minimal
• Must build a new system to digitize and process analog signals
• No constraints on dynamic range
• Cost is high: essentially need to build new receiver and PreProcessor systems
• High risk of disruption to current L1Calo:
  • Analog data path ahead of L1Calo rearranged
  • Where do we fit the new systems that do this?

Option 2: PreProcessor
• New MCM (Phase 0)
  • FPGA based tower processing
  • Can drive higher-speed data to the LVDS link driver card (blue arrows)
• Replacement link card (Phase 1)
  • Send tower data electrically to CP and JEP (same as now)
  • An FPGA and parallel-optic transmitter (e.g.
minipod) produce hadronic output to FEX
  • Fiber ribbon takes data from link card to an MTP/MPO output port (probably on front panel)
Figure: nMCM prototype

Option 2: PreProcessor
• Minimal latency:
  • Essentially equal to option 1
• Can extend dynamic range:
  • nMCM can drive outputs at higher rates, so more bits per tower possible
  • ‘Easy’ to get 9 bits, 10 bits probably possible
• Relatively low cost
  • nMCM will already exist
  • A few (small) LVDS link boards
  • Possibly need to replace some PreProcessor mother boards (8 layers, low component count)
• Low disruption: only upgrading existing boards

Option 3: JEM upgrade
Diagram callouts:
• Upgraded input cards
• Double-rate tower data from upgraded PPM (960 Mbit/s)
• High-speed links to FEX from input cards to front panel (lowest latency) (hadronic tower sums)

Option 3: JEM upgrade
• Higher latency:
  • Serial transmission from PPr to JEP adds multiple BCs to latency
• Limited dynamic range:
  • BCMUX protocol consumes some bandwidth
  • 9 bits possible (by removing parity), 10 bits probably not possible
• Similar cost to Option 2
  • PreProcessor nMCM and link cards still get replaced (but not PPr mother boards?)
  • Plans to upgrade JEM daughter boards anyway
• Low disruption: again, similar to Option 2

Current System
Block diagram (V. Andrei, KIP; L1Calo Weekly Meeting, 10/01/2013). Labels: PPM, MCM #1 … MCM #16, ch1 to ch4, CP1, CP2, BCMUX, LVDS-Tx, 10, LCD (f/o & routing), SUM, FPGA (Spartan-6), Virtex-II, 32× 480 Mb/s, 16× 480 Mb/s, to CP (LVDS cables), to JEP (LVDS cables), RGTM, J2, to DAQ, r/o data, power

Phase-I: first solution
Block diagram labels: PPM, MCM #1 … MCM #16, ch1 to ch4, CP1, CP2, BCMUX, LVDS-Tx, 10, nLCD (f/o & routing), MUX + LVDS-Tx, FPGA (Spartan-6), Spartan-6/Artix-7, 32× 480 Mb/s, 16× 960 Mb/s, to CP (LVDS cables), to JEP (LVDS cables), RGTM, J2, to DAQ, r/o data, power

Phase-I: second solution (A)
Block diagram labels: Rear Extension, to jFEX (optic fibers), SNAP12, PPM, MCM #1 … MCM #16, ch1 to ch4, CP1, CP2, BCMUX, LVDS-Tx, 10, nLCD (f/o & routing), MUX + LVDS-Tx, FPGA (Spartan-6), 32× 480 Mb/s, 16× 960 Mb/s, to CP (LVDS cables), to JEP (LVDS cables), RGTM, J2, to DAQ, r/o data, power, ?
Spartan-6/Artix-7, Xilinx 7 Series

Phase-I: second solution (B)
Block diagram labels: Rear Extension, to jFEX (optic fibers), SNAP12, PPM, MCM #1 … MCM #16, ch1 to ch4, CP1, CP2, BCMUX, LVDS-Tx, 10, LCD (f/o & routing), SUM, FPGA (Spartan-6), Virtex-II, Xilinx 7 Series, 32× 480 Mb/s, 16× 480 Mb/s, to CP (LVDS cables), to JEP (LVDS cables), RGTM, J2, to DAQ, r/o data, power

Firmware
• jFEX
• Sliding window algorithm
• Infrastructure for high speed links
• Module control
• DAQ (buffers and embedded ROD functionality, common effort)

• Example: JEM based Tilecal inputs
• Serialization nMCM 960 Mb/s
• Serialization JMM 6.4 Gb/s
• Re-target existing JEM input firmware to new FPGA

• Development line

Demonstrator projects: “GOLD”
Try out technologies and schemes for L1Topo
• Fibre input from the backplane (MTP-CPI connectors)
• Up to 10 Gb/s o/e data paths via industry standard converters on mezzanine
• Using mid range FPGAs (XC6VLX240T) up to 6.4 Gb/s, 24 channels per device
• Typical sort/dφ algorithm (successfully implemented) takes ~13% of logic resources
• Real-time output via opto links on the front panel (currently used as data source for latency measurements etc.)
• Will continue to be used as source/sink for L1Topo tests

Recent GOLD results (Virtex-6)
• Jitter analysis on cleaned TTC clock (σ = 2.9 ps)
• Signal integrity: sampled in several positions along the chain
• MGT and o/e converter settings optimization
• Bit Error Rate (BER) < 10^-16 at 6.4 Gb/s / 12 channels
• Eye widths above 60 ps (out of 156 ps)
• Crosstalk among channels measured in some cases but with negligible effect
Figure caption: Sampled after fan-out chip

GOLD latency (Virtex-6)
Latency measured at various points along the real-time path at 6.4 Gb/s, 16-bit data width, and 8b/10b encoding
• Far End PMA loopback: 34 ns latency
• Electrical (LVDS) output: 63 ns latency
• Far End PCS loopback: 78 ns latency
• Parallel loopback in fabric: 86 ns latency
Firmware mod for latency measurement including algorithm, and electrical out towards CTP, under way (Volker almost got there…)

Topo processor details (post PDR)
• Real-time path:
  • 14 fibre-optical 12-way inputs (miniPOD)
  • Via four 48-way backplane connectors
  • 4-fold segmented reference clock tree, 3 Xtal clocks each, plus jitter-cleaned LHC bunch clock multiple
  • Two processors XC7VX690T (prototype XC7VX485T)
  • Interlinked by 238-way LVDS path
  • 12-way (+) optical output to CTP
  • 32-way electrical (LVDS) output to CTP via mezzanine
• Full ATCA compliance / respective circuitry on mezzanine
• Module control via Kintex and Zynq processors
  • Initially via VMEbus extension
  • Eventually via Kintex or Zynq processor (Ethernet / base interface)

floor plan, so far…
• Processor FPGA configuration via SystemACE and through module controller
• Module controller configuration via SPI and SD-card
• DAQ and ROI interface
  • Two SFPs, L1Calo style
  • up to 12 opto fibres (miniPOD)
  • Hardware to support both L1Calo style ROD interface, and embedded ROD / SLink interface on these fibres

… and in 3-d

Tests miniPOD on VC707
• From Eduard
• Note: microPOD same o/e engine as
miniPOD
• (no pics of microPOD, probably confidential…)

Schedule
• Extract from Ian's Gantt chart, and that’s it?
• Check with Ian whether anything is to be updated before the session
• Also manpower estimates?
• If so, anything to be said in addition to what’s in the document?

conclusion
• The 8-module jFEX seems possible with ~2013’s technology
• Key technologies explored already (GOLD, L1Topo, …)
• Use of microPODs challenging for thermal and mechanical reasons, but o/e engine is the same as in popular miniPODs
• Scheme allows for both fine granularity and large environment at 6.4 Gb/s line rate and a limit of 100% duplication of input channels
• Rather dense circuitry, but comparable to GOLD demonstrator
• For even finer granularity and/or larger jets things get more complicated → need to explore higher transmission rates
• DPS needs to handle the required duplication (in eta)
Details of fibre organization and content cannot be presented now → started work on detailed specifications, in parallel exploring higher data rates…
• Tilecal signals required in FEXes in fibre-optical format
  • Three options for generating them
  • All seem viable, but probably at different cost
  • Need to agree on a baseline before TDR
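
Cross-check of the baseline numbers
The link and fibre figures quoted on the "Initial baseline", "fibre count / density" and "jFEX system" slides (128 bits per BC, 8 towers per fibre, 64 receivers per FPGA, 256 fibres, 22 opto receivers, 4 fibre bundles) can be re-derived with a few lines of arithmetic. The sketch below is only a cross-check, not part of the design. The function name `jfex_baseline`, its parameter names and the nominal 25 ns bunch-crossing period are assumptions made for illustration; all other inputs are taken from the slides.

```python
import math

def jfex_baseline(line_rate_mbps=6400, bc_period_ns=25, bits_per_tower=16,
                  core_bins=8, neighbours=4, fpgas_per_module=8):
    """Recompute the baseline link and fibre numbers quoted on the slides.

    Hypothetical helper: names and the 25 ns bunch-crossing period are
    illustrative assumptions, the remaining defaults come from the slides.
    """
    # Raw bits on one link per bunch crossing: Mb/s * ns / 1000 = bits.
    raw_bits_per_bc = line_rate_mbps * bc_period_ns // 1000
    # 8b/10b encoding carries 8 payload bits in every 10 line bits.
    payload_bits_per_bc = raw_bits_per_bc * 8 // 10
    towers_per_fibre = payload_bits_per_bc // bits_per_tower

    # Core of 0.8 (8 bins of 0.1) plus 4 neighbour bins on either side.
    bins_per_side = core_bins + 2 * neighbours
    bins = bins_per_side ** 2
    numbers = 2 * bins  # one e/m and one hadronic value per bin

    receivers_per_fpga = math.ceil(numbers / towers_per_fibre)
    signals_per_module = receivers_per_fpga * fpgas_per_module
    # Full duplication in phi: only half of the signals arrive on fibres.
    fibres_per_module = signals_per_module // 2

    return {
        "payload_bits_per_bc": payload_bits_per_bc,
        "towers_per_fibre": towers_per_fibre,
        "bins_per_side": bins_per_side,
        "numbers_per_fpga": numbers,
        "receivers_per_fpga": receivers_per_fpga,
        "signals_per_module": signals_per_module,
        "fibres_per_module": fibres_per_module,
        "opto_receivers_12ch": math.ceil(fibres_per_module / 12),
        "fibre_bundles_72way": math.ceil(fibres_per_module / 72),
    }

if __name__ == "__main__":
    for name, value in jfex_baseline().items():
        print(f"{name}: {value}")
```

Changing `line_rate_mbps` or `neighbours` gives a quick feel for the "Option" bullets (higher line rates, larger jet environment), under the same simplifying assumptions.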
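
Sliding window sketch
The "Algorithms, now and then" slide describes the sliding window algorithm as two steps: find and disambiguate ROIs, then calculate the jet energy in programmable windows. A minimal software sketch of that idea follows. It is not the jFEX firmware algorithm: the 3×3 ROI sum, the tie-break rule (strict ">" towards one side, ">=" towards the other), zero padding in eta, wrap-around in phi, the `[eta, phi]` array layout and the function names `window_sums` and `find_rois` are all assumptions chosen for illustration, and tower ET is assumed non-negative.

```python
import numpy as np

def window_sums(et, half_width):
    """Sum ET in a (2*half_width+1)^2 window centred on every bin.

    Axis 0 is eta (zero-padded at the edges), axis 1 is phi (wraps around).
    """
    n_eta = et.shape[0]
    padded = np.pad(et, ((half_width, half_width), (0, 0)))  # zeros in eta only
    sums = np.zeros_like(et)
    for d_eta in range(2 * half_width + 1):
        rows = padded[d_eta:d_eta + n_eta, :]
        for d_phi in range(-half_width, half_width + 1):
            sums += np.roll(rows, -d_phi, axis=1)
    return sums

def find_rois(et, roi_half_width=1, threshold=0.0):
    """Return [eta, phi] indices of bins whose ROI sum is a local maximum."""
    s = window_sums(et, roi_half_width).astype(float)
    n_eta = s.shape[0]
    padded = np.pad(s, ((1, 1), (0, 0)), constant_values=-np.inf)
    is_max = s > threshold
    for d_eta in (-1, 0, 1):
        for d_phi in (-1, 0, 1):
            if (d_eta, d_phi) == (0, 0):
                continue
            neigh = np.roll(padded[1 + d_eta:1 + d_eta + n_eta, :], -d_phi, axis=1)
            # Disambiguation: strict comparison towards "earlier" neighbours,
            # non-strict towards the rest, so equal sums yield a single ROI.
            if (d_eta, d_phi) < (0, 0):
                is_max &= s > neigh
            else:
                is_max &= s >= neigh
    return np.argwhere(is_max)

if __name__ == "__main__":
    # Toy input: 16 eta bins x 64 phi bins, one deposit with a small halo.
    et = np.zeros((16, 64))
    et[7:10, 9:12] = 10.0
    et[8, 10] = 50.0
    rois = find_rois(et)
    jets = window_sums(et, 4)  # 0.9 x 0.9 environment at 0.1 granularity
    for eta, phi in rois:
        print(f"ROI at eta bin {eta}, phi bin {phi}: jet ET = {jets[eta, phi]}")
```

Jet energies for other window sizes follow from calling `window_sums` with a different `half_width` and reading the result at the ROI positions; non-square shapes or a Gaussian weighting would need a weighted sum instead of the plain one used here.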
