Slide 1

Advertisement: http://www.staff.uni-mainz.de/uschaefe/browsable/Meeting/2013/neu/
jFEX
jFEX (inc. specific software/firmware & Tilecal input signal, options) 30'
Mainz
Logos?
Uli Schäfer
1
jFEX (inc. specific software/firmware & Tilecal input signal, options) 30'
Assume we have to cover all that's in the document, but in more detail where
we feel it helps. Did I forget to cover any important section?
(No physics justification.)
• Jet processing in general
• L1Calo Phase-1 scheme
• Algorithms
• Data replication
• jFEX description #4
• Density, fibre count etc.
• Input options #4
• Firmware (also software to be mentioned?)
• Demonstrators/current designs (GOLD, Topo, MiniPOD/V7) #4
• Schedule, manpower requirements; jFEX input HD+MZ
Uli Schäfer
2
Jet processing
Phase-0 jet system consisting of
• Pre-Processor
• Analogue signal conditioning
• Digitization
• Digital signal processing
• Jet element pre-summation to 0.2 x 0.2 (η×φ)
• Jet processor
• Sliding window processor for jet finding
• Jet multiplicity determination
• Jet feature extraction into L1Topo (pre-phase1)
At phase-1: complement with jet feature extractor jFEX
• LAr signals optically from digital processor system
• Tilecal signals from analogue Pre-Processor / JEP …
• … eventually Tilecal optical data off detector, and possible
retirement of current L1Calo system
Uli Schäfer
3
L1Calo Phase-1 System
[Block diagram: L1Calo Phase-1 system. Data from the Digital Processing
System enter via an optical plant into eFEX and jFEX (each served by Hub,
ROD and RTM), feeding L1Topo; the existing PPR, CPM/CMX and JEM/JMM/CMX
path is retained. The eFEX, jFEX and their Hub/ROD infrastructure are
marked "New at Phase 1".]
Uli Schäfer
4
• Here: three options
Uli Schäfer
5
Algorithms, now and then
Phase 0: towers 0.2 × 0.2 (η×φ), ROI 0.4 × 0.4, three jet windows up to
0.8 × 0.8 (limited by data duplication)
Phase 1: towers 0.1 × 0.1, windows up to 0.9 × 0.9

Sliding window algorithm:
Find and disambiguate ROIs
Calculate jet energy in differently sized windows (programmable)
(see the sketch below)

• Improve granularity by a factor of
four, to 0.1×0.1 (η×φ)
• Slightly increase environment
• Allow for flexibility in jet
definition (non-square jet shape,
Gaussian filter, …)
• Fat jets to be calculated from
high granularity small jets
• Optionally increase jet
environment (baseline 0.9 × 0.9)
Uli Schäfer
6
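A minimal sketch of the sliding-window scheme above, in Python and purely illustrative: find local-maximum ROIs on a 0.1×0.1 tower grid and sum jet energy in programmable window sizes. The 0.5×0.5 ROI core, the strict local-maximum rule and the odd window sizes are assumptions for illustration, not the jFEX firmware algorithm; phi wraps around, eta does not.

# Minimal sliding-window jet-finder sketch (illustrative assumptions:
# 0.5x0.5 ROI core, strict local maximum, odd window sizes).
import numpy as np

def window_sum(towers, i, j, size):
    """Sum ET in a size x size window centred on bin (i, j); phi (rows) wraps."""
    half = size // 2
    rows = [(i + di) % towers.shape[0] for di in range(-half, half + 1)]
    cols = [j + dj for dj in range(-half, half + 1) if 0 <= j + dj < towers.shape[1]]
    return towers[np.ix_(rows, cols)].sum()

def find_jets(towers, roi_size=5, window_sizes=(5, 7, 9)):
    """Return (phi_bin, eta_bin, {size: ET}) for each disambiguated ROI."""
    jets = []
    for i in range(towers.shape[0]):          # phi bins
        for j in range(towers.shape[1]):      # eta bins
            core = window_sum(towers, i, j, roi_size)
            # Disambiguation: keep only strict local maxima of the core sum
            # (real firmware also needs a defined tie-break rule)
            rivals = [window_sum(towers, (i + di) % towers.shape[0], j + dj, roi_size)
                      for di in (-1, 0, 1) for dj in (-1, 0, 1)
                      if (di, dj) != (0, 0) and 0 <= j + dj < towers.shape[1]]
            if core > max(rivals):
                jets.append((i, j, {s: window_sum(towers, i, j, s) for s in window_sizes}))
    return jets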
Data replication
The sliding window algorithm requires large-scale replication of
data
• Forward duplication only (fan-out), no re-transmission
• Baseline: no replication of any source into more than two
sinks (see the sketch below)
• Fan-out in eta (or phi) handled at source only (DPS)
• Duplication at the parallel end (on-FPGA), using
additional Multi-Gigabit Transceivers
• Allowing for differently composed streams
• Minimizing latency
• Fan-out in phi (or eta) handled at destination only
• Baseline “far end PMA loopback”
• Looking into details and alternatives
Uli Schäfer
7
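A small sketch of the bookkeeping behind the "no more than two sinks" baseline, assuming 8 modules each with a 0.8-wide eta core plus a 0.4 environment on either side, in 0.1 bins; the module and bin counts are illustrative, not a final channel map.

# Sketch of the eta fan-out behind the "at most two sinks" baseline.
N_MODULES = 8
CORE_BINS = 8   # 0.8 in eta at 0.1 granularity
ENV_BINS = 4    # +/- 0.4 environment

def sinks(eta_bin):
    """Modules (by index) that must receive a given 0.1-wide eta bin."""
    return [m for m in range(N_MODULES)
            if m * CORE_BINS - ENV_BINS <= eta_bin < (m + 1) * CORE_BINS + ENV_BINS]

# No eta bin is needed by more than two modules, so 2-fold duplication
# at the source (DPS) suffices for this environment.
assert max(len(sinks(b)) for b in range(N_MODULES * CORE_BINS)) == 2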
Initial baseline
8+ Modules, each covering full phi, limited eta range
• Environment of 0.9 in eta (core bin +/- 4 neighbours)
• Each module receives fully duplicated data in eta:
1.6 eta worth of data is required for a core of 0.8
• 16 eta bins including environment
8 FPGAs per module, each:
• Environment 0.9×0.9
• Each FPGA receives fully duplicated data in eta and phi:
1.6×1.6 worth of data is required for a core of 0.8×0.8
• 256 bins @ 0.1×0.1 in η×φ, e/m + had → 512 numbers
• With baseline 6.4 Gb/s: 64 multi-Gb/s receivers (checked in the sketch below)
• Fibre count: next slide
Uli Schäfer
8
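A back-of-envelope check of the 64-receiver figure, assuming 16 bit per tower, the nominal 40 MHz bunch-crossing rate, and 8b/10b encoding leaving 80% payload on a 6.4 Gb/s link:

# Back-of-envelope check of the 64-receiver figure per FPGA.
bins = 16 * 16                           # 1.6 x 1.6 region at 0.1 x 0.1
values = bins * 2                        # e/m + hadronic -> 512 numbers
bits_per_bc = values * 16                # 16 bit per tower -> 8192 bit per BC
payload_per_link = 6.4e9 * 0.8 / 40e6    # 8b/10b -> 128 bit per BC per link
print(bits_per_bc / payload_per_link)    # -> 64.0 receivers at 6.4 Gb/s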
fibre count / density
• (Note: first divide by 2, then multiply by 8 — move up to slide 7)
• 64 channels per FPGA × 8 FPGAs = 512 channels per module (arithmetic below)
• Due to full duplication in phi direction, exactly half of all 512
signals are routed into the modules optically on fibres
• 256 fibres
• 22 × 12-channel opto receivers
• 4 × 72-way fibre bundles / MTP connectors
• Options:
• For larger jet windows we require larger FPGAs, some more
fibres and a replication factor > ×2
• Aim at higher line rates (currently FPGAs support 13 Gb/s,
microPOD 10 Gb/s)
• Allow for even finer granularity / larger jets / smaller FPGA
devices:
• If digital processor baseline allows for full duplication of
6.4Gb/s signals, the spare capacity, when run at higher rate,
can be used to achieve a replication of more than 2-fold, so
as to support a larger jet environment.
Uli Schäfer
10
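The same bookkeeping at module level reproduces the fibre and connector counts on this slide (22 = ceil(256/12), 4 = ceil(256/72)):

# Module-level fibre count, following the numbers on this slide.
import math
channels = 64 * 8               # 64 receivers per FPGA x 8 FPGAs = 512
fibres = channels // 2          # phi duplication done on-module -> 256 fibres
print(math.ceil(fibres / 12))   # 12-channel opto receivers -> 22
print(math.ceil(fibres / 72))   # 72-way fibre bundles / MTP -> 4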
How to fit on a module ?
• ATCA
• 8 processors (~XC7VX690T)
• 4 microPODs each
• Fan-out passive or “far end PMA loopback”
• Small amount of control logic / non-realtime (ROD)
• No! Might add a 9th processor for consolidation of results
• Opto connectors in Zone 3
• Module control !!!
• Maximise module payload with the help of a small-footprint ATCA power
brick and a tiny IPMC mini-DIMM
Uli Schäfer
11
…and 3-d
Uli Schäfer
12
jFEX system
• Need to handle both fine granularity and a large jet environment (minimum
0.9×0.9)
→ Require high density / high bandwidth per module
→ Need that density to keep the input replication factor at an acceptable level
~ 8 modules (+FCAL ?)
(Image: new part of the system, eFEX and jFEX, with eFEX greyed out)
Single crate
go for ATCA shelf / blades:
• Sharing infrastructure with eFEX
• Handling / splitting of fibre bundles
• ROD design
• Hub design
• RTM
Input signals:
• Granularity 0.1×0.1 (η×φ)
• One electromagnetic, one hadronic tower (note: move up)
• (note: move down)
• Unlike eFEX, no “BCMUX” scheme due to consecutive non-zero data
• 6.4 Gb/s line rate, 8b/10b encoding → 128 bit per BC
• For now, assume 16 bit per tower, 8 towers per fibre (packing sketch below)
Uli Schäfer
13
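A sketch of what the assumed per-fibre format implies: 6.4 Gb/s × 0.8 (8b/10b) / 40 MHz ≈ 128 bit per BC, i.e. room for 8 towers at 16 bit each. The field layout below is purely illustrative; the real fibre content is still being specified.

# Illustrative packing of 8 x 16-bit towers into a 128-bit BC frame.
def pack_frame(towers):
    assert len(towers) == 8 and all(0 <= t < 1 << 16 for t in towers)
    frame = 0
    for k, t in enumerate(towers):
        frame |= t << (16 * k)   # tower k occupies bits [16k, 16k+16)
    return frame

def unpack_frame(frame):
    return [(frame >> (16 * k)) & 0xFFFF for k in range(8)]

assert unpack_frame(pack_frame(list(range(8)))) == list(range(8))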
Tilecal input options
• List the 3 options and mention the DPS approach
• For the following slides/drawings:
merge Sam's and the HCSC slides
• Do we attach preferences? Do we talk about
advantages/disadvantages?
• My preference: NO!!!
Uli Schäfer
15
Background
• Phase-1 eFEX and jFEX receive digital EM layer
data from LAr DPS
• But equivalent Tile data path not available
before Phase 2
• So: need to extract digital hadronic tower sums
produced from the current analog sums sent to
L1Calo
• Three points where this can be done
• See next slide
16
Alternatives
Can extract Tile tower sums from:
1. Tile receiver stations
2. PreProcessor modules
3. JEM modules in JEP

[Diagram: L1Calo trigger overview. Analog sums from Tile/LAr enter via the
receiver stations into the PreProcessor (nMCM), feeding CP/CMX and JEP
(JEM/CMX); the DPS provides EM calorimeter digital readout and EM data to
the FEXes (eFEX, jFEX), with a Tile tower “DPS” supplying the hadronic data
to the FEXes; outputs and topological info go via L1Topo to the CTP,
alongside the muon trigger (barrel/endcap sector logic, muon detector,
MuCTPi). The three extraction points (1: receiver stations,
2: PreProcessor, 3: JEM) and new/upgraded hardware are marked.]
17
Considerations
• Latency
• Dynamic range
• Current L1Calo towers have 8-bit dynamic range with
1 GeV/LSB
• Would like 9 or 10 bits, if possible (see the arithmetic below)
• Cost to implement
• Risk of disruption to existing system
18
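For reference, the full-scale energy implied by the word width, assuming the current 1 GeV/LSB scale is kept (an assumption; the LSB could also be changed instead):

# Full-scale tower energy vs word width, assuming 1 GeV/LSB is kept.
for bits in (8, 9, 10):
    print(f"{bits} bits -> {2**bits - 1} GeV full scale")
# 8 -> 255 GeV, 9 -> 511 GeV, 10 -> 1023 GeV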
Option 1: Tile Rx stations
• Signals extracted at arrival point in USA15, so
latency cost is minimal
• Must build a new system to digitize and process
analog signals
• No constraints on dynamic range
• Cost is high – essentially need to build new
receiver and PreProcessor systems
• High risk of disruption to current L1Calo:
• Analog data path ahead of L1Calo rearranged
• Where do we fit the new systems that do this?
19
Option 2: PreProcessor
• New MCM (Phase 0)
• FPGA based tower processing
• Can drive higher-speed data
to the LVDS link driver card
(blue arrows)
• Replacement link card
(Phase 1)
• Send tower data electrically to
CP and JEP (same as now)
• An FPGA and parallel-optic
transmitter (e.g. miniPOD)
produce the hadronic output to
FEX
• Fiber ribbon takes data from the
link card to an MTP/MPO
output port (probably on the front
panel)
[Photo: nMCM prototype]
20
Option 2: PreProcessor
• Minimal latency:
• Essentially equal to option 1;
• Can extend dynamic range:
• nMCM can drive outputs at higher rates, so
more bits per tower are possible (arithmetic below)
• ‘Easy’ to get 9 bits, 10 bits probably possible
• Relatively low cost
• nMCM will already exist
• A few (small) LVDS link boards
• Possibly need to replace some PreProcessor
mother boards (8 layers, low component
count)
• Low disruption: Only upgrading existing
boards
21
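The raw bandwidth argument behind "more bits per tower possible": doubling the LVDS rate doubles the bits available per bunch crossing on each link. How those bits split between tower data, BCMUX flag and parity is a framing choice not shown here.

# Raw bits per bunch crossing on one LVDS link at the two rates
# (nominal 40 MHz bunch-crossing rate assumed; overhead ignored).
for rate_mbps in (480, 960):
    print(f"{rate_mbps} Mb/s -> {rate_mbps // 40} bits per BC")
# 480 Mb/s -> 12 bits per BC; 960 Mb/s -> 24 bits per BC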
Option 3: JEM Upgrade
[Diagram: JEM with upgraded input cards. Double-rate tower data from the
upgraded PPM (960 Mbit/s) enter the input cards; high-speed links run from
the input cards to the front panel (lowest latency), carrying the hadronic
tower sums to the FEX.]
22
Option 3: JEM upgrade
• Higher latency:
• Serial transmission from PPr to JEP adds
multiple BCs to latency
• Limited dynamic range:
• BCMUX protocol consumes some bandwidth
• 9 bits possible (by removing parity), 10 bits
probably not possible
• Similar cost to Option 2
• PreProcessor nMCM and link cards still get
replaced (but not PPr mother boards?)
• Plans to upgrade JEM daughter boards
anyway
• Low disruption: again, similar to Option 2
23
Current System
[Diagram (V.Andrei, KIP, L1Calo Weekly Meeting, 10/01/2013): current PPM
dataflow. MCMs #1–#16 (channels ch1–ch4) with SUM, BCMUX and LVDS-Tx
stages send 16x 480 Mb/s streams through the LCD (fan-out & routing) onto
32x LVDS cables to CP (CP1/CP2) and JEP; readout data to DAQ via the J2
RGTM. FPGA types labelled: Spartan-6, Virtex-II.]
Phase-I: first solution
[Diagram (V.Andrei, KIP): Phase-I, first solution. As above, but nMCMs
(MUX + LVDS-Tx) send 16x 960 Mb/s streams to the nLCD (fan-out & routing);
BCMUX and LVDS-Tx stages drive the 32x 480 Mb/s cables to CP and JEP;
readout data to DAQ via the J2 RGTM. FPGA types labelled: Spartan-6,
Spartan-6/Artix-7.]
Phase-I: second solution (A)
[Diagram (V.Andrei, KIP): Phase-I, second solution (A). nMCMs at 16x
960 Mb/s feed the nLCD (fan-out & routing), which keeps the 32x 480 Mb/s
LVDS cables to CP and JEP and, via a rear extension carrying an FPGA and
SNAP12 optics, sends the tower data to jFEX on optic fibers. FPGA types
labelled: Spartan-6/Artix-7, Xilinx 7 Series.]
Phase-I: second solution (B)
[Diagram (V.Andrei, KIP): Phase-I, second solution (B). The present LCD
(fan-out & routing) and 16x 480 Mb/s MCM outputs are kept, together with
the 32x LVDS cables to CP and JEP; a rear extension with an FPGA and
SNAP12 optics taps the data and sends them to jFEX on optic fibers. FPGA
types labelled: Virtex-II, Xilinx 7 Series.]
Firmware
jFEX:
• Sliding window algorithm
• Infrastructure for high speed links
• Module control
• DAQ (buffers and embedded ROD functionality, common
effort)

Example: JEM based Tilecal inputs
• Serialization nMCM 960 Mb/s
• Serialization JMM 6.4 Gb/s
• Re-target existing JEM input firmware to new FPGA
Uli Schäfer
28
• Development line
Uli Schäfer
29
Demonstrator projects: “GOLD”
Try out technologies and schemes for L1Topo
• Fibre input from the backplane (MTP-CPI connectors)
• Up to 10Gb/s o/e data paths via industry standard converters
on mezzanine
• Using mid-range FPGAs (XC6VLX240T) at up to 6.4 Gb/s,
24 channels per device
• A typical sort/dφ algorithm (successfully implemented) takes
~13% of the logic resources
• Real-time output via opto links on the front panel (currently
used as data source for latency measurements etc.)
• Will continue to be used as source/sink for L1Topo tests
Uli Schäfer
30
Recent GOLD results (Virtex-6)
• Jitter analysis on the cleaned TTC clock (σ = 2.9 ps)
• Signal integrity: sampled in several positions along the chain
• MGT and o/e converter settings optimization
• Bit Error Rate (BER) < 10⁻¹⁶ at 6.4 Gb/s / 12 channels
• Eye widths above 60 ps (out of 156 ps)
• Crosstalk among channels measured in some cases but with
negligible effect
Uli Schäfer
[Plot: eye diagram, sampled after the fan-out chip]
31
GOLD latency (Virtex-6)
Latency measured at various points along the
real-time path,
at 6.4 Gb/s, 16-bit data width, and 8b/10b encoding
(converted to bunch crossings below)
• Far End PMA loopback: 34 ns latency
• Electrical (LVDS) output: 63 ns latency
• Far End PCS loopback: 78 ns latency
• Parallel loopback in fabric: 86 ns latency
Firmware mod for latency measurement, including the algorithm and
electrical output towards the CTP, is under way (Volker almost got there…)
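For orientation, the measured latencies expressed in 25 ns LHC bunch crossings (simple arithmetic on the numbers above):

# Measured GOLD latencies converted to 25 ns bunch crossings.
measurements_ns = {"far-end PMA loopback": 34, "electrical (LVDS) output": 63,
                   "far-end PCS loopback": 78, "parallel loopback in fabric": 86}
for name, ns in measurements_ns.items():
    print(f"{name}: {ns} ns = {ns / 25:.2f} BC")
# PMA loopback ~1.4 BC; fabric loopback ~3.4 BC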
Topo processor details – post PDR
Real-time path:
• 14 fibre-optical 12-way inputs
(miniPOD)
• Via four 48-way backplane
connectors
• 4-fold segmented reference clock
tree, 3 Xtal clocks each, plus a jitter-cleaned
LHC bunch clock multiple
• Two XC7VX690T processors
(prototype: XC7VX485T)
• Interlinked by a 238-way LVDS path
• 12-way (+) optical output to CTP
• 32-way electrical (LVDS) output to
CTP via mezzanine
Full ATCA compliance / respective
circuitry on mezzanine
Module control via Kintex and Zynq
processors:
• Initially via VMEbus extension
• Eventually via Kintex or Zynq
processor (Ethernet / base
interface)
Uli Schäfer
33
floor plan, so far…
• Processor FPGA
configuration via
SystemACE and through
module controller
• Module controller
configuration via SPI and
SD-card
• DAQ and ROI interface
• Two SFPs, L1Calo style
• up to 12 opto fibres
(miniPOD)
• Hardware to support
both the L1Calo style ROD
interface and an
embedded ROD / S-Link interface on these
fibres
Uli Schäfer
34
… and in 3-d
Uli Schäfer
35
Tests miniPOD on VC707
• From Eduard
• Note: the microPOD uses the same o/e engine as the miniPOD
• (no pictures of the microPOD, probably confidential…)
Uli Schäfer
36
Schedule
• Extract from Ian's Gantt chart, and that's it?
• Check with Ian whether anything needs to be updated before
the session
• Also manpower estimates?
• If so, anything to be said in addition to what's in the
document?
Uli Schäfer
37
conclusion
• The 8-module jFEX seems possible with ~2013's technology
• Key technologies explored already (GOLD, L1Topo, …)
• Use of microPODs is challenging for thermal and mechanical
reasons, but the o/e engine is the same as in the popular miniPODs
• The scheme allows for both fine granularity and a large environment
at 6.4 Gb/s line rate and a limit of 100% duplication of input
channels
• Rather dense circuitry, but comparable to the GOLD demonstrator
• For even finer granularity and/or larger jets things get more
complicated
→ Need to explore higher transmission rates
• The DPS needs to handle the required duplication (in eta)
• Details of fibre organization and content cannot be presented now
→ Started work on detailed specifications, in parallel exploring
higher data rates…
• Tilecal signals required in the FEXes in fibre-optical format
• Three options for generating them
• All seem viable, but probably at different cost
• Need to agree on a baseline before the TDR
Uli Schäfer
38