
THE ASSOCIATIVE MEMORY SYSTEM FOR THE FTK
PROCESSOR AT ATLAS*
D. MAGALOTTI
University of Modena and Reggio Emilia, Via Universita’ 4, 41121 Modena, Italy
S. CITRARO, S. DONATI, P. LUCIANO, M. PIENDIBENE
University of Pisa, Largo B. Pontecorvo, 3 56127 Pisa, Italy
P. GIANNETTI
Sezione di Pisa INFN, Largo Bruno Pontecorvo 3, 56127 Pisa, Italy
A. LANZA
Sezione di Pavia INFN, Via Agostino Bassi, 6 - 27100 Pavia, Italy
G. VERZELLESI
University of Modena and Reggio Emilia, Via Universita’ 4, 41121 Modena, Italy
A. SAKELLARIOU
Prisma Electronics SA, El Venizelou 128, Nea Smyrni, 17123, Athens, Greece
W. BILLEREAU, J.M. COMBE
CERN, CH-1211 Geneva 23, Switzerland
In high energy physics experiments, the most interesting processes are very rare and
hidden in an extremely large background. As the experiment complexity, accelerator
backgrounds, and instantaneous luminosity increase, more effective and accurate data
selection techniques are needed. The Fast TracKer processor (FTK) is a real-time
tracking processor designed for the ATLAS trigger upgrade. The core of FTK is the
Associative Memory system, which provides massive computing power to minimize the
processing time of the complex tracking algorithms executed online. This paper reports
on the results and performance of a new prototype of the Associative Memory system.
*
This work (AM system project) is supported by the Istituto Nazionale di Fisica Nucleare and by
the European Community FP7 People grant FTK 324318 FP7-PEOPLE-2012-IAPP.
1. Introduction
The trigger system of a detector installed at a hadron collider must be highly
efficient for the interesting physics processes while suppressing the enormous
QCD backgrounds. A multilevel trigger [1] is an effective solution for this task.
The ATLAS trigger system [2] consists of three levels. The hardware Level-1
Trigger quickly locates the regions of interest in the calorimeter and the muon
system, with an output rate of up to 100 kHz. The subsequent trigger levels,
Level-2 and the Event Filter (EF), are collectively known as the High-Level
Trigger (HLT) and consist of software algorithms running on a farm of commercial
CPUs. The system performance would improve significantly if the track parameters
were available to these selections. The Fast TracKer processor (FTK) [3] is a
hardware-based system designed to perform track reconstruction in the silicon
detectors with offline quality and in time for the Level-2 selection.
The Fast TracKer processor is highly parallel: the detector is segmented into
η–φ towers, each with its own tracking processor. Each processor covers one
sixteenth of the detector in φ (22.5°) plus a 10° overlap to maintain high
efficiency. The η range of each φ sector is divided into four overlapping
intervals, for a total of 16 × 4 = 64 η–φ towers. Consequently, each tower
receives only a fraction of the clusters produced by particles in the silicon
detector (hits), and the Processing Units (PUs) that execute track reconstruction
have substantially fewer candidates to process.
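As an illustration of the φ segmentation, the sketch below assigns a hit to φ sectors of 22.5° extended by a 10° overlap. Where the overlap sits is an assumption made here (it is added to the upper edge of each sector), and the η subdivision is omitted because the interval boundaries are not given in the text; `phi_sectors` and the constants are hypothetical names.

```python
N_PHI_SECTORS = 16
N_ETA_INTERVALS = 4
SECTOR_WIDTH_DEG = 360.0 / N_PHI_SECTORS  # 22.5 degrees
OVERLAP_DEG = 10.0  # assumption: overlap added to the upper edge of each sector


def phi_sectors(phi_deg: float, overlap_deg: float = OVERLAP_DEG) -> list[int]:
    """Return the indices of all phi sectors whose extended range contains phi_deg."""
    phi = phi_deg % 360.0
    sectors = []
    for k in range(N_PHI_SECTORS):
        lower_edge = k * SECTOR_WIDTH_DEG
        # Angular distance above the sector's lower edge, wrapping around 360 degrees
        offset = (phi - lower_edge) % 360.0
        if offset < SECTOR_WIDTH_DEG + overlap_deg:
            sectors.append(k)
    return sectors


if __name__ == "__main__":
    print("towers:", N_PHI_SECTORS * N_ETA_INTERVALS)  # 64
    print(phi_sectors(15.0))  # [0]: only sector 0
    print(phi_sectors(5.0))   # [0, 15]: also inside the overlap of sector 15
```

Because the overlap (10°) is smaller than a sector (22.5°), every hit lands in one or two φ sectors, so duplication of the input data stays bounded.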
Pattern recognition inside each detector tower is executed by two PUs working
in parallel. The time-consuming pattern recognition problem is solved by the
Associative Memory (AM) technology [5], which exploits parallelism to the
maximum. This approach reduces the exponential complexity typical of CPU-based
algorithms to a linear problem.
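To make the AM principle concrete, here is a software emulation; it is not a description of the AMchip logic. Each stored pattern is assumed to hold one coarse-resolution hit identifier (called a superstrip in this sketch) per layer; every incoming hit is compared against all patterns at once (emulated with a lookup table), and a pattern fires as a road when enough of its layers have matched. The `find_roads` function and its `threshold` parameter are hypothetical. Each hit is processed once, which is why the work grows linearly with the number of hits instead of with the number of hit combinations.

```python
from collections import defaultdict


def find_roads(patterns, hits, threshold=None):
    """Emulate associative-memory pattern matching.

    patterns: list of tuples, one superstrip ID per detector layer.
    hits: iterable of (layer, superstrip_id) pairs for one event.
    threshold: minimum number of matched layers for a road (default: all layers).
    Returns the indices of the fired patterns (roads).
    """
    n_layers = len(patterns[0])
    if threshold is None:
        threshold = n_layers

    # Lookup table (layer, superstrip) -> patterns containing it. In hardware every
    # pattern compares the hit in parallel; the table stands in for that broadcast.
    lookup = defaultdict(list)
    for pid, pattern in enumerate(patterns):
        for layer, ss in enumerate(pattern):
            lookup[(layer, ss)].append(pid)

    # One match flag per layer for every pattern, set as hits stream in
    matched = [[False] * n_layers for _ in patterns]
    for layer, ss in hits:
        for pid in lookup.get((layer, ss), []):
            matched[pid][layer] = True

    return [pid for pid, flags in enumerate(matched) if sum(flags) >= threshold]


if __name__ == "__main__":
    bank = [(1, 2, 3), (1, 5, 3), (4, 2, 3)]
    event = [(0, 1), (1, 2), (2, 3)]
    print(find_roads(bank, event))               # [0]
    print(find_roads(bank, event, threshold=2))  # [0, 1, 2]
```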
2. The Associative Memory system tests
Figure 1: The Processing Unit of the FTK system. The AM board is visible on the left, the AUX
board on the right.
Figure 1 shows the first prototype of the FTK Processing Unit, already built and
tested. It is composed of the AM system, which performs the pattern matching,
and the AUX board (here a first simplified version), which will perform the
track fitting [3]. The AM system consists of the Associative Memory chip
(AMchip), the key component of the FTK processor, and two types of boards: a
VME board (AMBoard, or AMB) and the Local Associative Memory Boards (LAMBs),
large mezzanines mounted on the AMB that host the AMchips. The main task of the
AMBoard is to distribute the hits to the AMchips and to read out the fired
patterns (roads) from them [4].
2.1. Serial links tests
The AM board uses a large network of high-speed serial links for data
distribution. We chose Xilinx Spartan 6 devices, which provide Multi-Gigabit
Transceivers. Serial links offer significant advantages over parallel buses in
terms of transmission distance, noise immunity, and performance. Moreover,
board routing is simpler. The data are 8B/10B encoded, which ensures DC
balance and enough transitions for clock recovery, and allows error detection.
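The DC-balance property of 8B/10B can be illustrated with a simplified running-disparity check. This is a sketch, not a decoder: valid 10-bit symbols contain four, five, or six ones, and a symbol with six ones (disparity +2) is allowed only when the current running disparity is negative, one with four ones only when it is positive. The hypothetical `disparity_errors` function flags symbols that break this rule; a real receiver also checks each symbol against the 8B/10B code tables, which this sketch omits.

```python
def disparity_errors(symbols, running_disparity=-1):
    """Return indices of 10-bit symbols that violate the 8B/10B disparity rules.

    Simplified check: only the ones/zeros balance of each symbol is examined,
    not membership in the 8B/10B code tables.
    """
    errors = []
    for i, symbol in enumerate(symbols):
        ones = bin(symbol & 0x3FF).count("1")
        disparity = ones - (10 - ones)  # ones minus zeros: 0 or +/-2 for valid symbols
        if disparity == 0:
            continue  # balanced symbol: running disparity unchanged
        if disparity == 2 and running_disparity == -1:
            running_disparity = +1
        elif disparity == -2 and running_disparity == +1:
            running_disparity = -1
        else:
            # Wrong weight, or same-sign disparity twice in a row: the line would
            # drift away from DC balance. Running disparity is left unchanged.
            errors.append(i)
    return errors


if __name__ == "__main__":
    K28_5_NEG = 0b0011111010  # comma character K28.5, RD- form (six ones)
    K28_5_POS = 0b1100000101  # comma character K28.5, RD+ form (four ones)
    print(disparity_errors([K28_5_NEG, K28_5_POS, K28_5_NEG]))  # []
    print(disparity_errors([K28_5_NEG, K28_5_NEG]))             # [1]
    print(disparity_errors([0b1111111111]))                     # [0]
```

Since the running disparity can only alternate between -1 and +1, the number of ones and zeros on the line never drifts apart, which is what keeps the link DC balanced.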
We tested the serial links by assessing the link quality and by measuring the
Bit Error Rate (BER) with a Pseudo-Random Bit Sequence (PRBS). The transmitter
sends a selected PRBS pattern, and on the receiver side we monitor a signal
that is asserted each time an error is detected on the link.
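A minimal sketch of the BER measurement principle follows. It assumes a PRBS-7 pattern (the text does not state which PRBS length was used) and a receiver already aligned to the transmitted sequence; in the transceivers this is done in hardware, and `prbs7`, `count_bit_errors`, and `bit_error_rate` are illustrative names only.

```python
def prbs7(n_bits, seed=0x02):
    """Generate n_bits of a PRBS-7 sequence with a 7-bit Fibonacci LFSR.

    Taps follow the common x^7 + x^6 + 1 convention; the period is 127 bits.
    """
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero, otherwise the LFSR stays at zero")
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


def count_bit_errors(received, seed=0x02):
    """Compare received bits with a locally regenerated, aligned reference PRBS."""
    reference = prbs7(len(received), seed)
    return sum(rx != ref for rx, ref in zip(received, reference))


def bit_error_rate(received, seed=0x02):
    """Fraction of received bits that differ from the reference sequence."""
    return count_bit_errors(received, seed) / len(received)


if __name__ == "__main__":
    rx = prbs7(10_000)
    for position in (100, 5_000):  # inject two bit errors
        rx[position] ^= 1
    print(count_bit_errors(rx))  # 2
    print(bit_error_rate(rx))    # 0.0002
```

When no errors are observed over N transmitted bits, only an upper limit on the BER can be quoted; at 95% confidence it is approximately 3/N, so long error-free runs are needed to demonstrate very low error rates.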
2.2. Pattern matching tests
The pattern-matching test validates the entire functionality of the AM system.
The AMchips are configured through the JTAG protocol. On each LAMB, the 32
AMchips are connected in 8 chains of 4 chips each. A dedicated FPGA converts
VME accesses into JTAG operations and drives all the chains in parallel. A
simulated pattern bank has been generated and loaded into the chip array, and
a JTAG procedure checks that each pattern of the bank has been written
correctly.
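The bank load and JTAG read-back check can be sketched as follows. The mapping from a global pattern index to a (chain, chip, address) location assumes that patterns fill each chip in turn and that every chip holds the same number of patterns; `PATTERNS_PER_CHIP` is a hypothetical capacity, and the in-memory `read_back` list stands in for the actual JTAG read operations.

```python
N_CHAINS = 8               # JTAG chains per LAMB
CHIPS_PER_CHAIN = 4        # AMchips per chain (8 x 4 = 32 per LAMB)
PATTERNS_PER_CHIP = 1024   # hypothetical capacity, for illustration only


def location(pattern_index):
    """Map a global pattern index to (chain, chip_in_chain, address), filling chips in order."""
    chip, address = divmod(pattern_index, PATTERNS_PER_CHIP)
    if chip >= N_CHAINS * CHIPS_PER_CHAIN:
        raise IndexError("pattern index exceeds the LAMB capacity")
    chain, chip_in_chain = divmod(chip, CHIPS_PER_CHAIN)
    return chain, chip_in_chain, address


def verify_bank(written, read_back):
    """Return the locations of patterns whose read-back differs from what was written."""
    if len(written) != len(read_back):
        raise ValueError("read-back length differs from the written bank")
    return [location(i) for i, (w, r) in enumerate(zip(written, read_back)) if w != r]


if __name__ == "__main__":
    bank = [(i % 7, i % 5, i % 3) for i in range(3000)]  # toy pattern bank
    read_back = list(bank)
    read_back[2050] = (0, 0, 0)  # simulate one corrupted pattern
    print(verify_bank(bank, read_back))  # [(0, 2, 2)]
```

Reporting the physical location rather than only the pattern index makes it straightforward to tell a single bad memory cell from a faulty chip or chain.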
Then an input file of events made of random hits is generated and loaded via
VME into the AMB input memories, one per detector layer. When all the memories
are loaded, the hits are transmitted to the AMchips at full rate. The matched
roads output by the AMchips are stored in output memories, which are read out
via VME. Finally, we compare the list of matched roads found by the hardware
with the output of the AM simulation, which produces the list of expected roads
from the same random input hits. The test ran successfully at 100 MHz, as
expected.
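The final comparison step amounts to a per-event set difference between the roads found by the hardware and those predicted by the simulation. A minimal sketch, assuming both are available as lists of road identifiers per event (this data format and the `compare_roads` name are hypothetical):

```python
def compare_roads(hardware, simulation):
    """Compare per-event road lists from the AM hardware and the AM simulation.

    hardware, simulation: lists (one entry per event) of iterables of road IDs.
    Returns {event_index: (missing, unexpected)} for the events that disagree:
    missing roads were expected but not found by the hardware, unexpected roads
    were found by the hardware but not predicted by the simulation.
    """
    if len(hardware) != len(simulation):
        raise ValueError("hardware and simulation cover different numbers of events")
    mismatches = {}
    for event, (hw_roads, sim_roads) in enumerate(zip(hardware, simulation)):
        hw, sim = set(hw_roads), set(sim_roads)
        if hw != sim:
            mismatches[event] = (sorted(sim - hw), sorted(hw - sim))
    return mismatches


if __name__ == "__main__":
    sim = [[3, 17, 42], [5], []]
    hw = [[42, 3, 17], [5, 9], []]
    print(compare_roads(hw, sim))  # {1: ([], [9])}
```

Sets are used because the order in which roads are read out of the hardware need not match the order produced by the simulation; an empty result means the hardware reproduced the expected roads for every event.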
3. Evolution of the AM board
The LAMB and the AMBoard shown in figure 1 were designed for the AMchip04
[xx], the first chip designed explicitly for FTK, with all the needed
functionality but a die size much smaller (14 mm²) than that of the final chip
(180 mm²). The AMchip04 has fully parallel I/O and, like its predecessor
AMchip03, is packaged in an LQ208, a package that is convenient for its pin
accessibility but very limited in the number of available pins. However, while
working on the TSMC 65 nm technology and estimating the power consumption of
the chip, we found that the full-size final chip would require a large number of
VDD and GND pins, more than the old LQ208 package can provide. The larger chip
area therefore requires a complex BGA package, which is hardly compatible with
the routing complexity of the LAMB (see the left side of figure 2).
Figure 2: Routing of the left half of the LAMB (the board is symmetric with respect to the central
vertical axis). Left: the LAMB designed for the AMchip04 in LQ208 packages; right: the new
solution for BGA-packaged AMchips with serialized I/O.
These problems led us to a different strategy for the AMchip, based entirely on
serial link communication. We purchased an IP block from Silicon Creations to
provide serialized I/O buses inside the AMchip. We investigated advanced
packaging options and found the best solution to be a flip-chip BGA, with I/O
optimized to work at 2 Gbps. Distributing the input and output data on serial
links required a complete redesign of the LAMB which, as shown on the right of
figure 2, has an extremely simplified routing.
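The routing gain from serialization can be estimated with simple arithmetic: the number of serial lanes needed to carry the payload of a parallel bus. This sketch assumes 8B/10B encoding on the 2 Gbps links (so 80% of the line rate carries data) and uses a hypothetical 32-bit bus at 100 MHz; neither the bus width nor the encoding of the AMchip serial links is stated in the text, and `lanes_needed` is an illustrative name.

```python
def lanes_needed(bus_width_bits, bus_clock_hz, lane_rate_bps=2_000_000_000,
                 data_bits=8, line_bits=10):
    """Number of serial lanes that carry the same payload as a parallel bus.

    data_bits / line_bits is the encoding efficiency (8/10 for 8B/10B, assumed here).
    Integer arithmetic avoids floating-point rounding in the ceiling.
    """
    payload_bps = bus_width_bits * bus_clock_hz
    lane_payload_bps = lane_rate_bps * data_bits // line_bits
    return -(-payload_bps // lane_payload_bps)  # ceiling division


if __name__ == "__main__":
    # Hypothetical 32-bit bus at 100 MHz: 3.2 Gb/s of payload
    print(lanes_needed(32, 100_000_000))  # 2
```

Under these assumptions, 32 single-ended traces per chip would be replaced by two differential pairs, which is the kind of reduction that makes the simplified LAMB routing possible.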
As a consequence of this simplification, the new LAMB is more compact than the
old one. This benefits the AMBoard, since it leaves space for additional DC-DC
converters and filters that cannot be placed below the LAMB. The motherboard has
been redesigned without the extension on the board front that initially housed
4 DC-DC converters (see figure 1), which could not fit in the internal space. In
addition, we replaced the 4 Spartan 6 devices with 2 Artix 7 200T FPGAs (16
serial links each), and the previous connector with the high-performance
connector ASP134488-01, which provides not only the high-speed signals but also
all the current needed to power the chips on the LAMB.
Figure 3 shows a photograph of the latest motherboard version described above,
with blue and red arrows superimposed to show the flow of input data (blue) from
the P3 connector (violet square, bottom right) to the LAMBs, and of output data
(red) from the LAMBs to the connector. The serial links on the board have been
tested successfully, but a complete characterization of the system requires the
new AMchip.
Figure 4: The prototype of the final version of the AMBoard, with the MiniLAMB-SLP mezzanines labelled.
4. Conclusion
The design of the first prototype of the Processing Unit of the FTK processor
had to face the most challenging aspects of the AMchip evolution. A first
prototype compatible with the parallel I/O and the package of the AMchip04 has
been built and successfully tested. A new development was necessary to move to
the full-size AMchip with serialized I/O. The network of high-speed serial links
is an elegant solution that offers significant advantages over parallel buses in
terms of transmission distance, noise immunity, and performance, and greatly
simplifies the board routing.
References
1. W. Smith, “Triggering at LHC Experiments”, Nucl. Instr. and Meth. A 478,
62–67 (2002).
2. G. Aad et al. [ATLAS Collaboration], “Expected Performance of the ATLAS
Experiment - Detector, Trigger and Physics”, pp. 549 (2008).
3. Andreani et al., “The FastTracker Real Time Processor and Its Impact on
Muon Isolation, Tau and b-Jet Online Selections at ATLAS”, IEEE Trans. Nucl.
Sci. 59 (2), 348–357 (2012).
4. A. Andreani et al., “The AMchip04 and the processing unit prototype for
the FastTracker”, J. Instrum. 7, C08007 (2012).