A Full Mesh ATCA-Based General Purpose Data Processing Board
- Designed for silicon based track trigger -

奥村恭幸 (OKUMURA, Yasuyuki)
University of Chicago / Fermilab
Ted Liu, Jamieson Olsen, Hang Yin

Tracking Trigger @ high luminosity
• The high-luminosity LHC is a requisite approach
• A track trigger is the key technique for maintaining an efficient trigger at high luminosity
(80 collisions every 25 ns, beam dispersion $\sigma_z \sim 10$ cm)

Data Formatting challenges in silicon based track trigger
Technical challenges in data formatting prior to tracking
– The tracking trigger must be implemented as parallel processors (trigger η-φ towers), and the detected hits must be sent to the appropriate processors (more than 100k hits / event)
  • In the ATLAS Fast Tracker (FTK) case, which uses the existing silicon detectors, neither the detector nor the readout is designed around trigger η-φ towers
– A huge number of hits has to be shared among the processors to avoid inefficiency in the boundary regions
[Diagram: routing and copying of hits to the parallel processing units (four Tracking Processors)]

ATLAS FTK data formatter
Hits sharing between ATLAS FTK η-φ towers
• Required connectivity in the ATLAS FTK data formatter case
Full mesh network topology is the natural solution

Full mesh ATCA
• The Advanced Telecommunication Computing Architecture (AdvancedTCA / ATCA) backplane supports a full mesh network topology at high speed
[Figures: network topology of a 14-board system in full mesh; Zone 2 connector for full mesh]

Board design concept
• Two FPGAs (running up to ~10 Gb/s)
• They are interfaced to mezzanine cards (FMC), a Rear Transition Module (RTM), and the ATCA full mesh backplane
[Diagram labels: local bus, ATCA backplane, RTM, optical fiber (inter-crate)]

Prototype (Pulsar 2a) design
[Block diagram labels: Mezzanine Interface, Zone 3 RTM, Local Bus, Zone 2 Full Mesh; link types GTX and LVDS]

First prototype board
[Photo labels: Test Mezzanine Card (two), Top FPGA, Bottom FPGA, Zone 1, Zone 2, Zone 3, RTM, mini backplane]
Figure 16: Prototype DF with its RTM and two test mezzanine cards. (DRAFT, May 3, 2013 – 21:57)

Very first prototype works well
• High speed GTX lines w/ loopback modules
  – Local bus (BER < $1.4 \times 10^{-15}$ @ 10 Gb/s)
  – ATCA fabric channels (BER < $4.2 \times 10^{-17}$ @ 6.25 Gb/s)
  – RTM channel (BER < $8.3 \times 10^{-17}$ @ 6.25 Gb/s)
• Mezzanine interface
  – 400 Mb/s per LVDS pair
  – 13.6 Gb/s per mezzanine
• Crate level operation
  – Control with network switch (hub board)
  – Full mesh backplane test
[Photo labels: RTM, mezzanine card, ATCA fabric, local bus]

DF Board Zone2/3 (6.25 Gb/s)
[Eye diagrams at 6.25 Gb/s: RTM Zone 3 and ATCA Zone 2; axes: voltage offset vs. sample point]

... allowing further optimization of the routing algorithm and improvement in the bandwidth use in the system. The current data-driven estimate does not include the IBL, which will be added in the future after final decisions are made for the IBL module-to-fiber mapping.

4.5 AUX
Pattern recognition and first stage track fitting are done in the Processor Units (PU) in the core crates. Each PU consists of an AMB VME card with a large auxiliary board (AUX) behind it. The AMB is described in section 4.6.

4.5.1 AUX functionality
The AUX receives hits from the DFs for the 8 silicon layers used in the first stage of track reconstruction. It stores the hits in the Data Organizer (DO), a smart database built on the fly that allows rapid retrieval of hits in a road, and sends the hits to the AMB with coarser resolution appropriate to pattern recognition (superstrips or SS). When the AMB finds a road with hits on at least 7 of the 8 layers, the road number is sent to the AUX, which then retrieves all of the hits in the road.
Rapid hit retrieval is possible because hits are stored in the DO by SS address, and a road consists of a single SS in each layer. The hits, the road number, and the sector number are transferred to the track fitter (TF).

The TF calculates the $\chi^2$ for each combination of one hit per layer in the road (Nominal fit) using linear constants stored by sector number. The same constants are used when there are hits in only 7 of the 8 layers (Majority fit) by calculating the hit location in the missing layer that would minimize the overall $\chi^2$. Any combination with $\chi^2 < \chi^2_{\mathrm{cut}}$ is sent out of the TF. If a combination with hits on all 8 layers has a $\chi^2$ above $\chi^2_{\mathrm{cut}}$ but below $\chi^2_{\mathrm{high}}$, that track candidate is refit 8 times with one hit dropped each time in order to allow for detector inefficiency and picking up a random hit (Recovery fit). If a refit satisfies the goodness of fit requirement, it is accepted.

The tracks forwarded from the TF go to the Hit Warrior (HW) function for duplicate track removal. If two tracks in the same road share more than a programmable number of hits, only the higher quality track is kept. The quality is defined based on the $\chi^2$ and which silicon layers have a hit.
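The Hit Warrior rule described above can be illustrated with a minimal Python sketch. The names `Track`, `shared_hits`, `quality`, and `hit_warrior` are hypothetical. Two details are assumptions, since the text only says that quality is based on the $\chi^2$ and on which silicon layers have a hit: here a track with more layers hit ranks higher, ties are broken by lower $\chi^2$, and tracks are compared greedily from best to worst.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Track:
    road: int                         # road number from the pattern match
    hits: Tuple[Optional[int], ...]   # one hit ID per layer, None = no hit
    chi2: float                       # goodness of fit from the track fitter


def shared_hits(a: Track, b: Track) -> int:
    """Count layers where both tracks use the same (non-missing) hit."""
    return sum(1 for ha, hb in zip(a.hits, b.hits)
               if ha is not None and ha == hb)


def quality(t: Track) -> Tuple[int, float]:
    """ASSUMPTION: more layers with a hit is better, then lower chi2."""
    n_layers = sum(h is not None for h in t.hits)
    return (n_layers, -t.chi2)


def hit_warrior(tracks: List[Track], max_shared: int) -> List[Track]:
    """Drop a track if a better track in the same road shares more than
    max_shared hits with it (greedy, best quality first)."""
    kept: List[Track] = []
    for t in sorted(tracks, key=quality, reverse=True):
        if all(t.road != k.road or shared_hits(t, k) <= max_shared
               for k in kept):
            kept.append(t)
    return kept
```

With `max_shared=6`, two 8-layer tracks in the same road that differ in only one hit are treated as duplicates and the one with the lower $\chi^2$ survives, while tracks in different roads are never compared.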
DF local bus vs KC705 @ 10 Gb/s
[Eye diagrams at 10 Gb/s; axes: voltage offset vs. sample point. Labels: Reference Xilinx Kintex-7 Evaluation Kit; DF board local bus; (w/ SFP+ 0 dB loopback adapter)]
Mezzanine interface & Test mezzanine card
• LVDS (400 Mb/s/pair) through FMC connector to main board
  – Double Data Rate (DDR) running at 200 MHz shared between mezzanine & main board
  – In total 13.6 Gb/s with 34 pairs
• Four SFP+ I/O
  – GTX lines tested at 6.25 Gb/s
• Pattern finding associative memory chip capability
  – FPGA + DDR3
[Photo labels: Pattern Recognition Associative Memory chip socket, 4 high speed lines (by GTX), FPGA, DDR3, FMC connector]

Successful full crate testing
All GTX lines were activated at 6.25 Gb/s on 7 boards in one crate
[Photo labels: hub board; 7 Pulsar 2a boards in slots #4 to #10; Ethernet connection]

Pulsar 2b
• Virtex-7 FPGA
• 80 GTH lines are implemented in total (cf. 32 GTX in Pulsar 2a)
• Improved design: based on the highly valuable experience gained with the prototype
[Photo: Pulsar 2b board]

Summary
• An ATCA general purpose board has been designed
  – High speed FPGAs interfaced to various I/O & internal communication lines
• Our first ATCA prototype works well at speeds up to 10 Gb/s
• We tested the high speed performance in detail
• The Pulsar 2b design is work in progress
  – The first application is the ATLAS Fast TracKer (from 2015)
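As a back-of-the-envelope check of the numbers quoted in the slides, the following Python sketch works out the link count of a 14-board full mesh, the aggregate mezzanine LVDS bandwidth, and a bit-error-rate (BER) upper limit. The function names are hypothetical. The BER formula is an assumption: the slides do not say how the limits were obtained, so the sketch uses the common zero-error upper limit $\mathrm{BER} < -\ln(1 - \mathrm{CL}) / N$ for $N$ error-free bits, with an assumed confidence level of 95%.

```python
import math


def full_mesh_links(n_boards: int) -> int:
    """Point-to-point links needed so that every board connects
    directly to every other board."""
    return n_boards * (n_boards - 1) // 2


def lvds_bandwidth_gbps(n_pairs: int, clock_mhz: float,
                        ddr: bool = True) -> float:
    """Aggregate LVDS bandwidth in Gb/s. With DDR, each pair moves
    two bits per clock cycle."""
    bits_per_clock = 2 if ddr else 1
    return n_pairs * clock_mhz * bits_per_clock / 1000.0


def ber_upper_limit(error_free_bits: float,
                    confidence: float = 0.95) -> float:
    """ASSUMPTION: zero observed errors, so BER < -ln(1 - CL) / N."""
    return -math.log(1.0 - confidence) / error_free_bits


if __name__ == "__main__":
    print(full_mesh_links(14))              # 91 links in a 14-board crate
    print(lvds_bandwidth_gbps(34, 200.0))   # 13.6 Gb/s for 34 pairs at 200 MHz DDR
    print(ber_upper_limit(1e15))            # limit after 1e15 error-free bits
```

In a 14-board full mesh each board needs 13 direct channels, one to every other board, and 34 LVDS pairs at 200 MHz DDR reproduce the 13.6 Gb/s per mezzanine quoted above.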