BTeV Trigger
BTeV was terminated in February of 2005.
BTeV Trigger Overview
• Trigger Philosophy: Trigger on a characteristic common to all heavy-quark decays: separated production and decay vertices.
• Aim: Reject > 99.9% of background. Keep > 50% of B events.
• The challenge for the BTeV trigger and data acquisition system is to reconstruct particle tracks and interaction vertices in every beam crossing, looking for topological evidence of B (or D) decays.
• This is feasible for the BTeV detector and trigger system because of:
  – Pixel detector – low occupancy, excellent spatial resolution, fast readout
  – Heavily pipelined and parallel architecture (~5000 processors)
  – Sufficient memory to buffer events while awaiting the trigger decision
  – Rapid development in technology – FPGAs, processors, networking
• 3 Levels:
  – L1 vertex trigger (pixels only) + L1 muon trigger
  – L2 vertex trigger – refined tracking and vertexing
  – L3 full event reconstruction, data compression
BTeV detector
[Figure: BTeV detector layout at the Tevatron interaction region, with a 30-station Si pixel detector. Each multichip module carries Si pixel sensors (50 μm × 400 μm pixels, 128 rows × 22 columns per read-out chip) served by 5 FPIX ROCs, giving 14,080 pixels (128 rows × 110 columns) per module.]
L1 vertex trigger algorithm
• Segment Finder (pattern recognition)
  – Find beginning and ending segments of tracks from hit clusters in 3 adjacent stations (triplets):
    • beginning segments: required to originate from the beam region
    • ending segments: required to project out of the pixel detector volume
• Tracking and Vertex Finding
  – Match beginning and ending segments found by the FPGA segment finder to form complete tracks.
  – Reconstruct primary interaction vertices using complete tracks with pT < 1.2 GeV/c.
  – Find tracks that are “detached” from the reconstructed primaries.
• Trigger Decision
  – Generate a Level-1 accept if the crossing has two “detached” tracks going into the instrumented arm of the BTeV detector (see the sketch below).
[Figure: sketch of a B-meson decay vertex detached from the pp primary interaction.]
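For illustration, here is a minimal C sketch of the trigger-decision step: count tracks that are detached from the reconstructed primary vertex and issue a Level-1 accept when at least two point into the instrumented arm. The Track fields, the cut values, and the function names are hypothetical stand-ins, not the actual BTeV farm code.

#include <stddef.h>

/* Hypothetical track summary used by the Level-1 decision sketch. */
typedef struct {
    double b;              /* impact parameter w.r.t. the primary vertex   */
    double sigma_b;        /* impact-parameter uncertainty                 */
    int    in_forward_arm; /* 1 if the track enters the instrumented arm   */
} Track;

/* Illustrative detachment test: significant but bounded impact parameter. */
static int is_detached(const Track *t)
{
    return t->in_forward_arm &&
           t->b / t->sigma_b > 3.0 &&   /* illustrative significance cut   */
           t->b < 0.2;                  /* illustrative upper bound (cm)   */
}

/* Return 1 (Level-1 accept) when at least two detached tracks are found. */
int level1_accept(const Track *tracks, size_t ntracks)
{
    size_t ndetached = 0;
    for (size_t i = 0; i < ntracks; ++i)
        if (is_detached(&tracks[i]) && ++ndetached >= 2)
            return 1;
    return 0;
}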
BTeV trigger overview
[Figure: trigger/DAQ data flow – front-end electronics of the BTeV detector (> 2 × 10^7 channels) send data at the 2.5 MHz crossing rate, ~500 GB/s (~200 KB/event), into the Level-1 buffers. The L1 vertex trigger (pixels) and L1 muon trigger feed the Global Level-1 (GL1); on a GL1 accept, the Information Transfer Control Hardware (ITCH) requests the data for crossing #N. L1 rate reduction ~50×: 50 kHz, 12.5 GB/s (250 KB/event) flow through the Level-2/3 crossing switch into the Level-2/3 processor farm and buffers. L2/3 rate reduction ~20×: a Level-3 accept sends 2.5 kHz, 200 MB/s to data logging (250 KB / 3.125 ≈ 80 KB/event after compression).]
Level 1 vertex trigger architecture
[Figure: data flow – 30 pixel stations → pixel pre-processors → FPGA segment finders → switch (sort by crossing number) → ~2500-node track/vertex farm → merge → Global Level-1 (GL1).]
Pixel Preprocessor
[Figure: pixel readout chain – pixel stations in the collision hall feed pixel processors and data combiner boards (DCBs), which drive optical links to the counting room. Each FPIX2 read-out chip hit word carries sync (1 bit), row (7 bits), column (5 bits), BCO (8 bits), and ADC (3 bits). In the counting room, the pixel preprocessor consists of an optical receiver interface, time-stamp expansion, hit cluster finder and x-y coordinate translator, event sorting by time and column, and a Level-1 buffer interface, with outputs to the neighboring FPGA segment finders.]
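Since the hit word is fully described by the field widths above, a small C sketch of how such a 24-bit word could be unpacked is shown below; the bit ordering and the FpixHit/unpack_fpix_hit names are assumptions for illustration, not the real FPIX2 output format.

#include <stdint.h>

/* Hypothetical unpacking of a 24-bit hit word into its fields
 * (sync 1 bit, row 7 bits, column 5 bits, BCO 8 bits, ADC 3 bits).
 * Field positions are illustrative, not the actual FPIX2 layout. */
typedef struct {
    uint8_t sync;   /* 1 bit                                     */
    uint8_t row;    /* 7 bits                                    */
    uint8_t col;    /* 5 bits                                    */
    uint8_t bco;    /* 8 bits: beam-crossing (time-stamp) number */
    uint8_t adc;    /* 3 bits: pulse-height code                 */
} FpixHit;

static FpixHit unpack_fpix_hit(uint32_t word)
{
    FpixHit h;
    h.sync = (word >> 23) & 0x01;
    h.row  = (word >> 16) & 0x7F;
    h.col  = (word >> 11) & 0x1F;
    h.bco  = (word >>  3) & 0xFF;
    h.adc  =  word        & 0x07;
    return h;
}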
The Segment Tracker Architecture
[Figure: segment tracker data flow for stations N-1, N, N+1 – bend-view hits form long doublets; long-doublet projections are combined with short doublets from the nonbend views of each station (12 half pixel planes at 12 different Z locations) to form triplets; triplet projections and short-doublet outputs are multiplexed onto the segment-tracker outputs.]
• Find interior and exterior track segments in parallel in FPGAs.
• The segment finder algorithm is implemented in VHDL (a simplified C model is sketched below).
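As a rough software model of what the VHDL segment finder does, the C sketch below forms a "beginning" segment from three hits in adjacent stations and applies the beam-region constraint; the Hit/Triplet types, the residual cut, and the BEAM_REGION_R value are illustrative assumptions (the real algorithm works on bend/nonbend doublets in FPGAs).

#include <math.h>

/* Simplified software model of the FPGA segment finder.
 * A hit is an (x, y) measurement at a station plane at z. */
typedef struct { double x, y, z; } Hit;
typedef struct { Hit a, b, c; } Triplet;   /* hits in 3 adjacent stations */

/* Hypothetical beam-region half-width (cm) for the beginning-segment cut. */
#define BEAM_REGION_R 0.5

/* Accept a triplet as a beginning segment if its hits line up and the
 * straight-line extrapolation to z = 0 lies inside the beam region. */
int is_beginning_segment(const Triplet *t, double max_residual)
{
    /* slope from first to last hit */
    double dz = t->c.z - t->a.z;
    double sx = (t->c.x - t->a.x) / dz;
    double sy = (t->c.y - t->a.y) / dz;

    /* middle hit must lie close to the line (pattern-recognition cut) */
    double rx = t->b.x - (t->a.x + sx * (t->b.z - t->a.z));
    double ry = t->b.y - (t->a.y + sy * (t->b.z - t->a.z));
    if (fabs(rx) > max_residual || fabs(ry) > max_residual)
        return 0;

    /* extrapolate to z = 0 and require an origin in the beam region */
    double x0 = t->a.x - sx * t->a.z;
    double y0 = t->a.y - sy * t->a.z;
    return hypot(x0, y0) < BEAM_REGION_R;
}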
L1 Track and Vertex Farm
• The original baseline of the L1 track and vertex farm used custom-made processor boards based on DSPs or other processors. The total was estimated at 2500 TI TMS320C6711 DSPs. The L1 switch was custom-designed as well.
• After the DOE CD-1 review, BTeV changed the L1 baseline design:
  – L1 switch: commercial off-the-shelf Infiniband switch (or equivalent).
  – L1 farm: array of commodity general-purpose processors, Apple G5 Xserves (or equivalent).
Level 1 Trigger Architecture (New Baseline)
[Figure: one highway of the new Level-1 baseline – 30 pixel stations → pixel processors → FPGA segment finders (56 per highway, ~45 MB/s each) → Infiniband Level-1 switch → 33 outputs at ~76 MB/s each into the track/vertex farm of 33 “8 GHz” Apple Xserve G5s with dual IBM 970s. Global Level-1 runs on an Apple Xserve identical to the track/vertex nodes; the farm, Level-1 buffers, and GL1 are connected by an Ethernet network and the PTSM network.]
R&D projects
• Software development for the DSP pre-prototype.
• Level 1 trigger algorithm processing-time studies on various processors.
  – Part of the trigger system R&D for a custom-made Level 1 trigger computing farm.
• StarFabric switch test and bandwidth measurement.
  – R&D for the new Level 1 trigger system baseline design.
  – After the DOE CD-1 review, the BTeV collaboration decided to change the baseline design of the Level 1 trigger system:
    • L1 switch – replace the custom switch with an Infiniband switch (or equivalent).
    • L1 farm – replace the DSP hardware with Apple G5 Xserves (or equivalent).
• Pixel preprocessor of the Level 1 trigger system.
  – Clustering algorithm and firmware development.
DSP Pre-prototype main goals
• Investigate current DSP hardware and software to determine technical choices for the baseline design.
• Study I/O data flow strategies.
• Study control and monitoring techniques.
• Study FPGA firmware algorithms and simulation tools.
  – Understand the major blocks needed.
  – Estimate logic size and achievable data bandwidths.
• Measure internal data transfer rates, latencies, and software overheads between processing nodes.
• Provide a platform to run DSP fault-tolerant routines.
• Provide a platform to run trigger algorithms.
Features of DSP Pre-prototype Board
• Four DSP mezzanine cards on the board, allowing different TI DSPs to be tested for comparison.
• The FPGA data I/O manager provides two-way data buffering. It connects the PCI Test Adapter (PTA) card to each DSP.
• Two ArcNet network ports:
  – Port I is the PTSM (Pixel Trigger Supervisor Monitor) port.
  – Port II is the Global Level 1 result port.
  – Each network port is managed by a Hitachi microcontroller.
  – The PTSM microcontroller communicates with the DSPs via the DSP Host Port Interface to perform initialization and issue commands.
  – The GL1 microcontroller receives trigger results via the DSP’s Buffered Serial Port (BSP).
• Compact Flash card to store DSP software and parameters.
• Multiple JTAG ports for debugging and initial startup.
• Operator LEDs.
L1 trigger 4-DSP prototype board
[Figure: block diagram of the DSP prototype board (9/01) – PCI Test Adapter and LVDS link interface; input-buffer and output-buffer control-manager FPGAs with input/output buffers for the four DSPs (each with local RAM and ROM); output path to Global Level-1; McBSP serial ports carry trigger decisions to the GL1 Hitachi H8 microcontroller with its ArcNet interface; a second H8, the Pixel Trigger Supervisor Monitor (PTSM), has its own ArcNet interface and reaches the DSPs through the Host Port Interface FPGA; Flash, RAM, and JTAG support boot and debugging.]
Level 1 Pixel Trigger Test Stand for the DSP pre-prototype
[Photo: test-stand components – Xilinx programming cable, ARCnet card, PTA+PMC card, DSP daughter card, TI DSP JTAG emulator.]
DSP Pre-prototype Software (1)
• PTSM task on the Hitachi PTSM microcontroller:
  – System initialization; kernel and DSP application downloading.
  – Command parsing and distribution to subsystems.
  – Error handling and reporting.
  – Hardware and software status reporting.
  – Diagnostics and testing functions.
• GL1 task on the Hitachi GL1 microcontroller:
  – Receives the trigger results from the DSPs and sends them to the GL1 host computer.
• Hitachi microcontroller API: a library of low-level C routines has been developed to support many low-level functions (a hypothetical header sketch follows below).
  – ArcNet network driver.
  – Compact Flash API; supports a FAT16 file system.
  – LCD API: display messages on the on-board LCD.
  – Serial port API.
  – JTAG API.
  – One-Wire API.
  – DSP interface API: boot and reset DSPs; access memory and registers on the DSPs.
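To make the scope of that library concrete, here is a hypothetical C header sketch; the function names and signatures are invented for illustration and are not the actual BTeV API.

/* Hypothetical sketch of the microcontroller support library described above. */
#ifndef PTSM_API_H
#define PTSM_API_H

#include <stddef.h>
#include <stdint.h>

/* ArcNet network driver */
int  arcnet_send(uint8_t node, const void *buf, size_t len);
int  arcnet_recv(uint8_t *node, void *buf, size_t maxlen);

/* Compact Flash (FAT16 file system) */
int  cf_read_file(const char *name, void *buf, size_t maxlen);
int  cf_write_file(const char *name, const void *buf, size_t len);

/* LCD, serial port, JTAG, One-Wire */
void lcd_printf(const char *fmt, ...);
int  serial_putc(int c);
int  jtag_scan(uint32_t instr, uint32_t *data);
int  onewire_read_id(uint8_t id[8]);

/* DSP interface: boot/reset DSPs, access their memory and registers */
int  dsp_boot(int dsp, const char *image_file);
int  dsp_reset(int dsp);
int  dsp_read_mem(int dsp, uint32_t addr, void *buf, size_t len);
int  dsp_write_mem(int dsp, uint32_t addr, const void *buf, size_t len);

#endif /* PTSM_API_H */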
DSP Pre-prototype Software (2)
• Host computer software:
  – PTSM menu-driven interface.
  – GL1 message receiving and display.
• A custom protocol is built directly on the lowest level of the ArcNet network driver; this is the most efficient approach because it avoids standard protocol overhead (a hypothetical frame layout is sketched below).
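As an illustration only, a frame of such a raw-ArcNet protocol could be laid out as the C struct below; the field names and sizes are assumptions, not the protocol actually used.

#include <stdint.h>

/* Hypothetical frame carried directly in raw ArcNet packets
 * (no TCP/IP-style overhead); fields are illustrative only. */
typedef struct {
    uint8_t  dest;         /* ArcNet node id of the destination           */
    uint8_t  source;       /* ArcNet node id of the sender                */
    uint8_t  command;      /* e.g. init, download, status request, reset  */
    uint8_t  sequence;     /* for matching replies to requests            */
    uint16_t length;       /* number of payload bytes that follow         */
    uint8_t  payload[250]; /* command arguments or status data            */
} PtsmFrame;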
Processor evaluation
• We continued to measure the Level 1 trigger algorithm processing time on various new processors.
• MIPS RM9000x2 processor, Jaguar-ATX evaluation board:
  – Time studies on Linux 2.4.
  – Time studies standalone, with the MIPS SDE Lite 5.03.06 compiler.
  – The system (Linux) overhead in processing time is about 14%.
• PowerPC 7447 (G4) and PowerPC 8540 PowerQUICC III:
  – GDA Tech PMC8540 eval card and Motorola Sandpoint eval board with PMC7447A.
  – Green Hills MULTI 2000 IDE with a Green Hills probe for standalone testing.
[Photo: 8540 eval board and Green Hills probe.]
Candidate processors for Level 1 Farm
Processor                   | L1 algorithm processing time
----------------------------|--------------------------------------------
TI TMS320C6711 (baseline)   | 1,571 μs (provided for comparison)
Motorola 8540 PQIII PPC     | 271 μs (660 MHz, GHS MULTI 2K 4.01)
Motorola 7447A G4 PPC       | 121 μs (1.4 GHz, GHS MULTI 2K 4.01)
Motorola 74xx G4 PPC        | 195 μs (1 GHz 7455, Apple PowerMac G4)
PMC-Sierra MIPS RM9000x2    | 341 μs (600 MHz, MIPS SDE Lite 5.03.06)
Intel Pentium 4/Xeon        | 117 μs (2.4 GHz Xeon)
IBM 970 PPC                 | 74 μs (2.0 GHz Apple PowerMac G5)

These results show that commodity processors are well suited for an off-the-shelf solution using desktop PCs (or G5 servers) for the computing farm (a hypothetical timing harness is sketched below).
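Measurements like those above amount to timing repeated runs of the Level-1 algorithm on each platform; the portable C sketch below illustrates that kind of harness. The run_l1_algorithm stub stands in for the real trigger code, and the actual studies used platform-specific compilers and timers.

#include <stdio.h>
#include <time.h>

/* Dummy stand-in for the real Level-1 track/vertex code on one crossing. */
static void run_l1_algorithm(const void *crossing_data) { (void)crossing_data; }

/* Average per-crossing processing time in microseconds over many iterations. */
static double time_l1_algorithm(const void *crossing_data, int iterations)
{
    clock_t start = clock();
    for (int i = 0; i < iterations; ++i)
        run_l1_algorithm(crossing_data);
    return (double)(clock() - start) / CLOCKS_PER_SEC / iterations * 1e6;
}

int main(void)
{
    printf("%.1f us/crossing\n", time_l1_algorithm(NULL, 1000000));
    return 0;
}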
StarFabric Switch Testing and Bandwidth Measurement
• In the new baseline design of the BTeV Level 1 trigger system, a commercial off-the-shelf switch will be used for the event builder.
• Two commercial switch technologies were tested: Infiniband (by Fermilab) and StarFabric (by the IIT group with Fermilab).
• Hardware setup for the StarFabric switch test:
  – PCs with PCI 32/33 buses.
  – StarFabric adapter: StarGen 2010.
  – StarFabric switch: StarGen 1010.
• Software:
  – StarFabric Windows driver.
[Figure: test stand – a P4/Win2000 PC and an Athlon/XP PC, each with an SG2010 adapter, connected through an SG1010 switch.]
L1 Switch Bandwidth Measurement
• The measured StarFabric bandwidth is 74-84 MB/s for packet sizes of 1 kB to 8 kB. This does not meet the bandwidth requirement of the event builder (a sketch of the measurement loop follows below).
• A simple way to improve performance is to use PCI-X (32/66 or 64/66). The Infiniband test stand uses PCI-X adapters in the input/output computer nodes.
• Based on this result and other considerations, Infiniband was chosen for the new baseline design of the Level 1 trigger system, but we are still looking at StarFabric and other possible switch fabrics.
[Plot: measured bandwidth vs. packet size for Infiniband and StarFabric, compared with the 167 MB/s bandwidth target at peak luminosity (average of 6 interactions/BCO) including 50% excess capacity.]
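The sketch below illustrates the kind of point-to-point throughput loop used for such measurements; send_packet is only a stub standing in for the fabric-specific driver call (StarFabric or Infiniband), so this shows the method rather than the actual test program.

#include <string.h>
#include <time.h>

/* Stub standing in for the vendor driver call used on the real test stand. */
static void send_packet(const char *buf, size_t len) { (void)buf; (void)len; }

/* Send npackets packets of packet_size bytes and return the rate in MB/s. */
static double measure_bandwidth(size_t packet_size, int npackets)
{
    static char buf[8192];                 /* 1 kB .. 8 kB packets tested */
    memset(buf, 0xAB, sizeof buf);

    clock_t start = clock();
    for (int i = 0; i < npackets; ++i)
        send_packet(buf, packet_size);
    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;

    return (double)packet_size * npackets / seconds / 1e6;
}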
Pixel Preprocessor
[Figure: pixel preprocessor and segment tracker (PP&ST) in the new baseline – the 30-station pixel detector front-end feeds the optical receiver interface, time-stamp expansion, hit cluster finder & x-y coordinate translator, and event sorting by time and column; outputs go to the segment tracker nodes, the Level-1 buffer interface/DAQ, and through the Infiniband Level-1 switch (56 inputs at ~45 MB/s each, 33 outputs at ~76 MB/s each).]
Row and Column Clustering
• A track can hit more than one pixel of a pixel chip because of charge sharing.
• One function of the pixel preprocessor is to find adjacent pixel hits, group them into a cluster, and calculate the x-y coordinates of the cluster (see the sketch below).
• Hits in adjacent rows of the same column form a row cluster.
• Two overlapping row clusters in adjacent columns form a cross-column cluster.
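A minimal C sketch of the clustering rules just described, assuming the hits of one column arrive sorted by row number; the PixelHit/RowCluster types and function names are illustrative, not the preprocessor firmware.

#include <stddef.h>

/* A pixel hit after time-stamp expansion and coordinate translation. */
typedef struct { int col, row; } PixelHit;

/* A row cluster: a run of consecutive row numbers in one column. */
typedef struct { int col, row_lo, row_hi; } RowCluster;

/* Group the hits of one column (sorted by row number) into row clusters;
 * returns the number of clusters written to out. */
size_t find_row_clusters(const PixelHit *hits, size_t nhits,
                         RowCluster *out, size_t maxout)
{
    size_t nclus = 0;
    size_t i = 0;
    while (i < nhits && nclus < maxout) {
        RowCluster c = { hits[i].col, hits[i].row, hits[i].row };
        size_t j = i + 1;
        while (j < nhits && hits[j].row == hits[j - 1].row + 1)
            c.row_hi = hits[j++].row;       /* extend over adjacent rows */
        out[nclus++] = c;
        i = j;
    }
    return nclus;
}

/* Two row clusters in adjacent columns form a cross-column cluster
 * when their row ranges overlap. */
int cross_column(const RowCluster *a, const RowCluster *b)
{
    return b->col == a->col + 1 &&
           a->row_lo <= b->row_hi && b->row_lo <= a->row_hi;
}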
Cluster Finder Block Diagram
[Figure: pipeline – hit input FIFO → hash sorter (column ordering) → row cluster processor (cross-row clusters) → cross-column processor (cross-column clusters, columns N-1 and N) → cluster parameters calculator → cluster output.]
• The order of input hits in row number is defined; the column order is not.
• The hash sorter is used to produce a defined column order.
• The row cluster processor identifies adjacent hits within a column and passes the starting/ending row numbers to the next stage.
• The cross-column processor groups overlapping hits (or clusters) in adjacent columns together.
• Cluster parameters are calculated in the cluster parameter calculator.
Implementation for Cross-Column Cluster Finder
[Figure: FIFO1 holds incoming row clusters (cross-row headers) and FIFO2 holds the clusters of the current column (Col. A); a state-control block compares them against the next column (Col. B) using the cross-column headers:]
• If the two clusters overlap, they form a cross-column cluster and are popped out together.
• If the cluster in Col. A cannot connect to Col. B, it is a single-column cluster and is popped out.
• If Col. B is not adjacent to Col. A, the entire contents of Col. A are popped out.
• If the cluster in Col. B is not connected with Col. A, it is filled into FIFO2.
Implementation for Cross-Column Cluster Finder (cont’d)
[Flowchart: fill Col. A into FIFO2; test whether Col. B = Col. A + 1. If not, pop Col. A. If yes, compare the row ranges of the cluster pair: (1) uAN < uB1 → pop A; (2) uA1 > uBN → fill B; neither → pop A and B as a cross-column cluster.]
• The cross-column cluster finder firmware is written in VHDL (a C model of the decision logic is sketched below).
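A C model of the comparison step in the flowchart above, assuming u denotes the row coordinate and A is the buffered cluster from the previous column; the type, names, and Action encoding are illustrative, and the real logic is the VHDL state machine.

/* Row cluster as seen by the cross-column processor: column number and the
 * starting/ending row numbers (uA1..uAN, uB1..uBN in the flowchart). */
typedef struct { int col, u_first, u_last; } RowCluster;

typedef enum {
    POP_A_SINGLE,   /* A leaves as a single-column cluster                */
    FILL_B,         /* B is buffered into FIFO2; not connected to this A  */
    POP_AB_MERGED   /* A and B leave together as a cross-column cluster   */
} Action;

/* Decision for the pair (A = buffered cluster from the previous column,
 * B = incoming cluster from the current column). */
Action cross_column_decision(const RowCluster *a, const RowCluster *b)
{
    if (b->col != a->col + 1)    /* Col. B not next to Col. A: flush A    */
        return POP_A_SINGLE;
    if (a->u_last < b->u_first)  /* uAN < uB1: A cannot connect to B      */
        return POP_A_SINGLE;
    if (a->u_first > b->u_last)  /* uA1 > uBN: B not connected to A       */
        return FILL_B;
    return POP_AB_MERGED;        /* row ranges overlap                    */
}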
BES-II DAQ System
The BES (Beijing Spectrometer) experiment upgraded its detector and DAQ system in 1997.
Performance of BES-II and BES-I
Subsystem | Variable  | BES-II                                                  | BES-I
----------|-----------|---------------------------------------------------------|------------------
MDC       | σP/P      | 1.78%·(1+P²)^1/2                                        | 1.76%·(1+P²)^1/2
MDC       | σxy       | 198-224 μm                                              | 200-250 μm
MDC       | dE/dx     | 8.0%                                                    | 7.8%
VC        | σxy       | 90 μm                                                   | 220 μm
TOF       | σT        | 180 ps                                                  | 375 ps
SC        | σE/E      | 21%·E^-1/2                                              | 24.4%·E^-1/2
MUON      | σZ        | 7.9 cm (layer 1), 10.6 cm (layer 2), 13.2 cm (layer 3)  | –
DAQ       | Dead time | 10 ms                                                   | 20 ms
BES-II DAQ System
[Figure: BES-II DAQ architecture – DEC Alpha 3600 server (OpenVMS) on Ethernet; a VME crate with a VME 167 host, two VME 162 target CPUs, VME memory, and SCSI disk and 8mm tape; VMEbus repeaters to a second VME crate holding nine VCBD modules serving the CAMAC branches (MDC-Q, MDC-T, TRG, TOF, MUON, BSC, ESC, LUM, H.V.) and the Fastbus vertex-chamber readout (1821, 1879 TDCs, 1131).]
• Front-end electronics for all subsystems except the VC consist of CAMAC BADCs (Brilliant ADCs).
• VCBD: VME-CAMAC Branch Driver. Each reads the data of one detector subsystem and stores it in a local buffer.
• Two VME CPU modules run the real-time OS VMEexec:
  – One for data acquisition and event building.
  – One for event logging to tape and sending a fraction of events to the Alpha 3600.
• DEC Alpha 3600 machine:
  – DAQ control console.
  – Status/error reporting.
  – Online data analysis and display.
  – Communication with BEPC control machines to obtain BEPC status parameters.
• The system dead time is 10 ms:
  – BADC conversion: 6 ms.
  – VCBD readout: 3 ms.
Fastbus subsystem for Vertex Chamber
[Figure: one Fastbus crate with 1879 TDCs (ECL inputs), a segment interconnect (SIB), an 1821 SM/I connected to a PC, and the 1131 logic board distributing common stop, good event, and reset signals; readout to two VME 162 CPUs over the VME bus.]
• One Fastbus crate serves the 640 VC channels.
• Fastbus logic board:
  – Distributes all signal types to the TDCs: common stop, reset (fast clear).
  – Produces internal start and stop test pulses.
  – The good-event signal tells the 1821 to read data from the 1879s.
Microcode for the 1821
• Initialization of the 1879s:
  – TDC scale: 1 μs.
  – Compaction parameter: 10 ns.
  – Active time interval: 512 bins.
• Read out 1879 data into the data memory of the 1821:
  – Block transfer.
  – Sparse data scan method: only TDC modules containing data are read out.
• Send a data-ready signal (interrupt) to VME.
• SONIC language: a symbolic macro assembler, converted to microcode under LIFT.
• LIFT (LeCroy Interactive Fastbus Toolkit): a PC tool for developing microcode and testing the Fastbus system.
VC DAQ Software in VME
• A task running in the VME 162.
• Controlled by the BES-II DAQ main task through message queues.
• Downloads the microcode into the 1821.
• Controls the VC data-taking procedure.
• Reads the time data from the 1821 into the 1131 data memory after receiving the interrupt signal.
• Data transfer modes:
  – High 16 bits: DMA.
  – Low 16 bits: word by word.
• Measured transfer rate (the arithmetic is reproduced in the sketch below):
  – 96 (chans) × 7 (modules) × 2 (both edges) + 3 (marks) = 1347 32-bit words.
  – High 16 bits, DMA: 1.1 ms @ VME 162.
  – Low 16 bits, word by word: 3.5 ms @ VME 162.
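For reference, a small C sketch reproducing the word count quoted above and the effective transfer rates implied by the measured times; the MB/s values are derived here, not quoted from the original.

#include <stdio.h>

int main(void)
{
    const int    words  = 96 * 7 * 2 + 3;  /* = 1347 32-bit words/event    */
    const double half   = words * 2.0;     /* bytes moved per 16-bit pass  */
    const double t_dma  = 1.1e-3;          /* high 16 bits via DMA (s)     */
    const double t_word = 3.5e-3;          /* low 16 bits word by word (s) */

    printf("%d words; DMA %.1f MB/s; word-by-word %.1f MB/s\n",
           words, half / t_dma / 1e6, half / t_word / 1e6);
    return 0;
}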
The End
Backup slides
BTeV trigger architecture
[Figure: overall architecture – front-end boards of the BTeV detector feed data combiners + optical transmitters to optical receivers, split into 8 data highways. In each highway, pixel processors and FPGA segment finders feed the track/vertex farm through a Gigabit Ethernet switch; Global Level-1 (GL1) and the Information Transfer Control Hardware (ITCH) steer data from the Level-1 buffers through a cross-connect switch to the Level 2/3 processor farm (12 × 24-port Fast Ethernet switches) and on to the data logger.]
L1 Highway Bandwidth Estimates
[Figure: per-highway data-flow diagram with link bandwidths. Estimates are for 6 interactions/crossing and include 50% excess capacity: total triplets of 2.5 GB/s (plus 1.8 MB/s + 0.5 MB/s of muon data) from the front ends into ~15 segment tracker nodes; 167 MB/s per L1 buffer link; ~30 worker nodes at 83 MB/s in and 15 MB/s out; 300 KB/s to the GL1 + ITCH node; after the 1/50 L1 rejection, results + triplets at 54 MB/s (583 KB/s per node) plus ~1 GB/s of other raw pixel data (10 MB/s per node) go via a bridge to the DAQ highway switch and on to ~96 L2 nodes through the DAQ switch.]