Data Driven Front-End Architectures David Christian Fermilab January 12, 2007 LHCb Upgrade Workshop, Edinburgh 1 Evolution of data acquisition systems in response to higher data rates • DAQ Computer reads out contents of every channel of every front-end module. • CPU reads out modules which say that they have data. • Front-end modules digitize and zero-suppress data – CPU reads out compressed, formatted data. • Front-end modules digitize, zero-suppress, etc. AND send data to next processing level, all without instruction from a central control CPU --(data driven). 2 Example – BTeV Pixels • Pixel detector front-end electronics performs three functions: – Signal processing (amplify, discriminate, digitize). – Zero suppression (pixels make pattern reconstruction easy because occupancy is very low Æ data is very sparse). – Readout. • This presentation focuses on zero suppression and readout. 3 BTeV Pixel chip (FPIX2) • 0.25μ CMOS (rad hard). • 128 rows x 22 columns – 50μ x 400μ pixels. • Data driven, zero-suppressed readout – no trigger; every hit is read out. • Readout off chip is point-to-point, using a configurable number of 140 Mbs serial links (1,2,4, or 6). • All I/O is LVDS (Low Voltage Differential Signaling). 4 FPIX2 Block Diagram Pixel Unit Cells (22 columns of 128 rows each) Fabricated by TSMC (through MOSIS). FPIX2 2003 FPIX2.1 2005. Core End-of-Column logic (22 copies) Core Logic DAC’s Clock Control Logic Next Word Block Data Output Interface Programmable Registers Word Serializer Programming Interface BCO Clock Steering Logic Serial Clock Input/Output High Speed Output 5 FPIX2 Layout Debugging Outputs Pixel array + End-of-Column Logic + ε = “Core” Registers and DAC’s Command Interface Internal bond pads for Chip ID 128x22 Pixel array End-of-Column Logic Data Output Interface LVDS Drivers and I/O pads 6 Pixel Cells (four 50 x 400 μm cells) 12 µm bump pads Preamp 2nd stage +disc ADC Kill/ inject ADC encoder Digital interface 7 Readout details – column • • • • I will illustrate FPIX2 operation by focusing first at the base of a column of pixels. Each column contains four “command sets” and can hold data associated with four separate beam crossings. Command set actions are determined by two state machines, one which changes state on Beam Cross Over clock edges (1/132 ns), and one which changes on Readout clock edges (max frequency = 35 MHz = 1/29 ns). EOC logic is constructed so that the BCO clock & Readout clock need not be related to one another, either in frequency or in phase. 8 No BCO# latched yet L–0 E–0 E–0 E–0 NTS (at bco clock edge) Pixel(s) are hit (at bco clock edge) L–0 NTS F – #1 NTS E–0 E–0 E–0 NTS NTS NTS NTS L–0 E–0 E–0 NTS NTS NTS NTS (at readout clock edge) (at readout clock edge, after horiz. token has arrived) (token) O – #1 NTS O – #1 STS O – #1 Talking L–0 E–0 E–0 NTS NTS NTS NTS L–0 E–0 E–0 NTS STS NTS NTS L–0 E–0 E–0 NTS Talking NTS NTS States: (Empty/Listen/Full/Output) & (Nothing To Say/Something to Say/Talking/Silent) 9 Priority logic insures only 1 command set at a time in Listen or Output. Core Readout • • • • Readout is controlled by two sets of tokens, horizontal & vertical. The vertical tokens are launched (independently) when a command set enters the “Output” state. They move again as data is driven off of a pixel. When any EOC command set enters the “Something to Say” state, the core logic launches the horizontal token on the next readout clock edge (if it is not already active). When the token arrives at a column with something to say, the command set in STS goes to “Talking” at the next readout clock edge, and data is driven onto the Core output bus on this readout cycle. 10 Core Readout (continued) • • When the vertical token arrives at the last hit pixel associated with the command set in the “Output” state, the horizontal token is released and stops at the next column with something to say. This insures that there will not be an empty readout cycle between the last hit in one column and the first hit in the next column. When the horizontal token passes the last column, the core goes silent for one readout cycle. Another readout cycle is required to relaunch the horizontal token, so there are always two empty readout cycles every time the horizontal token sweeps past all 22 columns. These are the only “wasted” readout cycles. 11 Data Output • Data is driven off of a hit pixel onto the Core output bus, which is 23 bits wide. The data word consists of the information generated in the pixel unit cell (7 bit row number, and 3 bit ADC value), plus a 5 bit column number and an 8 bit BCO number, which are added by the end of column logic. • The Data Output Interface latches data from the Core output bus on the falling edge of the readout clock, serializes the data, and drives it off chip. 12 Output Data Format b23 b22 b21 b20 b19 b18 b17 b16 b15 b14 0 Status 0 0 0 0 0 0 Sync 0 0 0 0 0 0 1 Word Mark b23 b22 b21 b20 b19 b18 b17 b16 b15 b14 b13 b12 b11 b10 b09 b08 b07 b06 b05 b04 b03 b02 b01 1 Row Number Column BCO ADC Word Mark • Five bits are used to encode 22 columns. The column numbering scheme has no column number ending in 00. This ensures that a data word can never have 0’s in b01 – b13. This feature distinguishes a data word from a sync/status word. •Synchronization between the FPIX2 and the Pixel Data Combiner Board is established and maintained using the “sync/status” word. Whenever no data is available for output, the FPIX2 transmits the sync/status word. At least two sync/status words are guaranteed to be output every time the column number decreases. In addition, 23 bit hit data is transferred using a 24 bit word. The PDCB uses the word mark bit as a sync check on every word transfer. 13 Programming Interface • • • • Each FPIX2 has a chip id, which is set by wire bonds on internal bond pads. I/O is bussed on three pairs of lines: shift control, shift in, shift out. I/O is synchronous – clocked by the BCO clock. Commands can be addressed to a single chip, or broadcast to all chips on the bus. Shift Control BCO Shift In C5 C4 C3 C2 C1 R5 R4 R3 R2 R1 I3 I2 I1 1 Chip ID Register # Instruction 14 Programming Interface Instructions • • • • • Write (followed by 2, 8, or 2816 bits of data) Set (all bits in register = 1) Read Reset (all bits in register = 0) Default (set register to default value) 15 Registers and DAC’s • • • • 23 of 32 possible registers are used. 14 are 8 bit registers that control Digital to Analog Converters – used to set bias currents and voltages, and comparator thresholds. 2 are serpentine registers (kill and inject) running up and down the pixel columns, with 1 bit in each pixel. 6 control facets of chip operation (# of output pairs, BCO sync check, SendData, RejectHits, Core Reset, Programming Reset). 16 Recap: FPIX2 Block Diagram Pixel Unit Cells (22 columns of 128 rows each) Core End-of-Column logic (22 copies) Core Logic DAC’s Clock Control Logic Next Word Block Data Output Interface Programmable Registers Word Serializer Programming Interface BCO Clock Steering Logic Serial Clock Input/Output High Speed Output 17 Outlook • FPIX2 serial output is slow compared to “modern” commercial electronics – 1.5 Gbps is common for relatively short copper links (x10 performance boost is possible). • FPIX2 zero suppression scheme could probably can be made 10 times faster, but only by adding complexity (grouping pixels within a column & adding another level of token, for example) or reducing the number of pixels per readout chip. • Signal processing must be optimized: – Time between crossings? – Maximum leakage current? – ADC? # of bits? • Radiation environment must be understood: – Total dose. – Mitigation of single event upsets. 18 Backup slides 19 Data Driven Architectures OP OP OP Pipeline OP OP Parallel OP 20 Data Driven Architectures OP OP Parallel Pipeline OP OP OP OP 21 Zero Suppression (1/2) • Pixel array is organized into semiautonomous columns. • Data is held at the individual pixel until it is read out. • Each hit pixel “raises its hand” & each column with a hit “raises its hand.” 22 Zero Suppression (2/2) • Pixels not hit are skipped using “tokens.” • A token is dropped down each column & stops if there is a hit pixel. • A token is launched across the base of the columns & stops at the first column with a hit. • While a hit is read out, the token advances to the next hit (skipping zeros). 23 Readout • Each pixel chip sends its data out on a dedicated point-to-point serial link. • When readout is enabled, output continues as long as the clock is present and there is data to send. • Each BTeV pixel readout chip can output ~ 30 MHz of pixel hits (1 hit per 30 ns). 24 Radiation Hard • 0.25μ CMOS design started in 1998; verified in a series of small test chips. – Tested to 87 Mrad with no degradation in analog performance and only minor changes required to bias conditions. – Digital cells insensitive to total dose. – No latch-up, no gate rupture. – Single event upset cross sections measured, typically < 10-15 cm-2 per bit. • Measured with 200 MeV protons; about the same for MIP’s. • Expected upset rates: ~10 pixel kill bits/hr, 2 DAC register bits/hr, 1 data serializer bit/hr. • Will reload kill pattern each fill, & monitor other registers & reset as necessary (can be done without interrupting data taking for most errors). 25 Aside: could the pixel be smaller? • Amplifier takes < 25% of area. • ~ 50% of area is associated w/3-bit FADC. • Could be made smaller by: – Using fewer ADC bits. – Using newer process (digital sections would get much smaller). 26 Pixel Unit Cell Vdda Thresholds (Vth1-Vth7) Token Out Iff ADC Thermometer Vfb2 Flash Latch to Binary Encoder Vfb Resets Ibp1 Row Address Bus Ibb - Ibp2 Command Interpreter + Sensor Controller 00 - idle 01 - reset Kill 10 - output 11 - listen Inject Test Vref Threshold (Vth0) 4 pairs of Command Lines RFastOR Throttle HFastOR Read Clock Read Reset Token In Token Reset 27 RCLK and SCLK • • The core readout clock (RCLK) is derived from the serial clock (SCLK). SCLK is constructed from external clocks and is nominally 140 MHz. The frequency of RCLK depends on the number of output pairs being used. This relationship means that no buffer memory is required in the Data Output Interface. Configuration SCLK Frequency RCLK Frequency 6 output pairs 140 MHz 35 MHz 140(6/24) 4 output pairs 140 MHz 23.3 MHz 140(4/24) 2 output pairs 140 MHz 11.7 MHz 140(2/24) 1 output pair 140 MHz 5.8 MHz 140(1/24) 35 MHz = 4.6/132 ns 28