ATLAS TileCAL
TileCAL LVPS
Discussion of SEU Problem
Gary Drake, Bob Stanek
Argonne National Laboratory, USA
May 3, 2011

LVPS Radiation Test Session Summary

Session #1: Protons – Dec. 4, 2010
  – Massachusetts General Hospital, Cancer Treatment Facility
  – 188 MeV protons from DC cyclotron
  – Tested: (3) V7.3.1 +15MB, (2) V7.3.1 +5DIG
Session #2: Neutrons – Dec. 6, 2010
  – Univ. of Mass., Lowell, MA, Training Reactor
  – 1 MeV-equivalent neutrons from reactor core
  – Tested: (3) V7.3.1 +15MB, (2) V7.3.1 +5DIG
Session #3: Gammas – Dec. 16, 2010
  – Brookhaven National Lab, Co-60 Source Facility
  – 1.2 MeV photons from source
  – Tested: (2) V7.3.1 +15MB, (2) V7.3.1 +5DIG, (1) V6.5.4 +5V
  [Slide callout: Better Cooling… Greater Sophistication…]
Session #4: Protons – Feb. 27, 2011 (Repeat to study components)
  – Massachusetts General Hospital, Cancer Treatment Facility
  – Tested: (1) V7.3.1 +15MB, (1) V7.3.1 +5DIG, (1) V6.5.4 +5V, (2) Components Bds
Session #5: Neutrons – Feb. 28, 2011 (Repeat to study components)
  – Univ. of Mass., Lowell, MA, Training Reactor
  – Tested: (3) V7.3.1 +15MB, (2) V7.3.1 +5DIG, (1) V6.5.4 +5V, (2) Components Bds
Session #6: Protons – Apr. 17, 2011 (Confirmed cause of SEU problem)
  – Massachusetts General Hospital, Cancer Treatment Facility
  – Tested: (1) V7.3.2 +15MB, (1) V7.3.2 +5DIG, (1) V7.3.1 +5DIG, (2) Components Bds
Session #7: Protons – June 15, 2011 ?

Summary of LVPS Radiation Test Results So Far

Radiation Tests
  – Gamma Tests
    • No trips through dose; No deaths through dose range of interest
    • Observe general degradation in noise, stability, & efficiency vs. dose
    • Calibration constants mostly OK; Some changes in offset voltages
    • Saw OVP & OCP failures after high dose
    • Saw failures of opto-isolators after high dose
    Probably OK, but analysis of data continuing
  – Neutron Tests
    • Original test apparatus flawed (oven effect caused tripping from heat), but neutron damage is independent of powering brick
      [Slide callout: Improved in Session 5]
    • Bricks die at ~85% of target; probably OK
    • Observe general degradation in noise, stability, & efficiency vs. dose
    • Calibration constants OK; OVP & OCP OK
    Probably OK; Data analysis in progress
  – Proton Tests
    • Single Event Upset (SEU) problem exists in both new and old designs
      – We have confirmed that it is due to the soft start feature of the controller chip – the heart of the brick design
      – 3 choices: 1) Live with it; 2) Modify current design; 3) Redesign with new controller
    This is the schedule driver for production

Discussion of SEU Problem

[Figure: Brick Block Diagram. Blocks shown: 200V input, FET Driver, Transformer, LC Filter, LC Buck, RSHUNT, LT1681 Controller Chip, OpAmp, Opto Isolators (VFB, IFB), Startup & Shutdown Control (Run, Stop, Over Temp), and monitoring to ELMB (voltages, OVP, OCP, Temp; VIN*, VOUT*, IIN*, IOUT*); grounds GNDPRI and GNDSEC]
Feedback loop makes it hard to diagnose tripping problems…

[Figure: Component Test Board Block Diagram. Blocks shown: 200V input, LC Filter, All Diode Types, FET Driver, LT1681 Controller Chip, OpAmps, Opto Isolators, Bias, Temp, Startup & Shutdown Control; outputs VOUT1–VOUT9, references VREF1–VREF4]
All Active Components Represented
All Critical Passive Components Represented (Except Transformer)

Discussion of SEU Problem (Cont.)
[Figure: Brick Block Diagram and Component Test Board Block Diagram (repeated)]
Focus on LT1681 Controller Chip

Discussion of SEU Problem (Cont.)

Summary of results from Proton Irradiation Studies
  – Observe tripping in V7.3.1 bricks and also the V6.5.4 bricks
  – Observe resetting of LT1681 chip in components board
    • Clocks stop for a short time, then restart
    • Conclusion: SEU in LT1681 initiates a soft-start
  – Rates ~same for all brick types, versions, and components board
    • LT1681 is common to all…
    [Figure: Plot from Bob]
  – From most recent tests, observe that when soft-start delay is made very short in V7.3.1 bricks, then No tripping is observed
    • i.e., brick trips, but soft-start restarts brick faster than our DAQ (and DCS) can measure
    • Measurement done with resistive load, not in drawer (yet)

Discussion of SEU Problem (Cont.)

What is happening:
[Figure: Schematic. Callout: Soft-start capacitor; Value controls startup delay]
[Figure: LT1681 Block Diagram. Callout: This FF is being reset by protons]
  – Generally, the soft-start feature is used to restart circuit gracefully after a fault
  – Not used this way in LVPS – No HDW automatic restart after a trip
  – V6.5.4: Used to control startup sequence, ~0.1 – 0.8 Sec
  – V7.3.1: Set to 30 mSec (Startup sequencing done another way…)
  – Bricks will trip off after ~10 mSec due to dissipation of energy on primary side, which is why this feature does not work in this design

Discussion of SEU Problem (Cont.)
Soft-Start Feature
  – When brick trips due to OVLO or TEMP or IMAX (or SEU), FF1 is SET
    • Causes clocks to stop
    • Causes voltage on CSS to be reset
  – CSS recharges from internal 10 uA current source
    • When reaches ~1.3V, clocks restart with reduced duty factor
    • When VSS reaches 2V, clocks fully restarted
    • Restart time depends on Css

[Figure: Timing diagram of SEU, FF1, VSS, and CLK, with VSS curves for decreasing CSS; levels marked 5V, 2V, 0.225V. Annotations:]
  • Discharge Time: Depends lightly on Css
  • Recharge Time: Depends dominantly on Css; Slope = 10 uA / Css
  • Clocks restart when Vss reaches ~1.3V, Reduced DF
  • Variable DF Time: Depends dominantly on Css; Affects Overshoot of Vo
  • Clocks normal when Vss reaches ~2V

Solutions to SEU Problem Under Study

Eliminate the soft-start feature in the LT1681
  – We have explored this, and it cannot be done
  – The soft-start feature is a basic function of the chip needed to start the brick
Reduce the effect of SEU in the LT1681
  – Work is in progress; 3 techniques being studied
  – Best so far: 200 uSec stoppage of clock
  – May have consequences for performance of front-end electronics
    • Causes an overshoot of output voltage from cold start
    • Will have droop of output voltage during 200 uSec dead-time
  Subject of discussion today
Redesign the brick using a different controller chip
  – Have identified a different controller that does not have the soft-start feature
  – Major effort…
    • Prototypes; Testing in Building 175; Tests on detector; Radiation tests…
    • Would make production schedule for 2013 installation very tight…

Summary of Bench Studies

Modification Scenario
Soft-start feature cannot be disabled in the LT chip
  – It is an integral part of the operation of the device
Soft-start delay affects how fast the brick starts up
  – CSS voltage modulates duty factor of the clock
  – Starts with low DF, and gradually increases
  – If soft-start delay is too short, then output voltage rises too fast, and can cause an overshoot
  – Amount of overshoot depends on load current & load capacitance
If soft-start cap is removed, then are left with parasitic capacitance
  – Minimum delay, nominal 200 uSec, spread 150 uSec – 300 uSec
When in soft-start mode, output voltage sags
  – Clocks stop, so switching stops
  – Amount of sag depends on load current and load capacitance

Summary of Current Approach to Problem

Eliminate CSS; Soft-Start Delay determined by parasitic capacitance
  – When SEU occurs
    • Brick trips off for 200 uSec, then restarts
    • Since 200 uSec << 10 mSec, brick restarts from soft-start operation
  – 2 caveats:
    • Results in fast start from cold-start, creating an overshoot on output voltage
    • Causes droop in output voltage for 200 uSec dead-time

[Figure: Output Voltage Overshoot On Cold-Start. Note: Overshoot lasts < ~5 mSec]
[Figure: Simulated SEU. Traces: Output Voltage, Soft Start Delay]

Summary of Current Approach to Problem (Cont.)

Addressing the issues
  – Overshoot
    • Current values with no Css shown in Table 1
    • We are working on a way to reduce the overshoot through other means, but do not have a solution yet

Summary of Current Approach to Problem (Cont.)

Addressing the issues (Cont.)
  – Droop – Add additional load capacitance to increase energy storage during dead-time
    • Current values with no Css shown in Table 2
    • For a target of 10% droop (somewhat arbitrary), additional Cload values needed shown in Table 3
  Cannot quite meet 10% spec for +5MB…
  Note: Have not repeated Overshoot Tests with additional CLOAD…

Discussion Points

  – Component damage due to overshoot at cold start. Consider the magnitude of the peak value, and the duration of the overshoot before returning to nominal value.
  – Pedestal effects due to the droop from SEU.
  – False positive signals from the droop from SEU.
  – Changes to gain or operating points from the droop from SEU.
  – False or missing triggers from the droop from SEU.
  – Component damage from the droop from SEU.
  – Loss of timing synchronization from the droop from SEU.
  – Loss of serial transmission synchronization from the droop from SEU.
  – Loss of FPGA programming from the droop from SEU.
  – Others?
Would like hard limits from FEE groups, for overshoot & droop
Some optimization might be possible, i.e. increased soft-start delay, a little more droop, less overshoot…
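
The trade-off between soft-start delay and droop can be explored with some back-of-envelope arithmetic. The sketch below is a rough illustration only, not the analysis behind Tables 1–3. It assumes the soft-start capacitor charges linearly from the 10 uA source (slope = 10 uA / Css, as on the timing slide), and it assumes the load draws a constant current from Cload alone during the 200 uSec dead-time, ignoring any energy left in the LC filter. The function names and the example values (100 nF, 5 V, 10 A) are hypothetical and are not taken from the measurements.

```python
# Rough soft-start and droop estimates. All example numbers are hypothetical.

I_SS = 10e-6      # soft-start charging current, 10 uA (from the slides)
T_DEAD = 200e-6   # nominal clock stoppage with no Css, 200 uSec (from the slides)


def softstart_time(c_ss, v_target, v_start=0.0, i_charge=I_SS):
    """Time for a constant current to charge c_ss from v_start to v_target.

    Assumes an ideal linear ramp: dV/dt = i_charge / c_ss.
    v_start = 0 V is an assumption about the reset level.
    """
    return c_ss * (v_target - v_start) / i_charge


def droop_volts(i_load, c_load, t_dead=T_DEAD):
    """Output droop if c_load alone supplies a constant i_load for t_dead."""
    return i_load * t_dead / c_load


def c_load_for_droop(i_load, v_out, droop_fraction=0.10, t_dead=T_DEAD):
    """Load capacitance needed to hold droop to droop_fraction of v_out."""
    return i_load * t_dead / (droop_fraction * v_out)


if __name__ == "__main__":
    c_ss = 100e-9  # hypothetical soft-start capacitor, 100 nF
    print("time to ~1.3 V (clocks restart):", softstart_time(c_ss, 1.3), "s")
    print("time to ~2 V (clocks normal):  ", softstart_time(c_ss, 2.0), "s")

    v_out, i_load = 5.0, 10.0  # hypothetical brick output: 5 V at 10 A
    c_needed = c_load_for_droop(i_load, v_out)
    print("Cload for 10% droop:", c_needed, "F")
    print("droop with that Cload:", droop_volts(i_load, c_needed), "V")
```

Under these assumptions the required Cload scales linearly with load current and dead-time and inversely with the allowed droop, so loosening the droop limit or shortening the dead-time reduces the added capacitance proportionally.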