LAT CDR RFA #28 Response Action Requested: Describe the process for performing worse case circuit analyses for electronic circuits to show that the electronics can perform over its full temperature range over the life of the mission. Provide the results of these analyses, if you have performed these analyses. If you have not performed WCCA’s, then perform them. I suggest using NASA JPL preferred reliability practice PD-ED-1212 as a guideline. Supporting Rationale: This has not been described, although WCCA’s may have been performed. Response: WORST CASE CIRCUIT ANALYSIS RESULTS Gamma-ray Large Area Space Telescope (GLAST) Large Area Telescope (LAT) personnel performed a Worst Case Circuit Analysis (WCCA). The analysis was divided into three sections: Part Stress Analysis, Timing Analysis, and Failure Mode Effects Analysis. The WCCA uses the NASA JPL preferred reliability practice PD-ED-1212. This guideline summarizes its implementation method as: "Derive part parameter variations for the environments and life of a specific mission and combine them with the initial tolerances of the parts as procured to produce a worst case part variation database for each mission or project. Apply classical circuit analysis techniques, and determine if each circuit and each assembly meets its specified performance attributes over the most extreme but realizable combinations of part variation sources." The WCCA yielded no concerns for flight success. I. Part Stress Analysis Part Stress Analysis was performed on all LAT subsystems, either by GSFC (for the ACD) or SLAC (for the remainder). The reports are collected in a single database and they are listed in the part stress analysis column of the Worst Case Circuit Analysis Summary of Reports (Table 1). GSFC Code 300 contractors examined these part stress analyses. A small number of discrepancies were found. These discrepancies are summarized in the reports listed in the Part Stress Analysis Additional Notes at the bottom of Table 1. These issues were discrepancies like an incorrect derating factor or an applied value exceeds derated value. A GLAST parts engineer looked at each of these of discrepancies and determined that they do not require any changes in the LAT design. [***NEED TO GET THOM PERRY'S FINAL WORDS.***] II. Timing Analysis Timing analysis is usually performed via simulation to determine the ability to meet timing specifications on a circuit board and the ability to meet board-to-board timing specifications. However, none of the ASICs underwent timing analysis prior to fabrication. There was no simulation by the chip designers to verify that timing would be met over the mission's temperature and voltage ranges. Instead, the flight ASICs underwent Qualification Testing following fabrication consisting of Initial Tests, Life Cycle testing, Highly Accelerated Stress Testing and Temperature Cycle testing. The downside of this method of testing is it is impossible for these tests to recreate all of the conditions and combinations that simulation would have covered. The upside of this test methodology is a subset of the conditions and combinations are tested on the actual flight hardware rather than via simulation. Because the ASICs are integral to the LAT and ASICs did not undergo simulation, performing timing analysis of the LAT would have yielded highly incomplete results. One timing analysis was able to be performed at SLAC on the Tower Electronics Module looking at the key internal and external interfaces. The cognizant LAT engineers whose subsystems interface with the TEM verified that the interface timing is adequate and that their subsystems will operate within those constraints. In lieu of the ability to perform useful simulation timing analysis, we followed the recommendation of JPL's Preferred Reliability Practice number PT-TE-1431 by performing Voltage and Temperature Margin Testing (VTMT) on the LAT instrument subsystems and on the LAT as a whole. JPL's Preferred Reliability Practice number PTTE-1431, titled "Voltage & Temperature Margin Testing", states: "Voltage and Temperature Margin Testing (VTMT) is the practice of exceeding the expected flight limits of voltage, temperature, and frequency to simulate the worst case functional performance, including effects of radiation and operating life parameter variations on component parts. For programs subject to severe cost or schedule constraints, VTMT has proven an acceptable alternative to conventional techniques such as worst case analysis (WCA). WCA is the preferred approach to design reliability, but VTMT is a viable alternative for flight projects where trade-offs of risk versus development time and cost are appropriate…The VTMT must be designed to achieve adequate variation of circuit parameters through a judicious choice of voltage, temperature, and frequency margin combinations to achieve an optimal, yet realistic check of the design margin." All subsystems undergo thermal/vacuum testing which spans the temperature and input voltage ranges thereby performing the needed VTMT. Additionally, the entire LAT instrument will be undergoing thermal/vacuum testing which will test the electronics over the temperature and spacecraft interface voltage ranges. The reports that document the tests involved in this section are listed in the timing analysis column of the Worst Case Circuit Analysis summary of Reports (Table 1). For those portions where "T/V test results not yet available" is listed, the T/V test must be completed successfully for the subsystem to be integrated to LAT. For those portions where "et. al." is listed, only one of the usually 16 report numbers for the multiple instances of the electronics in LAT. III. Failure Mode Effects Analysis LAT personnel performed a Failure Mode Effects Analysis (FMEA) in an effort separate from the WCCA effort. As indicated in the failure mode effects analysis column of the Worst Case Circuit Analysis Summary of Reports (Table 1), the FMEA results are collected as LAT-TD-00374-D1 and the items that were considered can be found in the LAT DAQ FMEA Worksheets5_20.xls. The FMEA provides an assessment for the hardware configuration of the LAT. A “bottoms-up” examination of each LAT component was performed in order to identify potential failures and their effects on a local, instrument, and overall science mission level. Specific attention is given to identification of any Single Point Failures (SPFs) that could cause failure of the GLAST science mission, and to recommend corrective actions or methods to alleviate their occurrence and/or impact. This qualitative report will help answer these questions as each component of the LAT is analyzed: 1. How can the component fail? (It might be possible there is more than one mode of failure.) 2. What are the effects of the failure? 3. How critical are the effects? 4. How is the failure detected? 5. What are the safeguards against significant failures? A Critical Items List (CIL) is also be maintained as part of this activity. The CIL provides a summary of selected hardware items whose related failure modes can result in serious injury, loss of life (flight or ground personnel), loss of launch vehicle; or the loss of one or more mission objectives (when no redundancy exists) as defined by the GSFC project office. At present, one hardware item, the Micrometeoroid Shield (MMS), is a SPF item that remains on the Critical Items List (CIL). The MMS is classified with a severity classification of 2 (i.e., critical failure mode that could result in loss of one or more mission objectives). Its failure Mode would be light leakage/damage to two or more tiles. Its mission effect would be an inability to achieve the Diffuse Background Rejection Objective as defined in Table 2.3.1-1, Item 15, of the GLAST Science Requirements. Although the Micrometeoroid Shield is classified as a critical item, a supplemental rationale document provided by Steve Ritz/LAT Scientist (see Appendix B of PRA Report, LAT-TD-02510-02, dated 02/02/04) describes how the ACD and LAT are still capable of meeting science requirements (although efficiency may fall slightly below performance requirements, in the event a single tile, or localized pair, fails. The effect is localized inefficiency that would not severely impact the real-time operation of the LAT. Two additional potential Single Point Failures (SPFs), besides the Micrometeoroid Shield, have been recently identified. Presently they are being analyzed, and disposition will we decided shortly. The first potential additional SPF is the LAT Radiometer AntiFreeze Thermostat, where cold temperatures may cause system-wide failure of instrument. The second potential additional SPF lies in the ACD GARC (Digital) ASIC. During review of the ACD system, GARC ASIC failures were designated as being fault tolerant given the belief that two redundant power feeds were being provided. However, the implemented solution seems to involve wire-ORing of the clock signals. If this is the case, it is believed that a short on the clock signal could potentially take out the GARC ASIC and result in a SPF condition, given that all GARC ASICS needs to be operational in order to meet ACD efficiency requirements. Both of these potential SPFs are being further analyzed and recommendations will be forthcoming. Of the remaining 752 failure conditions that were assessed within the LAT instrument as part of the FMEA process; a summary of the current severity code rankings are as follows: * 106 failure conditions are of Category 2R (Critical), where failure mode is of identical or equivalent redundant hardware items that could result in Category 2 effects if all failed. (Category 2 is a failure mode that could result in loss of one or more mission objectives as defined by the GSFC project offce.) * 606 failure conditions are of Category 2MR (Critical), where the failure modes is of more than one of many identical or equivalent redundant hardware items that could result in Category 2 effects if a significant fraction of the identical systems fail. (Category 2 is a failure mode that could result in loss of one or more mission objectives as defined by the GSFC project offce.) * 28 failure conditions are of Category 3 (Significant), where the failure mode is one that could cause degradation to mission objectives * 12 failure conditions are of Category 4 (Minor), where the failure mode results in insignificant or no loss to mission objectives. Failures conditions with Category 2R, 2MR, 3, or 4 classifications are not catastrophic (Category 1) and are sufficiently mitigated via design or have an acceptable effect on mission objectives. TABLE 1 -- Worst Case Circuit Analysis Summary of Reports LAT subsystem AntiCoincidence Detector (ACD) unit electronics unit acronym Base Electronics BEA FRont End Electronics Assembly board board acronym FREE High Voltage Bias Supply HVBS part stress analysis ACD_FREE_Parts _Derating_06-0603A.xls Derating_Sheet_L AT_ACD_HVBS11 _5-30-03.xls LAT_TRACKER_P arts_stress.xls timing analysis VTMT on entire ACD. GAFE & GARC completed ASIC qualification testing. T/V test results not yet available. VTMT on entire MCM. GTFE & GTRC completed ASIC qualificiation testing. T/V test results LAT-TD05495 et. al. AFEE_Board_Stre VTMT on entire MCM. ss_Analysis_Rev_ GCRC & GCFE qual A1.xls complete. T/V test results LAT-PS-0512901 et. al. LAT-TD-01782LAT-TD-01785-02.pdf 02.pdf for a timing analysis. VTMT on entire LAT-TD-04516TEM/TPS. T/V test 02.pdf results LAT-TD-03631 et. al. LAT-TD-01823VTMT on entire 01.pdf GASU/GPS. GLTC, GTCC & GCCC completed ASIC LAT-TD-02655qualification testing. T/V 02.pdf test results not yet LAT-TD-01818SFC-Test52.pdf Procedures_RevD_2.4_ A.pdf for the Processor Board. VTMT on entire SIU. T/V test results not LAT-TD-01819yet available. 01.pdf Tracker TKR TKR Front-End Electronics MCM Calorimeter CAL CAL Front-End Electronics AFEE Storage Interface Board: * EEPROM storage * Spacecraft MIL1553 interface * Power control or PDU/GASU power switches in PDU * Power control or VCHP switches in heater box LAT Communication Board: * Commanding of GASU * Read-back, housekeeping & event data from GASU Processor Board SIB CPU 750_Reliability_As sessment_for_GLA ST_(12.1).doc, GLAST_Radiation_ and_Reliabiity.ppt, Power Supply Board: * 28V to 3.3V/5V conversion * LVDS-CMOS conversion of SC discretes * System clock to GASU Same components boards as SIU, except doesn’t perform: * SIB's spacecraft MIL1553 interface * SIB's power control or PDU/GASU power switches in PDU * SIB's power control or VCHP switches in heater box * PSB's LVDS-CMOS conversion of SC discretes PSB LAT-TD-0181701.pdf SIB, LCB, CPU, PSB same as SIU SFC-TestProcedures_RevD_2.4_ A.pdf for the Processor Board. VTMT on entire EPU. T/V test results not yet available. VTMT on entire PDU. T/V test results not yet available. VTMT on entire HCB. T/V test results not yet available. Data Acquisition (DAQ) Tower Electronics Module Tower Power Supply Module TEM TPS Global-Trigger / GASU ACD-EM / SignalDistribution Unit GASU Power Supply Spacecraft Interface Unit GPS SIU Event Processor EPU Unit LCB Power Distribution Unit PDU LAT-TD-0180903a_released.pdf Heater Control Box HCB LAT-TD-0457601.pdf failure mode effects analysis Results in LATTD-00374-D1 and individual items analyzed are found in LAT DAQ FMEA Worksheets5_20. xls Part Stress Analysis additional notes: Tony DiVenti had contractors Al Kogan and Paula Simon audit SLAC's part stress analysis. Their analysis is contained in derating_09-13-04.doc, GLAST_LAT_Parts_Derating-TPS_rev_A.doc, GLAST_RAD750_Parts_Derating.doc. Timing Analysis additional notes: For those portions where "T/V test results not yet available" is listed, the T/V test must be completed successfully for the subsystem to be integrated to LAT. For those portions where "et. al." is listed, only one of the usually 16 report numbers for the multiple instances of the electronics in LAT.