LAT CDR RFA #28 Response
Action Requested:
Describe the process for performing worse case circuit analyses for electronic circuits to show that the electronics can perform over its full temperature range over the life of the mission. Provide the results of these analyses, if you have performed these analyses. If you have not performed WCCA’s, then perform them. I suggest using NASA JPL preferred reliability practice PD-ED-1212 as a guideline.
Supporting Rationale:
This has not been described, although WCCA’s may have been performed.
Response:
WORST CASE CIRCUIT ANALYSIS RESULTS
Gamma-ray Large Area Space Telescope (GLAST) Large Area Telescope (LAT) personnel performed a Worst Case Circuit Analysis (WCCA). The analysis was divided into three sections: Part Stress Analysis, Timing Analysis, and Failure Mode Effects
Analysis. The WCCA uses the NASA JPL preferred reliability practice PD-ED-1212.
This guideline summarizes its implementation method as: "Derive part parameter variations for the environments and life of a specific mission and combine them with the initial tolerances of the parts as procured to produce a worst case part variation database for each mission or project. Apply classical circuit analysis techniques, and determine if each circuit and each assembly meets its specified performance attributes over the most extreme but realizable combinations of part variation sources."
The WCCA yielded no concerns for flight success.
I. Part Stress Analysis
Part Stress Analysis was performed on all LAT subsystems, either by GSFC (for the
ACD) or SLAC (for the remainder). The reports are collected in a single database and they are listed in the part stress analysis column of the Worst Case Circuit Analysis
Summary of Reports (Table 1).
GSFC Code 300 contractors examined these part stress analyses. A small number of items requiring further analysis were identified. These items are summarized in the reports listed in the Part Stress Analysis Additional Notes at the bottom of Table 1.
These items were discrepancies like an incorrect derating factor or an applied value exceeds derated value. The majority of these items have been resolved and deemed to be of no concern. The remaining items are presently being analyzed. Any issues that arise will be brought to the attention of GLAST Project Management, the GLAST Systems
Assurance Manager, and the RFA originator
II. Timing Analysis
Timing analysis is usually performed via simulation to determine the ability to meet timing specifications on a circuit board and the ability to meet board-to-board timing specifications. However, none of the ASICs underwent timing analysis prior to fabrication. There was no simulation by the chip designers to verify that timing would be met over the mission's temperature and voltage ranges. Instead, the flight ASICs underwent Qualification Testing following fabrication consisting of Initial Tests, Life
Cycle testing, Highly Accelerated Stress Testing and Temperature Cycle testing. The downside of this method of testing is it is impossible for these tests to recreate all of the conditions and combinations that simulation would have covered. The upside of this test methodology is a subset of the conditions and combinations are tested on the actual flight hardware rather than via simulation.
Because the ASICs are integral to the LAT and ASICs did not undergo simulation, performing timing analysis of the LAT would have yielded highly incomplete results.
One timing analysis was able to be performed at SLAC on the Tower Electronics
Module. That timing analysis looked at the key TEM internal and external interfaces.
The cognizant LAT engineers whose subsystems interface with the TEM verified that the interface timing is adequate and that their subsystems will operate within those constraints.
In lieu of the ability to perform useful simulation timing analysis, we followed the recommendation of JPL's Preferred Reliability Practice number PT-TE-1431 by performing Voltage and Temperature Margin Testing (VTMT) on the LAT instrument subsystems and on the LAT as a whole. JPL's Preferred Reliability Practice number PT-
TE-1431, titled "Voltage & Temperature Margin Testing", states: "Voltage and
Temperature Margin Testing (VTMT) is the practice of exceeding the expected flight limits of voltage, temperature, and frequency to simulate the worst case functional performance, including effects of radiation and operating life parameter variations on component parts. For programs subject to severe cost or schedule constraints, VTMT has proven an acceptable alternative to conventional techniques such as worst case analysis
(WCA). WCA is the preferred approach to design reliability, but VTMT is a viable alternative for flight projects where trade-offs of risk versus development time and cost are appropriate…The VTMT must be designed to achieve adequate variation of circuit
parameters through a judicious choice of voltage, temperature, and frequency margin combinations to achieve an optimal, yet realistic check of the design margin."
All subsystems undergo thermal/vacuum testing which spans the temperature and input voltage ranges thereby performing the needed VTMT. Additionally, the entire LAT instrument will be undergoing thermal/vacuum testing which will test the electronics over the temperature and spacecraft interface voltage ranges.
The reports that document the tests involved in this section are listed in the timing analysis column of the Worst Case Circuit Analysis summary of Reports (Table 1). For those portions where "T/V test results not yet available" is listed, the T/V test must be completed successfully for the subsystem to be integrated to LAT. For those portions where "et. al." is listed, only one of the usually 16 report numbers is listed for the multiple instances of the electronics in LAT.
III. FAILURE MODE EFFECTS ANALYSIS
LAT personnel performed a Failure Mode Effects Analysis (FMEA) in an effort separate from the WCCA effort. As indicated in the failure mode effects analysis column of the
Worst Case Circuit Analysis Summary of Reports (Table 1), the FMEA results are collected as LAT-TD-00374-D1 and the items that were considered can be found in the
LAT DAQ FMEA Worksheets5_20.xls.
The FMEA provides an assessment for the hardware configuration of the LAT. A
“bottoms-up” examination of each LAT component was performed in order to identify potential failures and their effects on a local, instrument, and overall science mission level. Specific attention is given to identification of any Single Point Failures (SPFs) that could cause failure of the GLAST science mission, and to recommend corrective actions or methods to alleviate their occurrence and/or impact.
This qualitative report will help answer these questions as each component of the LAT is analyzed:
1. How can the component fail? (It might be possible there is more than one mode of failure.)
2. What are the effects of the failure?
3.
4.
5.
How critical are the effects?
How is the failure detected?
What are the safeguards against significant failures?
A Critical Items List (CIL) is also be maintained as part of this activity. The CIL provides a summary of selected hardware items whose related failure modes can result in serious injury, loss of life (flight or ground personnel), loss of launch vehicle; or the loss of one or more mission objectives (when no redundancy exists) as defined by the GSFC project office.
At present, one hardware item, the Micrometeoroid Shield (MMS), is a SPF item that remains on the Critical Items List (CIL). The MMS is classified with a severity classification of 2 (i.e., critical failure mode that could result in loss of one or more mission objectives). Its failure mode would be light leakage/damage to two or more tiles.
Its mission effect would be an inability to achieve the Diffuse Background Rejection
Objective as defined in Table 2.3.1-1, Item 15, of the GLAST Science Requirements.
Although the Micrometeoroid Shield is classified as a critical item, a supplemental rationale document provided by Steve Ritz/LAT Scientist (see Appendix B of PRA
Report, LAT-TD-02510-02, dated 02/02/04) describes how the ACD and LAT are still capable of meeting science requirements (although efficiency may fall slightly below performance requirements, in the event a single tile, or localized pair, fails. The effect is localized inefficiency that would not severely impact the real-time operation of the LAT.
Of the remaining 752 failure conditions that were assessed within the LAT instrument as part of the FMEA process; a summary of the current severity code rankings are as follows:
* 106 failure conditions are of Category 2R (Critical), where failure mode is of identical or equivalent redundant hardware items that could result in Category 2 effects if all failed. (Category 2 is a failure mode that could result in loss of one or more mission objectives as defined by the GSFC project offce.)
* 606 failure conditions are of Category 2MR (Critical), where the failure modes is of more than one of many identical or equivalent redundant hardware items that could result in Category 2 effects if a significant fraction of the identical systems fail. (Category 2 is a failure mode that could result in loss of one or more mission objectives as defined by the
GSFC project offce.)
* 28 failure conditions are of Category 3 (Significant), where the failure mode is one that could cause degradation to mission objectives
* 12 failure conditions are of Category 4 (Minor), where the failure mode results in insignificant or no loss to mission objectives.
Failures conditions with Category 2R, 2MR, 3, or 4 classification are not catastrophic
(Category 1) and are sufficiently mitigated via design or have an acceptable effect on mission objectives.
LAT subsystem
Anti-
Coincidence
Detector
(ACD)
Tracker
Calorimeter
Data
Acquisition
(DAQ) electronics unit
Base Electronics
Assembly
Tower
Electronics
Module
Tower Power
Supply Module unit
TKR
TABLE 1 -- Worst Case Circuit Analysis Summary of Reports acronym
BEA FRont End Electronics
High Voltage Bias Supply
TKR Front-End Electronics board board acronym
FREE
HVBS
MCM part stress analysis
ACD_FREE_Parts
_Derating_06-06-
03A.xls
Derating_Sheet_L
AT_ACD_HVBS11
_5-30-03.xls
LAT_TRACKER_P arts_stress.xls
timing analysis
VTMT on entire ACD.
GAFE & GARC completed ASIC failure mode effects analysis
Results in LAT-
TD-00374-D1 and individual qualification testing. T/V items analyzed test results not yet available.
VTMT on entire MCM.
GTFE & GTRC are found in LAT
DAQ FMEA
Worksheets5_20.
xls
CAL
TEM
TPS
Global-Trigger /
ACD-EM / Signal-
Distribution Unit
GASU
GASU Power
Supply
Spacecraft
Interface Unit
GPS
SIU
CAL Front-End Electronics
Storage Interface Board:
* EEPROM storage
* Spacecraft MIL1553 interface
* Power control or PDU/GASU power switches in PDU
* Power control or VCHP switches in heater box
LAT Communication Board:
* Commanding of GASU
* Read-back, housekeeping & event data from GASU
Processor Board
AFEE
SIB
LCB
CPU
AFEE_Board_Stre ss_Analysis_Rev_
A1.xls
LAT-TD-01782-
02.pdf
LAT-TD-04516-
02.pdf
LAT-TD-01823-
01.pdf
LAT-TD-02655-
02.pdf
LAT-TD-01818-
52.pdf
LAT-TD-01819-
01.pdf
completed ASIC qualificiation testing. T/V test results LAT-TD-
05495 et. al.
VTMT on entire MCM.
GCRC & GCFE qual complete. T/V test results LAT-PS-05129-
01 et. al.
LAT-TD-01785-02.pdf for a timing analysis.
VTMT on entire
TEM/TPS. T/V test results LAT-TD-03631 et. al.
VTMT on entire
GASU/GPS. GLTC,
GTCC & GCCC completed ASIC qualification testing. T/V test results not yet
SFC-Test-
Procedures_RevD_2.4_
A.pdf for the Processor
Board. VTMT on entire
SIU. T/V test results not yet available.
750_Reliability_As sessment_for_GLA
ST_(12.1).doc,
GLAST_Radiation_ and_Reliabiity.ppt,
Event Processor
Unit
EPU
Power Supply Board:
* 28V to 3.3V/5V conversion
* LVDS-CMOS conversion of SC discretes
* System clock to GASU
Same components boards as SIU, except doesn’t perform:
* SIB's spacecraft MIL1553 interface
* SIB's power control or PDU/GASU power switches in PDU
* SIB's power control or VCHP switches in heater box
* PSB's LVDS-CMOS conversion of SC discretes
PSB
SIB,
LCB,
CPU,
PSB
LAT-TD-01817-
01.pdf
same as SIU SFC-Test-
Procedures_RevD_2.4_
A.pdf for the Processor
Board. VTMT on entire
EPU. T/V test results not yet available.
Power Distribution Unit
Heater Control Box
PDU
HCB
LAT-TD-01809-
03a_released.pdf
LAT-TD-04576-
01.pdf
VTMT on entire PDU.
T/V test results not yet available.
VTMT on entire HCB.
T/V test results not yet available.
Part Stress Analysis additional notes: Tony DiVenti had contractors Al Kogan and Paula Simon audit SLAC's part stress analysis. Their analysis is contained in derating_09-13-04.doc, GLAST_LAT_Parts_Derating-TPS_rev_A.doc, GLAST_RAD750_Parts_Derating.doc.
Timing Analysis additional notes: For those portions where "T/V test results not yet available" is listed, the T/V test must be completed successfully for the subsystem to be integrated to LAT. For those portions where "et. al." is listed, only one of the usually 16 report numbers for the multiple instances of the electronics in LAT.