LAT CDR RFA #28 Response
Action Requested:
Describe the process for performing worst case circuit analyses for electronic circuits to
show that the electronics can perform over their full temperature range over the life of the
mission. Provide the results of these analyses, if you have performed them. If
you have not performed WCCAs, then perform them. I suggest using NASA JPL
preferred reliability practice PD-ED-1212 as a guideline.
Supporting Rationale:
This has not been described, although WCCAs may have been performed.
Response:
WORST CASE CIRCUIT ANALYSIS RESULTS
Gamma-ray Large Area Space Telescope (GLAST) Large Area Telescope (LAT)
personnel performed a Worst Case Circuit Analysis (WCCA). The analysis was divided
into three sections: Part Stress Analysis, Timing Analysis, and Failure Mode Effects
Analysis. The WCCA uses the NASA JPL preferred reliability practice PD-ED-1212.
This guideline summarizes its implementation method as: "Derive part parameter
variations for the environments and life of a specific mission and combine them with the
initial tolerances of the parts as procured to produce a worst case part variation database
for each mission or project. Apply classical circuit analysis techniques, and determine if
each circuit and each assembly meets its specified performance attributes over the most
extreme but realizable combinations of part variation sources."
The WCCA yielded no concerns for flight success.
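As a minimal illustration of the approach quoted above, the sketch below sums an initial tolerance with assumed temperature and end-of-life drift terms into one worst case part variation, then evaluates a simple resistive divider at every extreme combination. The circuit, the part values, the tolerance numbers, and the specification limits are all hypothetical; none of them are taken from the LAT design or from PD-ED-1212.

```python
from itertools import product


def worst_case_fraction(initial_tol, temp_drift, life_drift):
    """Add the variation sources directly (extreme value approach)."""
    return initial_tol + temp_drift + life_drift


def divider_output(v_in, r_top, r_bottom):
    """Output voltage of an unloaded resistive divider."""
    return v_in * r_bottom / (r_top + r_bottom)


def divider_extremes(v_in_range, r_top_nom, r_bottom_nom, frac):
    """Evaluate the divider at every min/max corner and return (low, high).

    Corner evaluation is sufficient here because the divider output is
    monotonic in each of its three inputs.
    """
    r_top_range = (r_top_nom * (1 - frac), r_top_nom * (1 + frac))
    r_bottom_range = (r_bottom_nom * (1 - frac), r_bottom_nom * (1 + frac))
    outputs = [
        divider_output(v, r_top, r_bottom)
        for v, r_top, r_bottom in product(v_in_range, r_top_range, r_bottom_range)
    ]
    return min(outputs), max(outputs)


# Hypothetical numbers: 1% initial tolerance, 0.5% temperature drift,
# 0.5% end-of-life drift, a 3.2 V to 3.4 V supply, and a 1.55 V to 1.75 V spec.
frac = worst_case_fraction(0.01, 0.005, 0.005)
low, high = divider_extremes((3.2, 3.4), 10_000.0, 10_000.0, frac)
spec_low, spec_high = 1.55, 1.75
meets_spec = spec_low <= low and high <= spec_high
print(f"worst case output: {low:.3f} V to {high:.3f} V, meets spec: {meets_spec}")
```

For circuits whose response is not monotonic in every parameter, corner evaluation alone can miss the true extreme, so a sweep or sensitivity analysis would be needed instead.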
I. Part Stress Analysis
Part Stress Analysis was performed on all LAT subsystems, either by GSFC (for the
ACD) or SLAC (for the remainder). The reports are collected in a single database and
they are listed in the part stress analysis column of the Worst Case Circuit Analysis
Summary of Reports (Table 1).
GSFC Code 300 contractors examined these part stress analyses. A small number of
discrepancies were found. These discrepancies are summarized in the reports listed in the
Part Stress Analysis Additional Notes at the bottom of Table 1. The discrepancies were
issues such as an incorrect derating factor or an applied value exceeding the derated value.
A GLAST parts engineer reviewed each of these discrepancies and determined that
they do not require any changes in the LAT design. [***NEED TO GET THOM
PERRY'S FINAL WORDS.***]
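The kind of check behind a part stress analysis can be sketched as follows: each part's applied stress is compared with its rated value multiplied by a derating factor, and any part whose applied value exceeds the derated value is flagged. The part names, ratings, and derating factors below are hypothetical and do not come from the LAT derating spreadsheets.

```python
from dataclasses import dataclass


@dataclass
class StressEntry:
    part: str               # reference designator (hypothetical)
    parameter: str          # stressed quantity, with units
    rated: float            # manufacturer's rated value
    derating_factor: float  # allowed fraction of the rated value
    applied: float          # worst case applied value in the circuit

    @property
    def derated_limit(self):
        """Largest applied value the derating policy allows."""
        return self.rated * self.derating_factor


def find_discrepancies(entries):
    """Return the entries whose applied value exceeds the derated value."""
    return [entry for entry in entries if entry.applied > entry.derated_limit]


entries = [
    StressEntry("R1", "power (W)", 0.25, 0.6, 0.10),    # limit 0.15 W, passes
    StressEntry("C7", "voltage (V)", 50.0, 0.5, 28.0),  # limit 25 V, flagged
]
flagged = find_discrepancies(entries)
for entry in flagged:
    print(f"{entry.part}: {entry.applied} exceeds derated {entry.derated_limit}")
```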
II. Timing Analysis
Timing analysis is usually performed via simulation to determine the ability to meet
timing specifications on a circuit board and the ability to meet board-to-board timing
specifications. However, none of the ASICs underwent timing analysis prior to
fabrication. There was no simulation by the chip designers to verify that timing would be
met over the mission's temperature and voltage ranges. Instead, following fabrication the flight ASICs
underwent Qualification Testing consisting of Initial Tests, Life
Cycle testing, Highly Accelerated Stress Testing, and Temperature Cycle testing. The
downside of this approach is that testing cannot recreate all of the
conditions and combinations that simulation would have covered. The upside is that
a subset of the conditions and combinations is tested on the actual flight
hardware rather than in simulation.
Because the ASICs are integral to the LAT and did not undergo simulation,
a timing analysis of the LAT would have yielded highly incomplete results.
One timing analysis was performed at SLAC on the Tower Electronics Module (TEM),
covering the key internal and external interfaces. The cognizant LAT engineers whose
subsystems interface with the TEM verified that the interface timing is adequate and that
their subsystems will operate within those constraints.
Because a useful simulation-based timing analysis was not possible, we followed the
recommendation of JPL's Preferred Reliability Practice number PT-TE-1431 by
performing Voltage and Temperature Margin Testing (VTMT) on the LAT instrument
subsystems and on the LAT as a whole. PT-TE-1431, titled "Voltage & Temperature Margin Testing", states: "Voltage and
Temperature Margin Testing (VTMT) is the practice of exceeding the expected flight
limits of voltage, temperature, and frequency to simulate the worst case functional
performance, including effects of radiation and operating life parameter variations on
component parts. For programs subject to severe cost or schedule constraints, VTMT has
proven an acceptable alternative to conventional techniques such as worst case analysis
(WCA). WCA is the preferred approach to design reliability, but VTMT is a viable
alternative for flight projects where trade-offs of risk versus development time and cost
are appropriate…The VTMT must be designed to achieve adequate variation of circuit
parameters through a judicious choice of voltage, temperature, and frequency margin
combinations to achieve an optimal, yet realistic check of the design margin."
All subsystems undergo thermal/vacuum (T/V) testing that spans the temperature and input
voltage ranges, thereby performing the needed VTMT. Additionally, the entire LAT
instrument will undergo thermal/vacuum testing, which will test the electronics over
the temperature and spacecraft interface voltage ranges.
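The margin combinations that VTMT calls for can be enumerated with a short sketch like the one below, which widens each expected flight limit by a margin and lists every low/high corner. The parameter names, flight limits, and margins are hypothetical placeholders, not the LAT test levels.

```python
from itertools import product


def margin_corners(limits, margins):
    """Return every low/high corner after widening each limit by its margin.

    limits  -- {name: (expected_low, expected_high)}
    margins -- {name: margin added beyond each expected limit}
    """
    axes = []
    for name, (low, high) in limits.items():
        margin = margins[name]
        axes.append(((name, low - margin), (name, high + margin)))
    return [dict(corner) for corner in product(*axes)]


# Hypothetical expected flight limits and test margins.
limits = {
    "voltage_v": (27.0, 29.0),
    "temperature_c": (-10.0, 40.0),
    "clock_mhz": (20.0, 20.0),
}
margins = {"voltage_v": 1.0, "temperature_c": 10.0, "clock_mhz": 1.0}

corners = margin_corners(limits, margins)
for corner in corners:
    print(corner)
```

Three parameters give eight corners; a real test plan would typically select a realistic subset of these rather than run every combination.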
The reports that document the tests described in this section are listed in the timing
analysis column of the Worst Case Circuit Analysis Summary of Reports (Table 1). For
those portions where "T/V test results not yet available" is listed, the T/V test must be
completed successfully before the subsystem is integrated into the LAT. For those portions
where "et. al." is listed, only one of the (usually 16) report numbers is given for the multiple
instances of the electronics in the LAT.
III. Failure Mode Effects Analysis
LAT personnel performed a Failure Mode Effects Analysis (FMEA) in an effort separate
from the WCCA effort. As indicated in the failure mode effects analysis column of the
Worst Case Circuit Analysis Summary of Reports (Table 1), the FMEA results are
collected as LAT-TD-00374-D1 and the items that were considered can be found in the
LAT DAQ FMEA Worksheets5_20.xls.
The FMEA provides an assessment of the hardware configuration of the LAT. A
“bottom-up” examination of each LAT component was performed to identify
potential failures and their effects at the local, instrument, and overall science mission
levels. Specific attention was given to identifying any Single Point Failures (SPFs) that
could cause failure of the GLAST science mission, and to recommending corrective actions
or methods to alleviate their occurrence and/or impact.
This qualitative report will help answer these questions as each component of the LAT is
analyzed:
1. How can the component fail? (There may be more than one mode of failure.)
2. What are the effects of the failure?
3. How critical are the effects?
4. How is the failure detected?
5. What are the safeguards against significant failures?
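One way to picture a worksheet row that answers the five questions above is the record sketched below, with one field per question and one row per failure mode, so a component with several failure modes appears several times. The field names and the example rows are hypothetical and are not taken from the LAT FMEA worksheets.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FmeaEntry:
    component: str
    failure_mode: str  # 1. How can the component fail?
    effects: str       # 2. What are the effects of the failure?
    severity: str      # 3. How critical are the effects?
    detection: str     # 4. How is the failure detected?
    safeguards: str    # 5. What are the safeguards against significant failures?


def modes_by_component(entries):
    """Group failure modes by component, preserving worksheet order."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.component].append(entry.failure_mode)
    return dict(grouped)


# Hypothetical rows for a single made-up component.
worksheet = [
    FmeaEntry("example regulator", "output stuck low", "board loses power",
              "2R", "housekeeping voltage telemetry", "redundant board"),
    FmeaEntry("example regulator", "output noisy", "degraded readout",
              "3", "noise monitoring", "output filtering"),
]
modes = modes_by_component(worksheet)
print(modes)
```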
A Critical Items List (CIL) is also maintained as part of this activity. The CIL
provides a summary of selected hardware items whose related failure modes can result in
serious injury, loss of life (flight or ground personnel), loss of launch vehicle, or the loss
of one or more mission objectives (when no redundancy exists) as defined by the GSFC
project office.
At present, one hardware item, the Micrometeoroid Shield (MMS), is an SPF item that
remains on the CIL. The MMS has a severity
classification of 2 (i.e., a critical failure mode that could result in loss of one or more
mission objectives). Its failure mode would be light leakage/damage to two or more tiles.
Its mission effect would be an inability to achieve the Diffuse Background Rejection
Objective as defined in Table 2.3.1-1, Item 15, of the GLAST Science Requirements.
Although the Micrometeoroid Shield is classified as a critical item, a supplemental
rationale document provided by Steve Ritz/LAT Scientist (see Appendix B of PRA
Report, LAT-TD-02510-02, dated 02/02/04) describes how the ACD and LAT are still
capable of meeting science requirements (although efficiency may fall slightly below
performance requirements) in the event a single tile, or localized pair, fails. The effect is
a localized inefficiency that would not severely impact the real-time operation of the LAT.
Two additional potential SPFs, besides the Micrometeoroid
Shield, have recently been identified. They are presently being analyzed, and their disposition
will be decided shortly. The first is the LAT Radiometer Anti-Freeze Thermostat, where cold temperatures may cause system-wide failure of the
instrument. The second lies in the ACD GARC (Digital) ASIC.
During review of the ACD system, GARC ASIC failures were designated as fault
tolerant on the belief that two redundant power feeds were being provided. However,
the implemented solution seems to involve wire-ORing of the clock signals. If this is the
case, it is believed that a short on the clock signal could potentially take out the GARC
ASIC and result in an SPF condition, given that all GARC ASICs need to be operational
in order to meet ACD efficiency requirements. Recommendations on both potential SPFs
will be forthcoming once the analysis is complete.
The remaining 752 failure conditions that were assessed within the LAT instrument as
part of the FMEA process have the following current severity code rankings:
* 106 failure conditions are of Category 2R (Critical), where the failure mode is of identical
or equivalent redundant hardware items that could result in Category 2 effects if all
failed. (Category 2 is a failure mode that could result in loss of one or more mission
objectives as defined by the GSFC project office.)
* 606 failure conditions are of Category 2MR (Critical), where the failure mode is of
more than one of many identical or equivalent redundant hardware items that could result
in Category 2 effects if a significant fraction of the identical systems fail. (Category 2 is a
failure mode that could result in loss of one or more mission objectives as defined by the
GSFC project office.)
* 28 failure conditions are of Category 3 (Significant), where the failure mode is one that
could cause degradation to mission objectives.
* 12 failure conditions are of Category 4 (Minor), where the failure mode results in
insignificant or no loss to mission objectives.
Failure conditions with Category 2R, 2MR, 3, or 4 classifications are not catastrophic
(Category 1) and are sufficiently mitigated via design or have an acceptable effect on
mission objectives.
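As a quick arithmetic check on the tally above, the short sketch below confirms that the four category counts add up to the 752 assessed failure conditions and computes each category's share. The counts are the ones stated in this section; the dictionary layout is only for illustration.

```python
# Severity category counts as stated in the FMEA summary.
severity_counts = {"2R": 106, "2MR": 606, "3": 28, "4": 12}

total = sum(severity_counts.values())

# Percentage share of each category, rounded to one decimal place.
shares = {
    category: round(100 * count / total, 1)
    for category, count in severity_counts.items()
}

print(f"total failure conditions: {total}")
for category, share in shares.items():
    print(f"Category {category}: {share}%")
```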
TABLE 1 -- Worst Case Circuit Analysis Summary of Reports

Columns: LAT subsystem; unit (electronics unit, acronym); board (board, acronym); part stress analysis; timing analysis; failure mode effects analysis.

Failure mode effects analysis (one entry for the whole table): Results in LAT-TD-00374-D1 and individual items analyzed are found in LAT DAQ FMEA Worksheets5_20.xls

LAT subsystem: Anti-Coincidence Detector (ACD)
  Unit: Base Electronics Assembly (BEA)
    Board: FRont End Electronics (FREE)
      Part stress analysis: ACD_FREE_Parts_Derating_06-0603A.xls
    Board: High Voltage Bias Supply (HVBS)
      Part stress analysis: Derating_Sheet_LAT_ACD_HVBS11_5-30-03.xls
    Timing analysis: VTMT on entire ACD. GAFE & GARC completed ASIC qualification testing. T/V test results not yet available.

LAT subsystem: Tracker (TKR)
  Board: TKR Front-End Electronics (MCM)
    Part stress analysis: LAT_TRACKER_Parts_stress.xls
    Timing analysis: VTMT on entire MCM. GTFE & GTRC completed ASIC qualification testing. T/V test results LAT-TD-05495 et. al.

LAT subsystem: Calorimeter (CAL)
  Board: CAL Front-End Electronics (AFEE)
    Part stress analysis: AFEE_Board_Stress_Analysis_Rev_A1.xls
    Timing analysis: VTMT on entire MCM. GCRC & GCFE qual complete. T/V test results LAT-PS-05129-01 et. al.

LAT subsystem: Data Acquisition (DAQ)
  Unit: Tower Electronics Module (TEM)
    Part stress analysis: LAT-TD-01782-02.pdf
  Unit: Tower Power Supply Module (TPS)
    Part stress analysis: LAT-TD-04516-02.pdf
  Timing analysis (TEM and TPS): LAT-TD-01785-02.pdf for a timing analysis. VTMT on entire TEM/TPS. T/V test results LAT-TD-03631 et. al.

  Unit: Global-Trigger / ACD-EM / Signal-Distribution Unit (GASU)
    Part stress analysis: LAT-TD-01823-01.pdf
  Unit: GASU Power Supply (GPS)
    Part stress analysis: LAT-TD-02655-02.pdf
  Timing analysis (GASU and GPS): VTMT on entire GASU/GPS. GLTC, GTCC & GCCC completed ASIC qualification testing. T/V test results not yet available.

  Unit: Spacecraft Interface Unit (SIU)
    Board: Storage Interface Board (SIB)
      * EEPROM storage
      * Spacecraft MIL1553 interface
      * Power control of PDU/GASU power switches in PDU
      * Power control of VCHP switches in heater box
    Board: LAT Communication Board (LCB)
      * Commanding of GASU
      * Read-back, housekeeping & event data from GASU
    Board: Processor Board (CPU)
    Board: Power Supply Board (PSB)
      * 28V to 3.3V/5V conversion
      * LVDS-CMOS conversion of SC discretes
      * System clock to GASU
    Part stress analysis: LAT-TD-01818-52.pdf; LAT-TD-01819-01.pdf; 750_Reliability_Assessment_for_GLAST_(12.1).doc, GLAST_Radiation_and_Reliabiity.ppt; LAT-TD-01817-01.pdf
    Timing analysis: SFC-TestProcedures_RevD_2.4_A.pdf for the Processor Board. VTMT on entire SIU. T/V test results not yet available.

  Unit: Event Processor Unit (EPU)
    Boards: SIB, LCB, CPU, PSB. Same component boards as SIU, except that they do not perform:
      * SIB's spacecraft MIL1553 interface
      * SIB's power control of PDU/GASU power switches in PDU
      * SIB's power control of VCHP switches in heater box
      * PSB's LVDS-CMOS conversion of SC discretes
    Part stress analysis: same as SIU
    Timing analysis: SFC-TestProcedures_RevD_2.4_A.pdf for the Processor Board. VTMT on entire EPU. T/V test results not yet available.

  Unit: Power Distribution Unit (PDU)
    Part stress analysis: LAT-TD-01809-03a_released.pdf
    Timing analysis: VTMT on entire PDU. T/V test results not yet available.

  Unit: Heater Control Box (HCB)
    Part stress analysis: LAT-TD-04576-01.pdf
    Timing analysis: VTMT on entire HCB. T/V test results not yet available.

Part Stress Analysis additional notes: Tony DiVenti had contractors Al Kogan and Paula Simon audit SLAC's part stress analysis. Their analysis is
contained in derating_09-13-04.doc, GLAST_LAT_Parts_Derating-TPS_rev_A.doc, GLAST_RAD750_Parts_Derating.doc.
Timing Analysis additional notes: For those portions where "T/V test results not yet available" is listed, the T/V test must be completed successfully before
the subsystem is integrated into the LAT. For those portions where "et. al." is listed, only one of the (usually 16) report numbers is given for the multiple instances of
the electronics in the LAT.