14 May 2002 Meeting Minutes
SRE Space Systems Reliability Tools Standards Working Group

The 4th meeting of the Space Systems Reliability Tools (SSRT) Standards Working Group (WG) was held on Tuesday, May 14, 2002, from 8:30 AM to 11:30 AM PDT. The meeting consisted of two separate teleconferences: one mediated at The Aerospace Corporation in El Segundo, CA, and the other mediated at DSI International in Orange, CA. The meeting agenda appears after the distribution list.

The objective of the SSRT Standards WG is to develop a commercial standard that provides a single framework for linking different reliability assessment tools. This framework shall be built by defining critical process addresses [1] and standard formats for all data elements used in appropriate identification, analysis, and verification of Reliability, Maintainability, and Availability (RMA) requirements for space systems. In the context of this standard, “appropriate identification, analysis, and verification…” means there would be negligible risk of adverse effects from using the results. The title of the standard shall be “Standard Format for Space System Reliability Computer Applications,” and its scheduled completion date is 30 September 2002.

The WG is organized into two teams. Team 1 is tasked with defining the data elements and their critical process addresses. The members of Team 1 are all Reliability Engineering experts, and their lead is Tyrone Jackson. Team 2 is tasked with defining the standard formats for the data elements. The members of Team 2 are all reliability tool developers, and their lead is Dan Hartop.

Participants at the May 14th meeting were:

NAME                 COMPANY                  PHONE          E-MAIL
Steve Harbater       TRW                      858-592-3490   Steve.Harbater@trw.com
Dan Hartop (2)       DSI Intl.                714-637-9325   dhartop@dsiintl.com
Jim Sketoe           Boeing                   253-773-2891   James.E.Sketoe@boeing.com
Al Jackson           CSULB Eng Grad College   310-493-7469   jacksona@simanima.com
Tyrone Jackson (1)   Aerospace Corp.          310-336-6170   Tyrone.Jackson@aero.org
Xuegao (David)       SoHar Inc.               323-653-4717   Xuegao@sohar.com
Walt Willing         Northrop Grumman         410-765-7372   walter_e_willing@md.northgrum.com

(1) Meeting coordinator and Team 1 lead
(2) Team 2 lead

[1] The critical process addresses may be defined using machine-readable alphanumeric symbols or human-readable Extensible Markup Language (XML) keywords.

The following individuals are on regular distribution for the SSRT Standards WG Meeting minutes:

NAME                  COMPANY             PHONE                 E-MAIL
Mike Canga            NASA JSC            281-483-5395          michael.a.canga1@jsc.nasa.gov
J C Cantrell          Aerospace Corp.     310-336-2899          John.C.Cantrell@Aero.org
Terry Kinney          Spectrum Astro      719-550-0325          Terry.Kinney@specastro.com
Robert Poltz          Design Analytx      877-327-7550          getreliability@designanalytx.com
Kamran Nouri          Item Software       714-935-2900          kamran@itemsoft.com
James Womack          Aerospace Corp.     310-336-7647          James.M.Womack@aero.org
John Ingram-Cotton    Aerospace Corp.     310-336-1249          John.Ingram-Cotton@aero.org
Dave Dylis            RAC                 315-339-7055          DDylis@IITRI.ORG
Eric Gould            DSI Intl.           714-637-9325          egould@dsiintl.com
Jim Kallis            Raytheon            310-647-3620          jmkallis@west.raytheon.com
Bill Geimer           Northrop Grumman    626-812-2783          William.Geimer@northropgrumman.com
Leo F. Watkins        Lockheed Martin     817-935-4452          Leo.F.Watkins@LMCO.com
Marios Savva          ReliaSoft           520-886-0410          Marios.Savva@reliasoft.com
Adamantios Mettas     ReliaSoft           520-886-0366 Ext. 29  Adamantios.Mettas@ReliaSoft.com
Doug Ogden            ReliaSoft           520-886-0366 Ext. 41  Doug.Ogden@ReliaSoft.com
Rich Pugh             Pratt & Whitney                           pugh@pwfl.com
Ken Murphy            ARINC               505-248-0640          KMURPHY@arinc.com
Chuck Anderson        GRC International   281-483-4087          charlton.r.anderson1@jsc.nasa.gov
Myron Hecht           SoHar Inc.          323-653-4717 x111     Myron@sohar.com
Rebecca Menes         SoHar Inc.          323-653-4717 x101     Becky@sohar.com
Bob Miller            TRW                 310-812-2840          Robert.Miller@trw.com
Kevin P. Van Fleet    Relex Software      724-836-8800 x105     kevin.vanfleet@relexsoftware.com
Hunter Shaw           Relex Software      724-836-8800          Hunter.Shaw@relexsoftware.com
Clarence Meese        SRE                                       cmeese@nyx.net

May 14th Meeting Agenda

SRE Working Group Administrative Topics
8:30 - 8:40 PDT     Take roll
                    Vote to approve the minutes of the April 30th meeting
                    Remind participants to pay their SRE membership dues

Team 1 Discussion Topics
8:40 - 9:00 PDT     Discuss Status of Action Items from the April 30th Meeting - Tyrone Jackson
9:00 - 9:20 PDT     Discuss the Electrical Stress Derating Analysis Flow Diagrams - Steve Harbater
9:20 - 9:50 PDT     Discuss the Reliability Prediction Process Flow Diagram for the Preliminary Design Phase - Jim Sketoe
9:50 - 10:00 PDT    Break
10:00 - 10:45 PDT   Discuss the First-Cut Standard Formats for Reliability Data - Tyrone Jackson

Team 2 Discussion Topics
8:30 - 9:50 PDT     Discuss Status of Action Items from the April 30th Meeting - Dan Hartop
9:50 - 10:00 PDT    Break
10:00 - 10:45 PDT   Begin Developing a Draft Outline for the Standard, which is titled “Standard Format for Space System Reliability Computer Applications” - Dan Hartop

Team 1 & 2 Summary
10:45 - 11:30 PDT   Summary and Review of Action Items - All
11:30 PDT           Meeting Adjourn

Team 1 Discussion Topics

Team 1 participants in the May 14th meeting were:
   Tyrone Jackson (Team Lead)
   Steve Harbater
   Walt Willing
   Jim Sketoe

The group did not meet the minimum number of participants required for a Team 1
quorum and decided to postpone the vote on approval of the April 30th meeting minutes until the next scheduled meeting on May 28th.

The group agreed that Visio 2000 diagrams should be converted to Visio 5 format before distribution to the working group for review.

The group reviewed Steve’s Electrical Stress Derating Process Flow Diagram and accompanying write-up. Steve mentioned that the secondary parameters are sometimes left out of the stress derating analysis to save money. Tyrone volunteered to develop draft definitions for some of the electrical stress derating parameters. He plans to use the Fortran source code for an old MIL-HDBK-217 program to build a list of component-specific derated parameters.

The group reviewed Jim’s Reliability Prediction Process Diagram for the Preliminary Design Phase. The group agreed that unit-level and component-level trade studies are often performed during the Preliminary Design Phase. Therefore, the use of reliability data to support trade studies should be added to the Reliability Prediction Process Diagram. Jim will modify the diagram.

The group discussed the widespread trend away from piece-part FMECA. Walt said that, at a minimum, FMECA should be performed to identify the effects of failures at the interfaces of a Line Replaceable Unit (LRU). He added that identifying the internal failure modes of an existing LRU would not be an efficient use of an analyst’s time, whereas identifying the internal failure modes of a new or modified LRU would be. The group agreed with Walt.

The group agreed that FMECA should be used to validate the Reliability Block Diagram (RBD), and that both the FMECA and the RBD should begin at the same level of indenture.

The group agreed on the following concepts:

o In an ideal world, where tools are available to apply all reliability methods with equal effort to all items, the preferred order of reliability methods would be:
   1. Field data
   2. Test data
   3. Physics of failure (PoF) equations, if they were derived from applicable test data
   4. Handbook reliability prediction equations, if they were derived from applicable field data

o The MTBF calculation for COTS should be based on either field data or test data.

o In the real world (at least for now), handbook reliability prediction methods are the most cost-effective choice for MTBF calculations because:
   - Insufficient field and test data is available for all items in modern space systems.
   - A key goal of the Responsible Design Engineer (RDE) should be to eliminate all wearout mechanisms that can affect mission success. PoF would therefore not be necessary if this goal is met.
   - Cost-effective PoF tools are not available.

o Some of the problems associated with handbook reliability prediction methods include:
   - Use of proprietary parameters
   - Failure rate equations that were not derived from field data
   - Unknown confidence bounds for calculated failure rates
   - Assumed exponential (constant) failure rates for all items
   - Lack of a comprehensive set of hazard rate equations for nonelectronic parts
   - Lack of a comprehensive set of non-operating failure rate equations for electronic parts

Tyrone discussed an example of a standard reliability data format that he derived from the old B1 and B2 sheets in MIL-STD-1388-A. The example consists of predefined keywords whose origination points are identified on critical process flow diagrams. The points on the diagrams serve as data addresses. To allow different reliability assessment tools to identify the data consistently, the keywords are arranged in an indentured configuration that is based on data dependency. Take, for example, a spacecraft Mean Mission Duration (MMD) prediction.
Its standard electronic data interchange format might look something like this:

RELIABILITY PREDICTION
   MMD
      RWEIBULL (Rayleigh-Truncated Weibull)
         SCALE = 60.0
         SHAPE = 1.75
         BWEAROUT (Begin Wearout) = 36
         MWEAROUT (Mean Wearout) = 48
         CONFIDENCE = 0.5
         UNITS = MONTHS

Team 2 Discussion Topics

Team 2 participants in the May 14th meeting were:
   Dan Hartop (Team Lead)
   David Xuegao
   Al Jackson

The group met the minimum number of participants required for a Team 2 quorum.

The following tasks have been completed:
o Created, updated, and reviewed the initial schema
o Documented updated schema considerations for review by Team 2
o Discussed potential interoperability paths and approach

As a side note, DSI will ultimately create an XSL style sheet (essentially an XML document that automatically converts XML into something useful) for converting a Fault Tree XML (FTML ?) document into an Excel XML Spreadsheet (supported by Excel 2002). This will be done over the next few months as DSI's availability permits. We will therefore commit to an Action Item with no definite due date other than completion by September 2002.

Team 2 Future Agenda
o Team 2 - Complete review of Gate Types; ensure consistent parsing for existing tools
o Team 2 - Define interoperability paths for Fault Tree and other Schemas
o Team 1 - Provide input to Team 2 regarding current schema

Action Items

1. Team 1 Action Items -
   a. All - Review the updated Fault Tree Schema that Team 2 constructed. Specifically, check for correctness, completeness, and compliance with the objective of the standard (stated at the beginning of these minutes).
   b. Jim - Update the diagram for the Reliability Prediction Process during the Preliminary Design Phase. Specifically, add references to Reliability Trade Studies and FMECA.
   c. Tyrone and Steve - Tackle the Team 2 action item to begin developing a draft outline for the standard, which is titled “Standard Format for Space System Reliability Computer Applications”.
   d. Tyrone - Construct a flow diagram for Similarity Analysis that shows how individual reliability assessment tasks might be integrated at the Reliability Program level.
   e. Tyrone - Develop draft definitions for some of the more typical electrical stress derating parameters.
   f. Tyrone - Write a draft guide and construct Reliability Analysis Process Flow Diagrams for the Detailed Design Phase.

2. Team 2 Action Items -
   a. All - Review the updated Fault Tree Schema. Specifically, check for correctness, completeness, and compliance with the objective of the standard (stated at the beginning of these minutes).
   b. SOHAR - Define interoperability (inputs and outputs to existing tools).
   c. SOHAR - Complete the review of Gate Types for completeness.
   d. John - Review and update schema documentation.
   e. All - Review Team 1 documentation and findings.

Next Meeting

The next SSRT Standards WG Meeting is scheduled for May 28, 2002, at 8:30 AM PDT. Team 1 and Team 2 will hold separate teleconferences from 8:30 AM to 10:45 AM PDT. At 10:45 AM PDT, Team 1 will join the Team 2 teleconference to discuss progress and actions. The following teleconference numbers are to be used:

   Team 1 teleconference number - (888) 550-5969, pass code 646354
   Team 2 teleconference number - (888) 550-5969, pass code 162080

Arrangements have been made for Team 1 to use NetMeeting concurrently during the teleconference. For those who prefer face-to-face discussions, meeting rooms have been reserved at the following locations:

   Team 1 meeting room - The Aerospace Corporation, Building D-8, 200 N. Aviation Boulevard, El Segundo, CA 90245-4691
   Team 2 meeting room - DSI International, 1574 N. Batavia, Suite 3, Orange, CA 92867

Planned Future Meetings

Location: The Aerospace Corporation, Building D-8, 200 N. Aviation Boulevard, El Segundo, CA 90245-4691

Date (2002):
   5/28   Teleconference
   6/11   Teleconference
   6/25   Teleconference
   7/16   Teleconference
   7/30   Teleconference
   8/13   Teleconference
   8/27   Teleconference
   9/10   Teleconference
   9/24   Teleconference

Please direct all comments regarding these meeting minutes to:

   Tyrone Jackson
   SSRT Standards Working Group Coordinator
   Reliability & Statistics Office
   The Aerospace Corporation
   Ph. (310) 336-6170
   Fax (310) 336-5365
   Email: Tyrone.Jackson@aero.org

Top-10 problems that affect the Reliability Programs of Space Systems, as determined by an internal working group survey:

1. Valuable reliability lessons learned often are not in a format that is readily usable by the Reliability Program, or they have become “lessons forgotten” or “lessons ignored”.

2. Some reliability critical items are not identified at all or are not properly controlled.

3. System reliability predictions often do not include probability of occurrence estimates for all relevant failure modes, failure mechanisms, and failure causes. (The probability of an induced fault during manufacture, or of damage during assembly, often is not included in reliability predictions.)

4. The perceived accuracy of high-precision system reliability predictions often is not supported by the input data, which is of lower precision than the result.

5. The steadily shrinking pool of “experienced” Reliability Engineering specialists is unable to meet the needs of a steadily growing number of space system development projects.

6. Many commercial reliability assessment tools have major shortcomings that may not be obvious to the casual reliability analyst (e.g., inaccurate equipment failure rate models, use of unverifiable parameters in equations, high misapplication rates, etc.).

7. Often, insufficient funding is provided to perform all of the tasks necessary for a High-Reliability Program.
   (Some customers and managers believe that high reliability can be tested in more cost-effectively than it can be designed in.)

8. Different approaches are being used across the space industry to perform reliability assessment tasks that are called by the same name but often serve different purposes. (Inconsistency in reliability assessment practices has become a major problem since DoD canceled military standards in the late 1990s.)

9. Some customers believe that all dependability predictions for space vehicle constellations are too conservative. (This belief is rooted in historical evidence that the contingency procedures of ground operations are very effective at extending the useful life of a space vehicle far beyond its predicted mean life. This phenomenon has resulted in many customers buying more space vehicles than necessary to meet the dependability requirements of the constellation.)

10. Sometimes the reliability analyst cannot take advantage of (or is unaware of) some of the critical data paths that link a particular task of the Reliability Program with:
   a. Other tasks within the Reliability Program;
   b. Systems Engineering Process functions outside the Reliability Program; or
   c. External product-related data sources.
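
Illustrative sketch of the indentured keyword format: the MMD example that Tyrone presented under Team 1 Discussion Topics could be read by a tool roughly as shown here. This is only a sketch and not part of the draft standard. It assumes one keyword per line, with leading spaces marking the level of indenture and a parenthetical after a keyword treated as a human-readable gloss. The names parse_indentured, keyword_of, and weibull_survival are hypothetical. The survival calculation uses the plain two-parameter Weibull form exp(-(t/SCALE)^SHAPE) and deliberately ignores the Rayleigh truncation and the wearout keywords, whose exact treatment was not specified at the meeting.

```python
import math

# Sample record laid out one keyword per line (layout is an assumption).
SAMPLE = """\
RELIABILITY PREDICTION
   MMD
      RWEIBULL (Rayleigh-Truncated Weibull)
         SCALE = 60.0
         SHAPE = 1.75
         BWEAROUT (Begin Wearout) = 36
         MWEAROUT (Mean Wearout) = 48
         CONFIDENCE = 0.5
         UNITS = MONTHS
"""


def keyword_of(label):
    """Drop a parenthetical gloss: 'BWEAROUT (Begin Wearout)' -> 'BWEAROUT'."""
    return label.split("(")[0].strip()


def parse_indentured(text):
    """Parse indented 'KEYWORD' and 'KEYWORD = value' lines into nested dicts."""
    root = {}
    stack = [(-1, root)]  # (indent width, dict that owns deeper lines)
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        # Pop back to the nearest enclosing keyword at a shallower indent.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        if "=" in line:
            label, value = line.split("=", 1)
            value = value.strip()
            try:
                value = float(value)  # numeric values become floats
            except ValueError:
                pass  # non-numeric values (e.g. MONTHS) stay as text
            parent[keyword_of(label)] = value
        else:
            child = {}
            parent[keyword_of(line)] = child
            stack.append((indent, child))
    return root


def weibull_survival(t, scale, shape):
    """Plain two-parameter Weibull survival; truncation is NOT modeled."""
    return math.exp(-((t / scale) ** shape))


record = parse_indentured(SAMPLE)
weibull = record["RELIABILITY PREDICTION"]["MMD"]["RWEIBULL"]
# Survival probability at the begin-wearout time, in the record's own units.
r_at_wearout = weibull_survival(
    weibull["BWEAROUT"], weibull["SCALE"], weibull["SHAPE"]
)
print(weibull["UNITS"], round(r_at_wearout, 3))
```

Because each value is reached by its full keyword path (RELIABILITY PREDICTION, then MMD, then RWEIBULL), two tools that agree on the keywords can exchange the record without sharing internal data structures, which is the point of the indentured arrangement.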