SEMATECH and the SEMATECH logo are registered trademarks of SEMATECH, Inc.
WorkStream is a registered trademark of Consilium, Inc.
FAILURE REPORTING,
ANALYSIS, AND
CORRECTIVE ACTION SYSTEM
CONTENTS
Acknowledgments
EXECUTIVE SUMMARY
FAILURE REPORTING
  Collecting the Data
  Reporting Equipment Failures
  Reporting Software Problems
  Logging Data
  Using FRACAS Reports
ANALYSIS
  Failure Analysis Process
  Failure Review Board
  Root Cause Analysis
  Failed Parts Procurement
CORRECTIVE ACTION
FRACAS DATABASE MANAGEMENT SYSTEM
  Configuration Table
  Events Table
  Problems Table
  Report Types
GLOSSARY
FRACAS Functions and Responsibilities (inside back cover)
ACKNOWLEDGMENTS
In putting together this document, there were many obstacles that kept getting in the way of
completing it. Without the support of my family, Sonia, Jason, and Christopher, this would
never have been accomplished.
Mario Villacourt
External Total Quality and Reliability
SEMATECH
I would like to thank my wife, Monique I. Govil, for her support and encouragement in
completing this work. Also, I want to thank SVGL for supporting my participation with this
effort.
Pradeep Govil
Reliability Engineering
Silicon Valley Group Lithography Systems
EXECUTIVE SUMMARY
Failure Reporting, Analysis, and Corrective Action System (FRACAS) is a closed-loop feedback path in which the user and the
supplier work together to collect, record, and analyze failures of
both hardware and software data sets. The user captures predetermined types of data about all problems with a particular tool or
software and submits the data to that supplier. A Failure Review
Board (FRB) at the supplier site analyzes the failures, considering
such factors as time, money, and engineering personnel. The
resulting analysis identifies corrective actions that should be
implemented and verified to prevent failures from recurring. A
simplified version of this process is depicted in Figure 1.
[Figure 1 diagram labels: Design & Production; In-house Test & Inspections; Field Operations; Failure Reporting; Analysis; Corrective Action; Test & Engineering Change Control; Reliable Equipment]
Figure 1 FRACAS Process
Since factors vary among installation sites, equipment users must
work closely with each of their suppliers to ensure that proper data
is being collected, that the data is being provided to the correct
supplier, and that the resulting solutions are feasible.
Unlike other reliability activities, FRACAS promotes reliability
improvement throughout the life cycle of the equipment. The
method can be used during in-house (laboratory) tests, field (alpha
or beta site) tests, and production/operations to determine where
problems are concentrated within the design of the equipment.
According to the military,
Corrective action options and flexibility are greatest during
design evolution, when even major design changes can be
considered to eliminate or significantly reduce susceptibility
to known failure causes. These options and flexibility become
more limited and expensive to implement as a design
becomes firm. The earlier the failure cause is identified and
positive corrective action implemented, the sooner both the
producer and user realize the benefits of reduced failure
occurrences in the factory and in the field. Early implementation of corrective action also has the advantage of providing
visibility of the adequacy of the corrective action in the event
more effort is required. 1
In addition to its military use, FRACAS has been applied extensively in the aerospace, automotive, and telecommunications industries.
Elements of FRACAS can be found throughout all industries, but
this limited use usually means that the only data received by the
manufacturer are complaints logged with the field service or
customer service organizations.
FRACAS can provide control and management visibility for
improving the reliability and maintainability of semiconductor
manufacturing equipment hardware and software. Timely,
disciplined use of failure and maintenance data can help generate
and implement effective corrective actions to prevent failures from
recurring and to simplify or reduce maintenance tasks.
1. Quoted from MIL-STD-2155(AS), Failure Reporting, Analysis and Corrective Action System.
FRACAS objectives include the following:
• Providing engineering data for corrective action
• Assessing historical reliability performance, such as mean time
  between failures (MTBF), mean time to repair (MTTR), availability, preventive maintenance, etc.
• Developing patterns for deficiencies
• Providing data for statistical analysis
Also, FRACAS information can help measure contractual performance to better determine warranty information.
FRACAS provides a complete reporting, analysis, and corrective
action process that fulfills ISO 9000 requirements. More and more
companies are requiring their suppliers to meet ISO 9000, for
example, the SEMATECH Standardized Supplier Quality Assessment (SSQA) and the Motorola Quality Systems Review (QSR).
FAILURE REPORTING
All events (failures) that occur during inspections and tests should
be reported through an established procedure that includes
collecting and recording corrective maintenance information and
times. The data included in these reports should be verified and
then the data should be submitted on simple, easy-to-use forms that
are tailored to the respective equipment or software.
Collecting the Data
Many problems go unnoticed because insufficient information was
provided. The FRB must know if, for example, someone was able
to duplicate the problem being reported. There are three common
causes for missing essential data:
• Inspection or testing began before a procedure was in place to
  report problems.
• The reporting form was difficult to use.
• The person who filled out the form had not been trained.
Operators and maintenance personnel are usually the first to
identify problems and, therefore, they should be trained to properly
capture all of the information needed for an event report. The
contents of this report are more fully described in the next section.
However data is collected, one constant should be emphasized:
consolidate all the data into a central data logging system. When
field failures occur, two types of data collection are usually present:
(1) the user’s system that includes equipment utilization and
process-related information, and (2) field service reports (FSRs) that
include parts replacement information. The supplier has great
difficulty addressing problems when FSRs are the only source of
data. It is helpful if the user provides the supplier with detailed
logs for problem verification. The supplier should correlate both
types to help determine priorities for problems to be resolved.
The supplier should be included in the formation of the user’s data
collection system while the product is being installed. Ideally, the
equipment should log all information automatically and then
download the data to the supplier’s data collection system. This
would eliminate the need for paper forms and also the confusion
caused by duplicate data sets.
Reporting Equipment Failures
Collecting and sharing appropriate data through event reports are
essential components of an effective FRACAS process, both for the
supplier and for the user. There are common elements in every
report (when the event occurred, what item failed, etc.) that the
user and the supplier both use to analyze failures. Other crucial
information includes the duration of the failure, the time it took to
repair, and the type of metric used (time or wafer cycles).
To simplify the effort of collecting data and to minimize any
duplication of effort, use the following guidelines:
• Capture data at the time of the failure.
• Capture utilization data through the user’s factory tracking
  system (for example, WorkStream).
• Avoid duplicating data collection between the supplier’s and
  the user’s systems.
• Make failure and utilization data available to both the supplier
  and the user in a standardized format.
• Exchange failure and utilization data between the supplier and
  user frequently.
• Link event reports to the corresponding utilization data in the
  factory tracking system through an identifying event number
  that is captured on the equipment.
• Automate both the user’s and the supplier’s systems to
  maximize efficiency, minimize paper tracking, and avoid dual
  reporting.
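The guideline on linking event reports to utilization data can be illustrated with a minimal sketch. It assumes that each event report and each utilization record carries the same event number. The field names (event_no, wafer_cycles, productive_h) and the sample values are hypothetical and are not taken from WorkStream or any other tracking system.

```python
# Hypothetical records; field names and values are illustrative only.
event_reports = [
    {"event_no": "E-1001", "failed_item": "Handler-Stocker-Robot-Motor", "downtime_h": 1.35},
    {"event_no": "E-1002", "failed_item": "Transport load switch", "downtime_h": 4.5},
]
utilization = {
    "E-1001": {"wafer_cycles": 12500, "productive_h": 310.0},
    # No utilization record has been received yet for E-1002.
}


def link_events(reports, utilization_by_event):
    """Attach utilization data to each event report by event number.

    Returns the merged records plus the event numbers that had no match,
    so missing data can be requested from the user.
    """
    linked, unmatched = [], []
    for report in reports:
        usage = utilization_by_event.get(report["event_no"])
        if usage is None:
            unmatched.append(report["event_no"])
        else:
            linked.append({**report, **usage})
    return linked, unmatched


linked, unmatched = link_events(event_reports, utilization)
```

Reporting the unmatched event numbers separately, rather than silently dropping them, keeps the gap visible to both the user and the supplier.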
In addition, make full use of reporting tools that already exist (to
which little or no modification may be necessary), such as
• In-house test reports
• Field service reports
• Customer activity logs
• Customer equipment history
To determine what other information is pertinent, ask the design
engineers what they need for root cause analysis. Data log files
(history kept by the software) should
accompany the failure report to provide software engineers with
information about events that preceded the malfunction.
From the data collected, minimum reliability performance
metrics (such as MTBF, MTTR, and uptime percentage) can be
determined for each system.
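As a rough illustration, the three metrics can be computed from operating hours and repair durations. The sketch below assumes the simplest definitions: MTBF is operating hours divided by the number of failures, MTTR is total repair hours divided by the number of failures, and uptime percentage is operating hours over operating plus repair hours. It ignores scheduled downtime, and the function name and sample figures are hypothetical.

```python
def reliability_metrics(operating_hours, repair_hours):
    """Compute simple MTBF, MTTR, and uptime figures for one system.

    operating_hours: total hours the system ran in the period.
    repair_hours: list with one repair duration (in hours) per failure.
    """
    failures = len(repair_hours)
    if failures == 0:
        # No failures observed: MTBF and MTTR are undefined for this period.
        return {"mtbf_h": None, "mttr_h": None, "uptime_pct": 100.0}
    downtime = sum(repair_hours)
    return {
        "mtbf_h": operating_hours / failures,
        "mttr_h": downtime / failures,
        "uptime_pct": 100.0 * operating_hours / (operating_hours + downtime),
    }


# Assumed example: 160 operating hours and three failures in one week.
metrics = reliability_metrics(operating_hours=160.0, repair_hours=[1.5, 4.5, 2.0])
```

Your own definitions may differ (for example, MTBF measured in wafer cycles rather than hours), so agree on them with the supplier before comparing numbers.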
Every piece of data helps. In addition to these common elements,
there are specific types of data needed only by the user or only by
the supplier.
The supplier should acquire detailed failure information from the
event report that should become an integral part of the equipment
log file. The details help determine the state of operation before and
after the failure, which is necessary to address the root cause of
equipment failures efficiently and effectively.
Table 1 shows the types of detail that the supplier needs to see in
the event report (in-house or field). Figure 2 is an example of an
event report form. The format of the form is important only to
simplify the task of the data recorder. You may want to computerize your data entry forms to expedite the process and also minimize
failure description errors.
TABLE 1 EVENT REPORT FIELDS
Event Report Field | Example | Description
Serial Number | 001 | System-unique identifier that indicates who the customer is and where the system is located
Date and Time of Event | March 12, 1994 at 4:00 PM |
Duration of the Event (Downtime) | 01:21 | Usually in hours and minutes
Failed Item Description | Handler-Stocker-Robot-Motor | Describes function and location of failed item from top to bottom; that is, field to lowest replaceable unit
Reliability/Fault Code of Known Problems | AH-STK-ROT-MOT-021 | Known problem categorized by reliability code and failure mode number
Life Cycle Phase of the System | Field Operations | See equipment life cycle in the Glossary
Downtime Category (Maintenance Action) | unscheduled | Usually shown as either unscheduled or scheduled (preventive maintenance)
Part Number | 244-PC | Part number of replaced item (if any)
Operator’s Name | M. Lolita | Person reporting problem
Service Engineer’s Name | S. Spade | Person performing repair action
Event Report Identification Number | FSR-002 | Supplier’s report number
Event (Problem) Description | What happened? | Description of all conditions prior to failure and how observer thinks it failed
Repair Action Description | What did you do to repair item? | Description of any repair or maintenance attempts
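If you computerize the event report, the fields in Table 1 map naturally onto a record type. The following is a minimal sketch; the class name, attribute names, and the blank-field check are assumptions made for illustration. The sample values are the examples from Table 1, with both description fields left blank to show the check.

```python
from dataclasses import dataclass, fields
from datetime import datetime, timedelta


@dataclass
class EventReport:
    """One event report; attribute names are illustrative, based on Table 1."""
    serial_number: str
    event_time: datetime
    downtime: timedelta
    failed_item: str
    reliability_code: str
    life_cycle_phase: str
    downtime_category: str  # "unscheduled" or "scheduled"
    part_number: str
    operator: str
    service_engineer: str
    report_id: str
    event_description: str
    repair_description: str


def missing_fields(report):
    """Return the names of fields that were left as empty strings."""
    return [f.name for f in fields(report) if getattr(report, f.name) == ""]


report = EventReport(
    serial_number="001",
    event_time=datetime(1994, 3, 12, 16, 0),
    downtime=timedelta(hours=1, minutes=21),
    failed_item="Handler-Stocker-Robot-Motor",
    reliability_code="AH-STK-ROT-MOT-021",
    life_cycle_phase="Field Operations",
    downtime_category="unscheduled",
    part_number="244-PC",
    operator="M. Lolita",
    service_engineer="S. Spade",
    report_id="FSR-002",
    event_description="",
    repair_description="",
)
incomplete = missing_fields(report)
```

A check like this at data entry time addresses one of the common causes of missing data: the form can refuse to close until the descriptions are filled in.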
EVENT RECORD
Reliability Code: DMD-XS-HRN-M3S-002
Serial Number: 9003
Date: 09/01/93
Time: 09:00:
Duration: 04:30:
Down Time Category: Unscheduled
Life Cycle Phase: OPERATION
Relevant: Y
PM: N
System: DMD
Subsystem: XS
Assembly: HRN
Subassembly: M3S
Sub-subassembly:
Part Number: 650-000029
Operator: FRANK BELL
FSE Name:
FSE Number:
Problem: PREVIOUS ATTEMPT TO RESOLVE TRANSPORT LOAD FAILURE DID
NOT WORK. REQUIRES SWITCH REPLACEMENT.
Repair Action: REPLACED SWITCH, CURED LENS 2, ALIGNED COLUMN, CHECKED
JET ALIGN, MADE TEST MILLS AND DEPOS.
Figure 2 Example Event Report
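The reliability code in Figure 2 appears to concatenate the equipment hierarchy (system, subsystem, assembly, subassembly) and end with a failure mode number. A minimal sketch of splitting such a code follows. It assumes hyphen-separated levels in that order, with an optional sub-subassembly level; the function name and validation rules are illustrative and not part of any FRACAS tool.

```python
LEVELS = ("system", "subsystem", "assembly", "subassembly", "sub_subassembly")


def parse_reliability_code(code):
    """Split a reliability code into hierarchy levels and a failure mode number.

    Assumes the last hyphen-separated segment is numeric (the failure mode)
    and that the preceding segments name successively lower hierarchy levels.
    """
    *parts, mode = code.split("-")
    if not parts or len(parts) > len(LEVELS) or not mode.isdigit():
        raise ValueError(f"unrecognized reliability code: {code!r}")
    parsed = dict(zip(LEVELS, parts))
    parsed["failure_mode"] = int(mode)
    return parsed


parsed = parse_reliability_code("DMD-XS-HRN-M3S-002")
```

Codes with fewer levels simply yield fewer keys, which lets reports group events at whatever depth the code provides.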
The user analyzes this data to evaluate management of equipment
and resources and to determine its effectiveness in meeting internal
commitments. Internally, user data may be handled differently
than described in SEMI E10-92, but as a minimum for sharing data
with the supplier, the major states in Figure 3 should be followed.
[Figure 3 stack chart: six stacked states (NON-SCHEDULED, UNSCHEDULED DOWNTIME, SCHEDULED DOWNTIME, ENGINEERING, STANDBY, PRODUCTIVE), with brackets marking equipment downtime, equipment uptime, operations time, and total time (168 hrs/wk)]
Figure 3 Equipment States Stack Chart
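The stack chart can be checked numerically. The sketch below assumes the usual grouping of the six states: uptime is productive plus standby plus engineering, downtime is scheduled plus unscheduled downtime, and operations time is uptime plus downtime (total time less non-scheduled time). The dictionary keys and the sample hours are hypothetical.

```python
WEEK_HOURS = 168.0


def summarize_week(hours):
    """Roll six state totals (in hours) up into uptime, downtime, and operations time."""
    total = sum(hours.values())
    if abs(total - WEEK_HOURS) > 1e-6:
        raise ValueError(f"states sum to {total} h, expected {WEEK_HOURS} h")
    uptime = hours["productive"] + hours["standby"] + hours["engineering"]
    downtime = hours["scheduled_downtime"] + hours["unscheduled_downtime"]
    operations = uptime + downtime
    return {
        "uptime_h": uptime,
        "downtime_h": downtime,
        "operations_h": operations,
        "uptime_pct": 100.0 * uptime / operations,
    }


# Assumed example week for one tool.
week = {
    "productive": 100.0,
    "standby": 20.0,
    "engineering": 10.0,
    "scheduled_downtime": 8.0,
    "unscheduled_downtime": 6.0,
    "non_scheduled": 24.0,
}
summary = summarize_week(week)
```

Rejecting a week whose states do not add up to 168 hours catches double-counted or missing intervals before they distort the shared data.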
Reporting Software Problems
When reporting software problems, too much detail can be
counterproductive. An error in software does not necessarily mean
that there will be a problem at the system (equipment) level. To
quote Sam Keene, 1992 IEEE Reliability Society president,
If I’m driving on a two-lane road (same direction) and
switch lanes without using the signal indicator and
continue to drive, I’ve committed an error. But if I do the
same when another car is on the other side and I happen
to crash into it, then I now have a problem.
Your existing software error reporting process should become an
integral part of your overall FRACAS. (Rather than reiterating the
software reliability data collection process, a practical approach can
be found in A.M. Neufelder’s work.2)
FRACAS helps you focus on those errors that the customer may
experience as problems. One focusing mechanism is to track the
reason for a corrective action. This data can be collected by reviewing the repair action notes for each problem in the problem report.
Engineering and management can categorize these reasons to
determine the types of errors that occur most often and address
them by improving procedures that most directly cause a particular
type of error. The process of analyzing this data is continuous.
Among the reasons for corrective action are the following:
• Unclear requirements
• Misinterpreted requirements
• Changing requirements
• Documented design not coded
• Bad nesting
• Missing code
• Excess code
• Previous maintenance
• Bad error handling
• Misuse of variables
• Conflicting parameters
• Software or hardware environment change
• Specification error
• Problem with third-party software
• Documented requirements not designed
• Software environment error (for example, an error in the
  compiler)
2. Refer to Chapter 7, Software Reliability Data Collection, of Ensuring Software Reliability, by
A. M. Neufelder (Marcel Dekker, Inc., New York, 1993).
Reasons should also be ranked in terms of the criticality or severity
of the error. This information helps management predict those
errors that will cause downtime (as opposed to all errors, including
those that may not cause downtime).
Severity rankings are:
1. Catastrophic
2. Severe
3. Moderate (has work-around)
4. Negligible
5. All others (for example, caused by a misunderstood new feature
or unread documentation)
Also, tracking the modules or procedures modified for each
corrective action helps schedule pre-release regression testing on
these changes, which results in more efficient test procedures and
more effective test results.
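Categorizing reasons and ranking them by severity can be sketched in a few lines. The example below assumes each corrective action has been tagged with one reason from the list above and one severity ranking; the sample data and the function name are hypothetical.

```python
from collections import Counter
from enum import IntEnum


class Severity(IntEnum):
    """Severity rankings as listed above; a lower number is more severe."""
    CATASTROPHIC = 1
    SEVERE = 2
    MODERATE = 3
    NEGLIGIBLE = 4
    OTHER = 5


# Hypothetical (reason, severity) pairs taken from repair action notes.
corrective_actions = [
    ("Missing code", Severity.SEVERE),
    ("Bad error handling", Severity.CATASTROPHIC),
    ("Missing code", Severity.MODERATE),
    ("Unclear requirements", Severity.NEGLIGIBLE),
    ("Missing code", Severity.SEVERE),
]


def rank_reasons(actions, worst_allowed=Severity.SEVERE):
    """Count reasons among actions ranked at least as severe as worst_allowed."""
    counts = Counter(reason for reason, sev in actions if sev <= worst_allowed)
    return counts.most_common()


likely_downtime_causes = rank_reasons(corrective_actions)
```

Filtering on severity before counting reflects the point made above: management is usually most interested in the errors likely to cause downtime, not in every error.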
Logging Data
Whether software or equipment errors are being reported, once the
supplier receives the reports, all data should be consolidated into
one file. The supplier’s reliability/quality organization should
oversee the data logging.
A FRACAS database is recommended for this part of the process.
This is not a product-specific database but rather a database for
tracking all products offered by a company. Figure 4 shows how
the status of problems is kept updated in the database.
PROBLEMS RECORD
Title/Description: LIMIT FLA SENSORS FAIL
Reliability Code: ATF-STK-PRI-RT-XAX-001
Initial Status: I
Fault Category: Facilities Elec
Error Codes: errcode 2
Assignee: Eric Christie
System: ATF
Subsystem: STK
Assembly: PRI
Subassembly: RT
Sub-subassembly: XAX

DATES
Insufficient Info: 11/04/93
Sufficient Info: / /
Defined: / /
Contained: / /
Retired: / /

ROOT CAUSE SOLUTION (RCS)
Original Plan: / /
Current Plan: / /
Complete: / /
RCS Documentation:
Comments: Not enough information at this time. However, supplier is
investigating vendor part #po09347-90.

STATISTICS
Current Status: I
First Reported: 11/04/93
Last Reported: 11/11/93
Number of Fails: 2
Number of Assists: 0
Number of Others: 0
Total Events: 0
Total Downtime Hours: 2.72
Figure 4 Problems Record
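One way to keep a problem's status current is to derive it from the milestone dates instead of entering it separately. The sketch below assumes the milestones in the DATES block of Figure 4 are reached in the order shown; the milestone keys and the function name are hypothetical, and the single-letter status codes used in the figure are not modeled.

```python
from datetime import date

# Milestones in the order they appear in the DATES block of Figure 4.
MILESTONES = ("insufficient_info", "sufficient_info", "defined", "contained", "retired")


def current_status(dates):
    """Return the furthest milestone that has a date, or None if none is set."""
    status = None
    for name in MILESTONES:
        if dates.get(name) is not None:
            status = name
    return status


# The record in Figure 4 has only the first milestone dated.
record_dates = {"insufficient_info": date(1993, 11, 4)}
status = current_status(record_dates)
```

Deriving the status this way means the status field cannot drift out of step with the dates when a record is updated.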
Using FRACAS Reports
The FRACAS database management system (DBMS) can display
data in different ways. This section describes some of the report
types you may find useful. A complete failure summary and other
associated reports that this DBMS can provide are described in
more detail in the FRACAS Database Management System section.
The graphical reports provide a quick snapshot of how the equipment or software is performing at any given point. Figure 5 shows
the percentage rejection rate on a monthly basis as actual-versus-target rates. Figure 6 depicts the number of failure modes or
problems on a weekly basis. Figure 7 points out the number of
events being reported weekly by life cycle phase. Figure 8 provides
a one-page status outline for tracking by the FRB. Whichever
report you choose, it should be tailored to provide summaries and
special reports for both management and engineering personnel.
[Figure 5 chart: Factory Test Tracking, Initial Test Reject Rate (Production – HP3060 Test). Series: monthly rate and 3-month rate of reject %, plotted against a target rate of 8%. Volume by month: Feb 566, Mar 97, Apr 679, May 598, Jun 22, Jul 123, Aug 126, Sep 333, Oct 98, Nov 98, Dec 145, Jan ’86 88]
Figure 5 Example—Percentage Rejection Rate, Actual Versus Target
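The two rates plotted in Figure 5 can be reproduced from monthly test volumes and reject counts. The sketch below assumes the 3-month rate is the total rejects over the total volume for the current and two preceding months. The reject counts are hypothetical, since the figure gives only volumes, and the function name is illustrative.

```python
TARGET_PCT = 8.0  # target reject rate shown in Figure 5


def reject_rates(volumes, rejects, window=3):
    """Return (monthly, rolling) reject percentages for parallel monthly lists.

    The rolling rate pools rejects and volume over up to `window` months,
    so low-volume months do not dominate it. Volumes must be nonzero.
    """
    monthly, rolling = [], []
    for i in range(len(volumes)):
        monthly.append(100.0 * rejects[i] / volumes[i])
        lo = max(0, i - window + 1)
        rolling.append(100.0 * sum(rejects[lo:i + 1]) / sum(volumes[lo:i + 1]))
    return monthly, rolling


# Hypothetical data for three consecutive months.
monthly, rolling = reject_rates(volumes=[200, 100, 100], rejects=[20, 5, 11])
months_above_target = [i for i, rate in enumerate(monthly) if rate > TARGET_PCT]
```

Pooling counts rather than averaging the monthly percentages matters when volumes swing widely, as they do in Figure 5 (22 units in one month, 679 in another).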
[Figure 6 chart: events per week by reliability code, from 08/30/93 to 10/03/93 (weeks 1 to 5; vertical axis 0 to 8 events). Codes plotted: DMD-OP-IB–IGA-SRC-002, DMD-OP-IB–IGA-SRC-003, DMD-VS-HC-001]
Figure 6 Example—Number of Events Weekly
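A weekly count such as the one in Figure 6 can be produced directly from the event log. The sketch below assumes each event is a date paired with a reliability code, and that weeks are numbered from 1 starting at the report start date. The sample events are hypothetical; only the start date and the code come from the figure.

```python
from collections import Counter
from datetime import date


def weekly_counts(events, start):
    """Count events per (week number, reliability code).

    events: iterable of (date, reliability_code) pairs.
    start: first day of week 1.
    """
    counts = Counter()
    for day, code in events:
        week = (day - start).days // 7 + 1
        counts[(week, code)] += 1
    return counts


# Hypothetical events within the Figure 6 reporting period.
events = [
    (date(1993, 8, 31), "DMD-VS-HC-001"),
    (date(1993, 9, 2), "DMD-VS-HC-001"),
    (date(1993, 9, 8), "DMD-VS-HC-001"),
]
counts = weekly_counts(events, start=date(1993, 8, 30))
```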
[Figure 7 chart: Events by Life Cycle Phase by Week, from 09/27/93 to 10/17/93 (weeks 1 to 3; vertical axis 0 to 4 events). Phases plotted: Phase II, Phase III]
Figure 7 Example—Number of Events Weekly by Life Cycle Phase
FAILURE/REPAIR REPORT
[ ] H/W  [ ] S/W  [ ] SE  [ ] TE  [ ] OTHER

I ORIGINATOR
1. PROJECT
2. FAILURE DATE
3. FAILURE GMT: DAY ___ HR ___ MIN ___
4. LOG NO.
5. REPORTING LOCATION: [ ] SUPPLIER  [ ] OTHER
6. SUBSYSTEM NO.
7. 1ST TIER H/W, S/W
8. 2ND TIER H/W, S/W
9. 3RD TIER H/W, S/W
10. 4TH TIER H/W, S/W
    Columns: A REFERENCE DESIGNATIONS  B NOMENCLATURE  C SERIAL NUMBER  D OPERATING TIME CYCLES
11. DETECTION AREA: [ ] BENCH  [ ] ACCEPT  [ ] FAB/ASSY  [ ] QUAL  [ ] INTEG  [ ] SYSTEM  [ ] FIELD  [ ] OTHER
12. FAILURE DESCRIPTION
13. ENVIRONMENT: [ ] ACOUSTIC  [ ] EMC/EMI  [ ] AMB  [ ] TEMP  [ ] VIE  [ ] SHOCK  [ ] HUMID  [ ] OTHER
14. ORIGINATOR
15. DATE
16. COGNIZANT ENGINEER

II VERIFICATION
17. VERIFICATION AND ANALYSIS
18. ASSOCIATED DOCUMENTATION
19. ITEM DATA: A. ITEM NAME  B. NUMBER  C. SERIAL NO.  D. CIRCUIT DESIGNATION  E. MANUFACTURER  F. OPERATING TIME  G. DEFECT
20. FAILURE CAUSE: [ ] DESIGN  [ ] WORKMANSHIP  [ ] DAMAGE (MISHANDLING)  [ ] MFG PROCESS  [ ] PART FAILURE  [ ] ADJUSTMENT  [ ] S/W  [ ] LRU FAILURE  [ ] TEST ERROR  [ ] EQMT FAILURE  [ ] SE FAILURE  [ ] DOCUMENTATION  [ ] OTHER
21. SIGNATURE
22. DATE

III REPAIR
23. ACCUMULATED MAINTENANCE: A. ON LOCATION  B. DEPOT SHOP  C. MANUFACTURING SHOP
    Columns: a NAME  b DATE  c HRS/MINS  d LOCATION  e WORK PERFORMED

IV CORRECTIVE ACTION
24. CORRECTIVE ACTION TAKEN: [ ] NO ACTION  [ ] FAR NO.  [ ] RFC NO.
25. DISPOSITION OF SUBSYSTEM OR ASSEMBLY: [ ] REDESIGNED  [ ] REWORKED  [ ] READJUSTED  [ ] SCRAPPED  [ ] RETESTED  [ ] OTHER
26. EFFECTIVITY: [ ] THIS UNIT  [ ] ALL UNITS  [ ] OTHER
27. SAFETY
28. SIGNATURE COG ENGR
29. SEC
30. DATE
31. SIGNATURE COG SEC ENGR
32. DATE
33. SUBSYSTEM RATING
34. SYSTEM ENGR
35. DATE
36. PROJECT RELIABILITY ASSURANCE
37. DATE
38. RISK ASSESSMENT

COPY DISTRIBUTION: WHITE – PFA CENTER, CANARY – PROJECT OFFICE, PINK – ACTION COPY,
GOLDENROD – ACCOMPANY FAILED ITEM
Figure 8 Example—Failure/Repair Report
ANALYSIS
Failure analysis is the process that determines the root cause of the
failure. Each failure should be verified and then analyzed to the
extent that you can identify the cause of the failure and any
contributing factors. The methods used can range from a simple
investigation of the circumstances surrounding the failure to a
sophisticated laboratory analysis of all failed parts.
Failure Analysis Process
Failure analysis begins once an event report is written and sent to
the FRB. It ends when you sufficiently understand the root cause so
you can develop logically derived corrective actions.
It is important to clearly communicate the intent and structure of
the failure analysis process to all appropriate organizations. These
organizations should review and approve the process to confirm
their cooperation. During and after the analysis, the problem
owner, associated FRB member, and reliability coordinator must
ensure that the database is maintained with the current applicable
information.
The analysis process is:
1. Review, in detail, the field service reports.
2. Capture historical data from the database of any related or
similar failures.
3. Assign owners for action items.
4. Do a root cause analysis (RCA).
5. Develop corrective actions.
6. Obtain the failed items for RCA (as needed).
7. Write a problem analysis report (PAR) and, if needed, a material
disposition report (MDR).
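Step 2 of this process, capturing historical data on related or similar failures, can be approximated by matching reliability codes. The sketch below treats two events as related when their codes share the same leading hierarchy levels. The record layout, the report identifiers, and the second code in the sample history are hypothetical.

```python
def related_failures(history, code, depth=3):
    """Return past events whose reliability code shares the first `depth` levels."""
    prefix = code.split("-")[:depth]
    return [e for e in history if e["reliability_code"].split("-")[:depth] == prefix]


# Hypothetical history; only the first and third codes appear in this guide.
history = [
    {"id": "FSR-001", "reliability_code": "DMD-XS-HRN-M3S-002"},
    {"id": "FSR-002", "reliability_code": "DMD-XS-HRN-K1-004"},
    {"id": "FSR-003", "reliability_code": "DMD-VS-HC-001"},
]
similar = related_failures(history, "DMD-XS-HRN-M3S-002")
```

Lowering depth widens the search to the whole subsystem or system; raising it narrows the search to the same subassembly.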
Developing proper forms for tracking the failed parts and reporting
the problem analysis results is essential. Figures 9 and 10 are
examples of these types of forms.
MATERIAL DISPOSITION REPORT
S.NO.    FAILURE DATE    REWORK ORDER NO    J37511
RETURNED PART DESCRIPTION: PART NUMBER, QTY RETURNED, PART DESCRIPTION, FSR NUMBER
REASON FOR RETURN: [ ] DEFECTIVE  [ ] ENG CHANGE  [ ] OVERISSUE  [ ] OVERDRAWN  [ ] OTHER
USED ON: EQUIPMENT S.NO., ASSEMBLY NO., SITE NUMBER, DATE
MATERIAL DISPOSITION INSTRUCTIONS: [ ] RETURN TO STOCK  [ ] SEND TO FRB FOR FAILURE ANALYSIS  [ ] SEND FOR VENDOR FAILURE ANALYSIS  [ ] SCRAP  [ ] RETURN TO VENDOR  [ ] SEND FOR REPAIR
RELIABILITY INFORMATION: TASK NUMBER, FRB MEMBER, ORIGINATOR, DATE, DATE
DEFECT DESCRIPTION (FILLED BY THE ORIGINATOR)
DISTRIBUTION: [ ]  [ ]  [ ]  [ ]
Figure 9 MDR Forms
RELIABILITY    MDC    FS    QUALITY
PROBLEM ANALYSIS REPORT
PROBLEM NUMBER: ___ ___ - ___ ___ - ___ ___ ___
ERROR CODE:
PROBLEM TITLE:
FRB MEMBER:
ASSIGNEE:
TYPE OF PROBLEM: F/A [ ]  O [ ]
COMMON: [ ]  UNIQUE: [ ]
CURRENT FIX DATE/SW REV.
CATEGORY: WORK: [ ]  PROC: [ ]  DESIGN: [ ]
MODULE CODE: ____ ____
INSUFFICIENT INFO: [ ]  DATE:
SUFFICIENT INFO: [ ]  DATE:
COMMENTS:
[ ] DEFINITION OF PROBLEM:
DEFINED: [ ]  DATE
[ ] CONTAINMENT:
CONTAINED: [ ]  DATE
[ ] ROOT CAUSE:
[ ] CORRECTIVE ACTION:
RETIRED/CLOSED: [ ]  DATE:
VERIFICATION:
DOCUMENTATION: [ ] ECN:  [ ] FRI  [ ] OTHER:
REVISION: A: [ ]  B: [ ]  C: [ ]  D: [ ]
DATE:
ENTERED IN DB:
*** IF MORE SPACE IS NEEDED, PLEASE ATTACH SEPARATE PIECE OF PAPER ***
Figure 10 PAR Form
Failure Review Board
The FRB reviews failure trends, facilitates and manages the failure
analysis, and participates in developing and implementing the
resulting corrective actions. To do these jobs properly, the FRB
must be empowered with the authority to require investigations,
analyses, and corrective actions by other organizations. The FRB
has much in common with quality circles: both are self-managed
teams directed to improve methods under their control (see
Figure 11).
[Figure 11 flowchart, part 1. Boxes: Reliability coordinator brings reports; FRB meets regularly; Review in-house & field reports; FRB establishes preliminary assignments; decision: Does FRB require additional data? (Yes/No); FRB confirms problem assignments within 24 hours; connector A to part 2]
Figure 11 FRB Process
[Figure 11 flowchart, part 2, from connector A. Boxes: decision: Sufficient information? (Yes/No); Enter into DBMS & write PARs; FRB assigns problem owner (Design problems, Operator/field services, Mfg. issues, Software problems, Others); Establish containment, RCA plan/schedule; FRB reviews & ensures reliable solution; RCA complete; Begin ECN Process; Update DBMS; Distribute updates to FRB; After suitable period if additional events have not recurred, retire problem]
Figure 11, continued FRB Process
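The last step in Figure 11, retiring a problem after a suitable period without recurrence, can be written as a simple rule. The sketch below assumes a quiet period of 90 days, which is an arbitrary placeholder; the guide does not specify a value, and the function and parameter names are hypothetical.

```python
from datetime import date, timedelta


def can_retire(fix_date, event_dates, today, quiet_period=timedelta(days=90)):
    """Return True if no event has recurred since the fix and the quiet period has elapsed.

    fix_date: date the corrective action was implemented.
    event_dates: dates of all events logged against the problem.
    quiet_period: assumed waiting time; choose a value that suits the product.
    """
    recurred = any(d > fix_date for d in event_dates)
    return not recurred and today - fix_date >= quiet_period


# Hypothetical problem with two events, both before the fix.
events_before_fix = [date(1993, 11, 4), date(1993, 11, 11)]
retire_now = can_retire(date(1993, 11, 15), events_before_fix, today=date(1994, 3, 1))
```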
The makeup of the FRB and the scope of authority for each member
should be identified in the FRACAS procedures. The FRB is
typically most effective when it is staffed corporate-wide, with all
functional and operational disciplines within the supplier organization participating. The user may be represented also. Members
should be chosen by function or activity to let the composition
remain dynamic enough to accommodate personnel changes within
specific activities.
The FRB composition may be product specific, if desired. For
example, a supplier’s different products may be complex and
therefore require expert knowledge. In this scenario, though,
overall priorities must be managed carefully, since the FRB may
require the same expert resources for root cause analysis.
Generally, the following organizations are represented on the FRB:
• Reliability
• Quality Assurance
• Field Service
• Manufacturing
• Statistical Methods
• Marketing
• Systems Engineering
• Test
• Design Engineering*
The FRB should be headed by the reliability manager. One of the
main functions of this reliability manager is to establish an effective
FRACAS organization. The reliability manager must establish
procedures, facilitate periodic reviews, and coach the FRB members. Other responsibilities include:
• Assign action items with ownership of problem/solution
  (who/what/when)
• Allocate problems to appropriate functional department for
  corrective action
• Conduct trend analysis and inform management on the types
  and frequency of observed failures
• Keep track of failure modes and criticality
• Issue periodic reports regarding product performance (for
  example, MTBF, MTTR, productivity analysis) and areas
  needing improvement
• Ensure problem resolution
• Continually review the FRB process and customize it to fit
  specific product applications
* Hardware, software, process, and/or materials design, depending on the type of system
being analyzed.
The FRB operates most efficiently with at least two recommended
support persons. A reliability coordinator can procure failed items
for root cause analysis, ensure that the event reports submitted to
the FRB contain adequate information, stay in contact with internal
and external service organizations to ensure clarity on service
reports, and track and facilitate RCA for corrective action implementation.
A database support person should be responsible for providing
periodic reports and for maintaining the database (importing/exporting data, backing up files, archiving old records). This will
ensure that data available from RCA and corrective action is kept
current in the database.
Responsibilities of the FRB members include the following:
• Prepare a plan of action with schedule
• Review and analyze failure reports from in-house tests and field
  service reports
• Identify failure modes to the root cause
Furthermore, each FRB member should
• Have the authority to speak for and accept corrective action
  responsibility for an organization
• Have thorough knowledge of front-end systems
• Have the authority to ensure proper problem resolution
• Have the authority to implement RCA
• Actively participate
The FRB should meet on a regular basis to review the reported
problems. The frequency of these meetings will depend on the
number of problems being addressed or on the volume of daily
event reports. Problem-solving skills and effective meeting
techniques can ensure that meetings are productive. If you need
assistance in these areas, SEMATECH offers short courses3 that are
affordable and can be customized for your application.
The top problems, based on Pareto analyses (statistical methods)
derived from the database, should be used to assign priorities and
to allocate resources.
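A Pareto ranking is straightforward to derive from the database. The sketch below ranks problems by total downtime hours and keeps the largest contributors until they account for a chosen share of the total. The 80% cutoff, the function name, and the sample figures are assumptions for illustration; only the reliability code format follows this guide.

```python
def pareto(downtime_by_problem, cutoff_pct=80.0):
    """Return (problem, hours, cumulative %) tuples, largest first, up to the cutoff."""
    total = sum(downtime_by_problem.values())
    ranked = sorted(downtime_by_problem.items(), key=lambda kv: kv[1], reverse=True)
    top, cumulative = [], 0.0
    for code, hours in ranked:
        if cumulative >= cutoff_pct:
            break
        cumulative += 100.0 * hours / total
        top.append((code, hours, round(cumulative, 1)))
    return top


# Hypothetical downtime hours per problem.
downtime = {
    "DMD-XS-HRN-M3S-002": 50,
    "DMD-VS-HC-001": 30,
    "ATF-STK-PRI-RT-XAX-001": 15,
    "AH-STK-ROT-MOT-021": 5,
}
top_problems = pareto(downtime)
```

The same ranking can be run on event counts instead of downtime hours; the two views often highlight different problems, and the FRB may want to see both.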
The reporting relationship of the FRB with other FRACAS functions
is shown in Figure 12.
3. Contact the SEMATECH Organization Learning and Performance Technology Department, 2706 Montopolis Drive, Austin, TX 78741, (512) 356–7500
[Figure 12 chart boxes: Reliability Manager; Failure Review Board; Reliability Coordinator; Database Support; Problem Owner (three boxes); In-house Test, Inspections; Field Operations]
Figure 12 FRACAS Functions Responsibilities
Root Cause Analysis
In failure analysis, reported failures are evaluated and analyzed to
determine the root cause. In RCA, the causes themselves are
analyzed and the results and conclusions are documented. Any
investigative or analytical method can be used, including the
following:
Brainstorming
Histogram
Flow chart
Force field analysis
Page 23
Failure Reporting, Analysis, and Corrective Action System
Pareto analysis
Nominal group technique
FMEA (see footnote 4)
Trend analysis
Fault tree analysis
Cause and effect diagram (fishbone)
These tools are directly associated with the analysis and
problem-solving process (see footnote 5). Advanced methods, such as
statistical design of experiments, can also be used to assist the FRB.
The Canada and Webster divisions of Xerox Corporation use design of
experiments extensively as their roadmap for problem solving (see
footnote 6); they find this method helps their FRBs make unbiased decisions.
Any resulting corrective action should be monitored to ensure that
causes are eliminated without introducing new problems.
Failed Parts Procurement
A failed parts procurement process should be established to assist
the FRB in failure analysis. Failed parts may originate from field,
factory, or company stock, or they may become defective during
shipment. Each of these scenarios requires different action. The
procedure should clearly define the process, including the timeline
and responsibilities for at least the following tasks:
Shipment of failed parts from field to factory
Procurement of failed parts from various factory locations
4. Failure Mode and Effects Analysis. More information is available in Failure Mode and
Effects Analysis (FMEA): A Guide for Continuous Improvement for the Semiconductor Industry,
which is available from SEMATECH as technology transfer #92020963A-ENG.
5. Analysis and problem-solving tools are described in Partnering for Total Quality: A Total
Quality Toolkit, Vol. 6, which is available from SEMATECH as technology transfer
#90060279A-GEN.
6. A fuller description of the Xerox process can be found in Proceedings of Workshop on
Accelerated Testing of Semiconductor Manufacturing Equipment, which is available from
SEMATECH as technology transfer #91050549A–WS.
Failed part reports from each location
Communications between material disposition control and
reliability engineering
Sub-tier supplier failure analysis
Procurement of sub-tier supplier PAR and implementation of
corrective action
Disposition of failed part after RCA is completed
CORRECTIVE ACTION
When the cause of a failure has been determined, a corrective action
plan should be developed, documented, and implemented to
eliminate or reduce recurrences of the failure. The implementation
plan should be approved by a user representative, and appropriate
change control procedures should be followed. The quality
assurance organization should create and perform incoming test
procedures and inspection of all redesigned hardware and software.
SEMATECH has developed an Equipment Change Control System (see
footnote 7) that you may find useful (see Figure 13).
To minimize the possibility of an unmanageable backlog of open
failures, all open reports, analyses, and corrective action suspension
dates should be reviewed to ensure closure. A failure report is
closed out when the corrective action is implemented and verified
or when the rationale is documented (and approved by the FRB) for
any instances that are being closed without corrective action.
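The close-out rule above can be sketched as a small check. This is only an illustration under assumed names: the `FailureReport` class and its fields are hypothetical and do not come from the SEMATECH software.

```python
from dataclasses import dataclass


@dataclass
class FailureReport:
    # Hypothetical record; field names are assumptions for illustration.
    report_id: str
    corrective_action_implemented: bool = False
    corrective_action_verified: bool = False
    no_action_rationale: str = ""
    rationale_approved_by_frb: bool = False


def can_close(report):
    """Apply the two close-out paths described in the text."""
    # Path 1: corrective action is both implemented and verified.
    fixed = (report.corrective_action_implemented
             and report.corrective_action_verified)
    # Path 2: a documented rationale for no action, approved by the FRB.
    waived = (bool(report.no_action_rationale.strip())
              and report.rationale_approved_by_frb)
    return fixed or waived


open_report = FailureReport("FR-001", corrective_action_implemented=True)
print(can_close(open_report))
```

A report with an implemented but unverified fix stays open, which is the case most likely to feed an unmanageable backlog if it is not reviewed.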
7. Refer to Equipment Change Control: A Guide for Customer Satisfaction, which is available
from SEMATECH as technology transfer #93011448A–GEN.
An effective change control system incorporates the following characteristics.
❑ An established equipment baseline exists with current and accurate
    - Design specs and drawings
    - Manufacturing process instructions
    - Equipment purchase specifications
❑ All changes are identified by using
    - The equipment baseline
    - Raw materials supplier change control
    - Procurement methods and procedures
    - Incoming quality control
    - Calibration systems
    - Traceability systems
    - Materials review process
❑ All changes are recorded, forecasted, and tracked in an equipment
  change database incorporating
    - Name of change
    - Change description
    - Reason for change
    - All equipment affected
    - Retrofit required/optional
    - Concerns/risks
    - Qualification data
    - Implementation plan
    - Start date
    - Completion date
    - Status of change
❑ Proper investigation of all changes is performed by a Change Evaluation
  Committee made of experts from
    - Equipment manufacturing
    - Equipment design
    - Software design
    - Quality and reliability
    - Service and maintenance
    - Environment, health, and safety
    - Procurement
    - Process engineering
    - Marketing
❑ A Change Evaluation Committee is chartered to
    - Validate change benefits using statistical rigor as required
    - Specify change qualification and implementation plans
    - Manage and drive change implementation schedules
    - Update equipment specifications and drawings
    - Maintain the equipment change database
    - Communicate with the customer
In summary, a change control system is functioning well when it systematically provides
❑ Identification of all changes relative to an equipment baseline
❑ Recording, forecasting, and tracking of all changes
❑ Proper investigation of all changes
❑ Communication to the customer
Figure 13 Equipment Change Control Characteristics
FRACAS DATABASE MANAGEMENT SYSTEM
The FRACAS Database Management System (DBMS) facilitates
failure reporting to establish a historical database of events, causes,
failure analyses, and corrective actions. This pool of knowledge
should minimize the recurrence of particular failure causes. This
DBMS is not product specific. It is available from SEMATECH (see footnote 8), or
you can implement the same concept in whatever software you
choose. However, the database should have the same software
requirements and characteristics at both the supplier’s and user’s
sites.
This section describes characteristics of the SEMATECH product. If
you are creating your own implementation, you should develop the
following aspects. Sections follow that describe each table more
fully.
Configuration Table
Events Table
Problems Table
Reports
8. Available from SEMATECH Total Quality division. Software transfer and accompanying
documentation in press.
Configuration Table
The configuration table accepts information about each machine or
software instance. There are six standard fields, as shown in the
following table. In the SEMATECH FRACAS database, there are
seven optional fields that can be customized for your purposes,
such as customer identification number, software revision, process
type, engineering change number, site location, etc.
TABLE 2 STANDARD FIELDS OF THE CONFIGURATION TABLE
Field Name                Description
Serial Number             User input.
Site                      Enter the location of the machine.
System                    User input.
Phase                     User input. Enter the current Life Cycle Phase of
                          the machine. Required field.
Phase Date                User input. Enter the date the machine went into
                          the current Life Cycle Phase.
Weekly Operational Time   User input. Enter the number of hours the
                          machine is scheduled to be in operation.
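If you are building your own implementation, a configuration record might be modeled along the following lines. This is a sketch under stated assumptions: the class name, attribute names, and the 168-hour sanity check are illustrative choices, not part of the SEMATECH product. Only the six standard fields, the required Phase, and the limit of seven optional fields come from the text.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional

MAX_OPTIONAL_FIELDS = 7   # the SEMATECH database offers seven optional fields
HOURS_PER_WEEK = 168      # upper bound used only as a sanity check


@dataclass
class ConfigurationRecord:
    # The six standard fields; attribute names are assumptions.
    serial_number: str
    site: str
    system: str
    phase: str                                   # required field
    phase_date: Optional[date] = None
    weekly_operational_hours: float = 0.0
    # Site-specific extras such as software revision or process type.
    optional_fields: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self):
        if not self.phase.strip():
            raise ValueError("Phase is a required field")
        if not 0 <= self.weekly_operational_hours <= HOURS_PER_WEEK:
            raise ValueError("weekly operational time must be 0 to 168 hours")
        if len(self.optional_fields) > MAX_OPTIONAL_FIELDS:
            raise ValueError("at most seven optional fields are supported")


record = ConfigurationRecord(
    serial_number="SN-001",
    site="Austin",
    system="AH",
    phase="Production and operation",
    phase_date=date(1994, 1, 10),
    weekly_operational_hours=120.0,
    optional_fields={"software_revision": "2.1"},
)
print(record.serial_number, record.phase)
```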
Events Table
The primary purpose of the FRACAS application is to record
downtime events for a particular system. The fields of the Events
Table are described in Table 3. To understand these fields fully, you
need to understand the relationship between the problem fields
(System, Subsystem, Assembly, Subassembly, and/or Sub-subassembly) and the reliability code.
The reliability code field contains one or more of the problem field
names. Data (if any) from the problem fields automatically helps
you select or create the reliability code. Once you select a code,
associated fields are updated automatically.
TABLE 3 EVENTS TABLE FIELD DESCRIPTIONS
Field Name            Description
Serial Number         User input. The serial number of machine having
                      event.
Date                  User input. The date the event occurred. Enter in a
                      MM/DD/YY format.
Time                  User input. The starting time of the event. Entered
                      in 24-hour time (HH:MM:SS). Seconds are not
                      required.
Duration              User input. The duration of the event. Entered in a
                      HH:MM:SS format. Seconds are not required.
System                Automatic input based upon the serial number.
Subsystem             User input. The optional subsystem code (if the
                      part is at a subsystem level) involved in the event.
                      Related to the System codes.
Assembly              User input. The optional assembly code (if the part
                      is at an assembly level) involved in the event.
                      Related to the selected System and Subsystem
                      codes.
Operator              User input. Name of the operator when the event
                      occurred.
FSE Name              User input. Name of field service engineer who
                      responded to the event.
FSR#                  User input. The field service report number.
Subassembly           User input. The optional subassembly code (if the
                      part is at a subassembly level) involved in the
                      event. Related to the selected System, Subsystem,
                      and Assembly codes.
Sub-subassembly       User input. The optional sub-subassembly code (if
                      the part is at a sub-subassembly level) involved in
                      the event. Related to the selected System,
                      Subsystem, Assembly, and Subassembly codes.
Reliability Code      User input. The problem reliability code. This is
                      validated against current reliability codes. A new
                      one may be added by selecting “New” from the
                      pop-up list box.
Life Cycle Phase      The current phase of the system. Defaults to
                      current phase from Configuration file.
Down Time Category    User input. Downtime code. Either scheduled or
                      unscheduled. Scheduled downtime should be used
                      for holidays, weekends, or any other time when the
                      machine is not scheduled to be in operation.
Part Number           User input. The machine part number.
Relevant Failure      User input. Whether the failure was relevant or
                      not. Y for yes; N for no.
PM                    User input. Whether the downtime was due to
                      preventive maintenance. Y for yes; N for no.
Problem               User input. The description of the problem. This
                      can be any number of characters.
Repair Action         User input. The description of the repair action.
                      This can be any number of characters.
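The Date, Time, and Duration formats in Table 3 can be parsed as in the sketch below, which is one possible approach for a custom implementation rather than the SEMATECH code. The function names are hypothetical. Note that Python's `%y` directive maps two-digit years 69 through 99 to 1969 through 1999, and that durations are held as `timedelta` values because a downtime event can run longer than 24 hours.

```python
from datetime import datetime, timedelta


def parse_event_start(date_text, time_text):
    """Combine an MM/DD/YY date with a 24-hour HH:MM or HH:MM:SS time."""
    if time_text.count(":") == 1:      # seconds are optional
        time_text += ":00"
    return datetime.strptime(f"{date_text} {time_text}", "%m/%d/%y %H:%M:%S")


def parse_duration(text):
    """Parse an HH:MM or HH:MM:SS duration; hours may exceed 23."""
    parts = [int(part) for part in text.split(":")]
    if len(parts) == 2:                # seconds are optional
        parts.append(0)
    if len(parts) != 3:
        raise ValueError(f"expected HH:MM or HH:MM:SS, got {text!r}")
    hours, minutes, seconds = parts
    if hours < 0 or not 0 <= minutes < 60 or not 0 <= seconds < 60:
        raise ValueError(f"out-of-range duration: {text!r}")
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


start = parse_event_start("03/15/94", "14:30")
duration = parse_duration("26:15")
print(start, duration, start + duration)
```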
Problems Table
In the SEMATECH implementation, the Problem Record allows you
to view, modify, or add records for problems. The fields of the
Problems Table are described in Table 4.
TABLE 4 PROBLEMS FIELD DESCRIPTIONS
Field Name                      Description
System                          User input from Events screen. If adding from
                                Events screen, automatic input based upon the
                                serial number.
Subsystem                       User input. Optional subsystem code (if part is
                                at subsystem level) involved in event. Related to
                                System code. See also Reliability Code.
Assembly                        User input. Optional assembly code (if part is at
                                assembly level) involved in event. Related to
                                selected System and Subsystem codes. See also
                                Reliability Code.
Subassembly                     User input. Optional subassembly code (if part is at
                                subassembly level) involved in event. Related to
                                selected System, Subsystem, and Assembly codes.
                                See also Reliability Code.
Sub-subassembly                 User input. Optional sub-subassembly code (if part
                                is at sub-subassembly level) involved in event.
                                Related to selected System, Subsystem, Assembly,
                                and Subassembly codes. See also Reliability Code.
Reliability Code                Automatically generated from the above fields. A
                                sequence number will be added to the code.
Title/Description               User input. Simple title or description of event.
Initial Status                  User input. Required. Status of problem when first
                                reported. If problem is being added from Event
                                screen, date of event is entered automatically as
                                initial status date.
Fault Category                  User input. Event category.
Error Codes                     User input. Error code for selected tool.
Assignee                        User input. Staff member assigned to this problem.
Insufficient Information        Date when insufficient information status was
                                declared.
Sufficient Information          Date when sufficient information status was
                                declared.
Defined                         Date when defined information status was
                                declared.
Contained                       Date when contained information status was
                                declared.
Retired                         Date when retired information status was declared.
Root Cause Solution – Original  Original (planned) date for fixing problem.
Root Cause Solution – Current   Current date for fixing problem.
Complete                        Completion date for fixing problem.
RCS Documentation               User input. Points to documentation of fix; for
                                example, Engineering Change Numbers (ECNs).
Comments                        User input. Memo area for problem comments.
                                Can be any number of characters.
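Because Table 4 stores one date per status, a report can derive the present status of a problem from whichever status was declared most recently. The sketch below shows one way to do that. It assumes the order in which Table 4 lists the statuses is their normal progression, which the text does not state, and the function names are hypothetical.

```python
from datetime import date

# Status names in the order Table 4 lists them; treating that order as the
# normal progression of a problem is an assumption of this sketch.
STATUS_ORDER = (
    "Insufficient Information",
    "Sufficient Information",
    "Defined",
    "Contained",
    "Retired",
)


def current_status(status_dates):
    """Return the most recently declared status, or None if no date is set."""
    dated = [
        (status_dates[name], position, name)
        for position, name in enumerate(STATUS_ORDER)
        if status_dates.get(name) is not None
    ]
    if not dated:
        return None
    # max() compares the date first, then the position in STATUS_ORDER,
    # so the later status wins when two statuses share a date.
    return max(dated)[2]


def schedule_slip_days(original, current):
    """Days between the original and current Root Cause Solution dates."""
    return (current - original).days


problem = {
    "Insufficient Information": date(1994, 3, 1),
    "Sufficient Information": date(1994, 3, 8),
    "Defined": date(1994, 3, 20),
}
print(current_status(problem))
print(schedule_slip_days(date(1994, 4, 15), date(1994, 5, 2)))
```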
Report Types
Your reporting mechanism for this database should be flexible
enough to accommodate various customer types and their needs.
For example, you need to be able to pull reports for an FRB analysis
as well as for the users, sub-tier suppliers, and those who are
addressing corrective actions. At a minimum, the following reports
should be available:
Trend charts to show the number of events during a time period.
Event history to show a list of events during a time period.
Problem history to show a list of problems during a time period.
Reliability statistics to demonstrate the system performance in
terms of MTTR, MTBF, mean time to failure (MTTF), availability, etc.
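The reliability statistics report can be illustrated with a simplified calculation. The sketch assumes that MTTR is total repair time divided by the number of failures, that MTBF is uptime divided by the number of failures, and that availability is uptime divided by scheduled time. Standards such as SEMI E10 define several variants of these measures, so treat the formulas and the function name as assumptions to be checked against the definition your organization uses.

```python
def reliability_statistics(scheduled_hours, repair_hours):
    """Summarize one measurement interval.

    scheduled_hours: hours the machine was scheduled to operate.
    repair_hours: unscheduled downtime in hours, one entry per failure.
    """
    if scheduled_hours <= 0:
        raise ValueError("scheduled_hours must be positive")
    failures = len(repair_hours)
    downtime = sum(repair_hours)
    uptime = scheduled_hours - downtime
    if uptime < 0:
        raise ValueError("downtime exceeds scheduled time")
    return {
        "failures": failures,
        # MTTR and MTBF are undefined when no failure occurred.
        "mttr": downtime / failures if failures else None,
        "mtbf": uptime / failures if failures else None,
        "availability": uptime / scheduled_hours,
    }


# Hypothetical interval: 160 scheduled hours with three failures.
stats = reliability_statistics(160.0, [2.0, 4.0, 2.0])
print(stats)
```

The scheduled hours for an interval could be taken from the Weekly Operational Time field of the configuration table, and the repair times from the Duration field of unscheduled events.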
GLOSSARY
Corrective Action
A documented change to design, process,
procedure, or material that is implemented and
proven to correct the root cause of a failure or design
deficiency. Simple part replacement with a like item
is not considered corrective action.
Equipment Life Cycle
The sequence of phases or events
constituting total product existence. The equipment
life cycle is divided into the following six phases:
Concept and feasibility
Design
Prototype (alpha-site)
Pilot production (beta-site)
Production and operation
Phase-out
Error
Human action that results in a fault (term is usually
reserved for software).
Error Code
Numbers and/or letters reported by the equipment’s
software that represent an error type; code helps
determine where the fault may have originated.
Failure
An event in which an item does not perform its
required function within the specified limits under
specified conditions.
Failure Analysis
A determination of failure cause made by use of
logical reasoning from examination of data, symptoms,
available physical evidence, and laboratory
results.
Failure Cause
The circumstance that induces or activates a failure.
Examples of a failure cause are defective soldering,
design weakness, assembly techniques, and software
error.
Failure Mode
The consequence of the failure mechanism (the
physical, chemical, electrical, thermal, or process
event that results in failure) through which the
failure occurs; for example, short, open, fracture,
excessive wear.
Failure Review Board (FRB)
A group consisting of representatives
from appropriate organizations with the level of
responsibility and authority to assure that failure
causes are identified and corrective actions are
effected.
Fault
Immediate cause of failure. In software, a manifestation of an
error (a bug); a fault, if encountered, may cause a
failure.
Fault Code
Type of failure mechanism categorized to assist
engineering in determining where the fault may
have originated. For example, a fault may be
categorized as electrical, software, mechanical,
facilities, human, etc.
FMEA
Failure Modes and Effects Analysis. An analytically
derived identification of the conceivable equipment
failure modes and the potential adverse effects of
those modes on the system and mission.
FRACAS
Failure Reporting, Analysis, and Corrective Action
System. A closed-loop feedback path by which
failures of both hardware and software are
collected, recorded, analyzed, and corrected.
Laboratory Analysis
The determination of a failure mechanism
using destructive and nondestructive laboratory
techniques such as X-ray, dissection, spectrographic
analysis, or microphotography.
Mean Time to Repair (MTTR)
Measure of maintainability. The
sum of corrective maintenance times at any specific
level of repair, divided by the total number of
failures within an item repaired at that level, during
a particular interval under stated conditions.
Mean Time Between Failures (MTBF)
Measure of reliability for
repairable items. The mean number of life units
during which all parts of the item perform within
their specified limits, during a particular measurement
interval under stated conditions.
Mean Time to Failure (MTTF)
Measure of reliability for nonrepairable
items. The total number of life units of an
item divided by the total number of failures within
that population, during a particular measurement
interval under stated conditions.
Related Event
A failure that did not cause downtime. A related
event or failure is discovered while investigating a
failure that caused downtime. A related event could
also be a repair or replacement during downtime
caused by the main event.
Relevant Failure
Equipment failure that can be expected to occur
in field service.
Reliability
The duration or probability of failure-free performance under stated conditions.
Reliability Code
A functional, traceable description of the
relationships that exist in the equipment. Codes are
derived in a tree fashion from top to bottom as
System, Subsystem, Assembly, Subassembly, and
Sub-subassembly (typically lowest replaceable
component). Codes vary in size; however, a
descriptor of at most three alphanumeric characters per
level is recommended. For example, AH-STK-ROT-XAX-MOT for
Automated Handler-Stocker-Robot-X_Axis-Motor.
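A reliability code of this shape can be assembled and checked as in the sketch below. It is an illustration only: it treats the recommended limit of three alphanumeric characters per level as a hard rule, assumes uppercase descriptors joined by hyphens, and assumes the X-axis motor in the example occupies two separate levels (XAX and MOT). The function names are hypothetical, and the sequence number that the Problems Table appends is not modeled.

```python
import re

LEVELS = ("System", "Subsystem", "Assembly", "Subassembly", "Sub-subassembly")

# One to three uppercase letters or digits per level (assumed convention).
_DESCRIPTOR = re.compile(r"[A-Z0-9]{1,3}")


def build_reliability_code(descriptors):
    """Join per-level descriptors, top of the tree first, into one code."""
    if not 1 <= len(descriptors) <= len(LEVELS):
        raise ValueError("a code needs between one and five levels")
    for level, descriptor in zip(LEVELS, descriptors):
        if not _DESCRIPTOR.fullmatch(descriptor):
            raise ValueError(f"invalid {level} descriptor: {descriptor!r}")
    return "-".join(descriptors)


def split_reliability_code(code):
    """Map each part of a code back to its level name."""
    parts = code.split("-")
    build_reliability_code(parts)   # reuse the validation above
    return dict(zip(LEVELS, parts))


code = build_reliability_code(["AH", "STK", "ROT", "XAX", "MOT"])
print(code)
print(split_reliability_code("AH-STK"))
```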
Repair or Corrective Maintenance
All actions performed as a
result of failure to restore an item to a specified
condition. These can include localization, isolation,
disassembly, interchange, reassembly, alignment, and
checkout.
Root Cause
The reason for the primary and most fundamental
failures, faults, or errors that have induced other
failures and for which effective permanent corrective
action can be implemented.
Uptime
Time when equipment is in a condition to perform
its intended function. It does not include any
portion of scheduled downtime and nonscheduled
time.
Glossary References:
MIL-STD-721C Definitions of Terms for Reliability and Maintainability
(1981).
Omdahl, T. P. Reliability, Availability, and Maintainability (RAM)
Dictionary, ASQC Quality Press (1988).
SEMI E10-92. Guideline for Definition and Measurement of Equipment
Reliability, Availability, and Maintainability (RAM), SEMI (1993).
FRACAS Functions and Responsibilities
Function            Responsibilities        Description
Failure             Operators               Identify problems. Call for maintenance.
                                            Annotate incident.
                    Maintenance             Correct problem. Log failure.
Failure Report      Maintenance             Generate report with supporting data (time,
                                            place, equipment, etc.).
Data Logging        Reliability             Log all failure reports. Validate failures and
                                            forms. Classify failures (inherent, induced,
                                            false alarm, etc.).
Failure Review      Failure Review Board    Determine failure trends. Prepare plans for
                                            action. Identify failures to the root cause.
Failure Analysis    Reliability and/or      Review operating procedures for error.
                    Problem Owners          Procure failed parts. Decide which parts will
                                            be destructively analyzed. Perform failure
                                            analysis to determine root cause.
                    Quality                 Inspect incoming test data for item.
Corrective Action   Design                  Redesign hardware/software, if necessary.
                    Sub-tier Supplier       Prepare & provide new part or test
                                            procedure, if necessary.
                    Quality                 Evaluate incoming test procedures. Inspect
                                            redesigned hardware/software.
Post-data Review    Reliability             Close loop by collecting and evaluating
                                            post-test data for recurrence of failure.
Technology Transfer
#94042332A-GEN
2706 Montopolis Drive
Austin, Texas 78741
(512) 356–3500
Copyright 1993–1994, SEMATECH, Inc.