.^ ;\ HD28 .M414 no. ALFRED A P. WORKING PAPER SLOAN SCHOOL OF MANAGEMENT Case-based System to Support Electronic Circuit Diagnosis Thilo Semmelbauer, Anantaram Balakrishnan, and Haruhiko Asada WP# 3497-92-MSA October, 1992 MASSACHUSETTS INSTITUTE OF TECHNOLOGY 50 MEMORIAL DRIVE CAMBRIDGE, MASSACHUSETTS 02139 A Case-based System to Support Electronic Circuit Diagnosis Thilo Semmelbauer, Anantaram Balakrishnan, and Haruhiko Asada WP# 3497-92-MSA October, 1992 '^ J 7 1992 ' A Case-based System Support Electronic Circuit Diagnosis ^ to 7/7/70 Semmelbauer Leaders for Manufacturing M. T., Cambridge, MA I. Anantaram Balakrishnan Sloan Scfiool of Management M. T., Cambridge, MA I. Haruhiko Asada Department M. I. of T., Mechanical Engineering Cambndge, MA October 1992 Supported by MIT's Leaders for Manufacturing Program Abstract Diagnosis and repair operations have traditionally been major bottlenecks circuit in electronics assembly operations because they are labor intensive and highly vanable. Rapid technological advances, shrinking product lifecycles, and increasing competition in the electronics industry have made quick and and efficient diagnosis critical for cost control quality improvement, while simultaneously increasing the difficulty of the diagnosis task. This paper describes a case-based diagnosis support system to improve the effectiveness and efficiency of circuit diagnosis by automatically updating and the set of candidate defects, The case-based system prioritizing tests during the sequential testing process. stores individual diagnostic instances rather than general rules and algorithmic procedures; knowledge base grows as new faults are detected system provides distributed access features that make it quick to manufacturing applications. approach in and diagnosed by the analyzers. The to multiple users, and incorporates on-line updating adapt to changing circumstances. Because update, and integrate with existing systems, this We its method is it is easy to install, well-suited for real have implemented a prototype version, and tested the an actual electronics assembly environment. We describe the system's underlying principles, discuss methods to improve diagnostic effectiveness through principled test selection and sequencing, and discuss managerial implications for successful implementation. Introduction 1. Overview 1.1 Electronic circuit diagnosis the process of identifying the component(s) that is is responsible for the malfunction of a circuit board so that corrective (repair) action can be Subsequent exploration of the problem's root causes motivates process and product taken. improvement efforts. The diagnosis and repair loop has traditionally been a major bottleneck in circuit board assembly. Recent trends in the electronics industry such as the rapid technological advances, increasing product complexity, and shon product lifecycles, have considerably increased the imponance of quick and efficient diagnosis, while simultaneously increasing the difficulty of the diagnosis task. Reducing diagnosis time and variability, strive and improving diagnosis accuracy becomes critical as electronics towards lean manufacturing, quick response, and greater companies flexibility. This paper describes a case-based diagnosis support system that we developed consultation with a high-volume consumer electronics assembly plant. The in plant, faced with a rapid turnover of products and an increasing shonage of skilled technicians, wanted a means to improve workload on the efficiency and effectiveness of circuit diagnosis, reduce the few expert analyzers, and develop its An devoting significant system development resources. systems and current practice revealed based and model-based systems to that, new a tool to train although technicians, without informal survey of available the literature describes several rule- support diagnosis, these systems were not widely used by the electronics industry partly because they require considerable time and special expertise to install proposed It in this (e.g., for knowledge acquisition) and update. The case-based approach paper offers a practical solution to the diagnosis support needs of industry. accumulates knowledge on-line as analyzers detect and diagnose new faults. Using this information about prior diagnostic instances, the system eliminates candidate defects during the sequential testing process, and prioritizes prototype version of the system, and tested environment We describe the it subsequent in tests. We have implemented a an actual electronics assembly system's underlying principles, outline methods to improve diagnostic effectiveness by reducing the number ot tests needed to complete the diagnosis, and discuss implementation aspects. 1.2 Why is circuit Rapid changes diagnosis important? in electronics product and process technology, such as smaller, surface- mounted components, higher board circuit densities, and more complex circuitry, diagnosis more challenging. With fewer accessible connections and line functional testing have made test pads, in- can only identify the approximate area or block of components on the - 1- board that is responsible for the malfunction. Detailed diagnosis must be done off-line through extensive probing and measurement. Each component and connection on site or a circuit board is a potential defect site, and every can have several possible defects such as an open or short circuit (due to poor soldering component misalignment), component malfunction, inherent flaw placement of the wrong component, or defect rate of a in the board containing hundreds of components in the circuit design, The board's internal wiring. failure therefore, several orders of is, magnitude higher than the probability of a single defective component or operation, making testing and diagnosis inevitable even with highly capable placement and soldering operations. Isolating the root cause in such complex and dense consuming. Annual expenses directly related hundreds of thousands of dollars even in a to circuits is arduous and time diagnosis and repair can exceed several medium-sized facility. Work process in diagnosis stage and delayed feedback for process control further increase this cost. importandy, the time and effort required to diagnose a faulty board a few minutes to is at the More highly variable (from over 4 hours) depending on the type of defect, the analyzer's skill, the effectiveness of diagnostic tools, and the incentive and performance evaluation systems. Consequendy, medium The testing, fault diagnosis and repair operations are often major bottlenecks or high volume printed circuit board assembly cost and effort to train technicians electronics companies. A new is also potential root causes in training costs its facilities. becoming a significant concern for months of technician requires several experience with a particular board to understand the in training circuit's cause-effect manufacturing process. As product life and behavior and the cycles shrink, the must be amortized over smaller volumes for each board. Diagnosis bottlenecks are especially acute during the critical production ramp-up period following a new product introduction. Capturing market share early is essential for competitive survival in the electronics industry, but production facilities often cannot achieve the necessary rapid ramp-up because yields are low, and analyzers are least productive since they have not had significant experience with the new product. Fast and accurate diagnosis not only reduces direct manufacturing cost and flow time, but is also an essential prerequisite for successful quality improved product design (e.g., improvement efforts tiirough Soederberg [1992]), better process understanding and characterization, tighter process control, and close supplier pannership (e.g., [1991]). The ability to provide quick information feedback from the diagnosis stage to previous stages (including vendors) is particularly important in medium and environments. For instance, a snick glue pin or a worn placement head, early, Wenstrup can produce hundreds of defective boards. -2- if high volume not detected Computer support 1.3 for circuit diagnosis Manual diagnosis of circuit boards can be both inefficient and error-prone. Empirical evidence (Rouse and Morris [1985], Rouse and Hunt [1986]) suggests that humans are not optimaJ diagnosticians since they have difficulty data to prune the list in effectively utilizing all the of candidate defects and systematically select the other hand, building a completely automated system that diagnosing all is test available test sequence. On the capable of anticipating and possible defects can be prohibitively expensive; such a system might also require extensive time and effort by specialists (design and process engineers or software experts) to install and maintain. Therefore, a system that assists, rather than replaces, human diagnosticians is more practicable. To cope with the dynamic electronics environment, the system must be robust and flexible, i.e., it should easily adapt to design and process changes. Practitioners prefer a flexible system that possibly diagnoses only the most offers common defect types (that account for a vast majority of board failures) to one that comprehensive diagnostic coverage but cannot easily adapt The case-based approach knowledge-based systems, offers the potential to the case-based to changes. meet these requirements. Unlike other system accumulates knowledge on-line by storing diagnostic instances (as analyzers identify and diagnose new faults) rather than general rules or algorithmic procedures. Each diagnostic instance or case corresponds to a distinct defect; for that defect. it is represented by the set of tests conducted and their observed outcomes By comparing the test results for the current board with previous instances, the system can eliminate candidate defects and determine the effectiveness of subsequent tests. Recording diagnostic instances rather than general rules has the p)otential disadvantage of requiring a large database. However, our prototype implementation experience suggests that the vast majority of faults encountered very small fraction of the set of all in practice correspond to a possible instances. Because of its simplicity, the system can be maintained by the analyzers themselves, avoiding long knowledge-acquisition delays and the consequent system obsolescence. Furthermore, the standard format of a case promotes learning and information sharing among technicians (from different shifts or production lines), and enables the system to serve also as a training tool for system viable and useful for new real technicians. These features make the case-based manufacturing applications. Although case-based reasoning has been applied to other domains such as legal argument and customer service, we are not aware of its previous appHcation to electronic circuit diagnosis. We enhance the basic case-based approach to incorporate historical data and other prior information for systematically selecting tests and optimizing the search process. approach, we implemented and To validate the overall tested a prototype version of the system in a high-volume printed circuit board (surface-mount) assembly line producing a hybrid (analog-Kiigital) circuit board for a state-of-the-art consumer electronics product. Our implementation -3- permits distributed usage via a local access network. Within five weeks of installation, the system achieved 90% 90% diagnose to correctly coverage, i.e., of the defects occuring we implementation expenence, we propose week. Based on our in the fifth offer suggestions to diagnosis function; in particular, enough diagnostic instances the system had acquired improve the management of a proactive the management approach and incentive schemes that emphasize the analyzer's process improvement rather than fault- finding role. The rest of this paper organized as follows. Section 2 describes the circuit diagnosis is process in more detail, and develops the goals for an ideal diagnosis suppon system. Section 3 discusses the underlying structure and operation of a basic case-based system for verifying defects based on test outcomes. Section 4 presents system enhancements to optimize different defects. We sequencing problem heuristic using historical data and other prior information on the likelihood of test selection methods present a to dynamic programming formulation of minimize the expected number of tests for on-line test selection. Section 5 briefly implementation and the lessons we learned from this the optimal test per board, and outline descnbes our prototype implementation exercise, and identifies directions for further work. 2. 2.1 The Diagnosis Process In-line testing: Capabilities and tradeoffs Circuit board assembly lines typically contain in-line inspection and testing stages, after placement and soldering operations (see, for instance. Noble [1989] for a description of the processing steps in printed circuit board assembly). The main purpose of these in-line screening operations is to identify defective boards , although they also have limited diagnostic capabilities. For instance, visual and X-ray inspection can identify missing or misplaced components and possibly certain solder defects, but are not effective or reliable for dense boards with small components. Furthermore, complete manual or X-ray inspection can be very time consuming and expensive. Two types of in-line electrical — in-circuit testing In-circuit tests verify the functionality of individual testing. automated isolate tests are available test equipment (see, for instance. components on Maunder and Tulloss and precisely identify certain types of defects (e.g., [1990]). open or short passive components), but require hard fixturing and programming that for short production runs. is the board using They can circuits, defective difficult to justify Moreover, the increasing use of surface-mount technology and the high cost of board "real estate" make in-circuit testing infeasible or limit their diagnostic capabilities (Garcia-Rubio [1988]). In-circuit tests are also prone to (see, for and functional measurement errors example. Chevalier [1992]) especially for small, high performance components. -4- leading to false acceptance or rejection of boards. Functional tests measure the circuit's response signals. meet board's output (at the pons or edge connectors) to different combinations of input Functional tests can confirm whether the interactions between various components the specifications; typically, they can trace a board's failure to the responsible circuit, but not to a particular com]X)nent or connection in this circuit. Selecting the testing strategy for a product involves addressing tradeoffs between higher assembly cycle times (for boards) to conduct extensive in-line diagnostic tests all versus higher off-tine diagnosis effon (for defective boards) As have high resolution. the board defect rate decreases and this tradeoff increasingly favors the strategy We the in-line tests component density increases, is tiien performed off-line by next describe this off-line diagnosis process, and identify opportunities to support this process using a computer-based system. For brevity, use the word "diagnosis" to refer to the detailed, off-line diagnosis has failed in-line (functional) circuit of the malfunction procedure corrective action (e.g., for off-line board that has failed functional (e.g., we will procedure after a board tests. 2.2 Sequential testing Diagnosing a do not of using in-line testing only to identify defective boards; detailed diagnosis and defect verification skilled technicians. when diagnosis tests entails identifying tiie source a compxDnent or connection on the board) so that appropriate replacing or resoldering the component) can be taken to repair the board Diagnostic information also serves as the input for root cause analysis and continuous improvement of vendors, processes, and product design. We define a dfifg^l as the finest grain identified by the analyzer. defects might require the Each defect has an associated corrective action same problem In are different; therefore, defects even though both require the resisitor). Each defect has a by applying one or more set tests . ; different corrective action. For instance, a defective resistor can act as either a short circuit or an open circuit. identify the source of malfunction that can be unambiguously same each case, the symptoms and the means we view test is two possibilities as separate corrective action (namely, replacing the of associated A these to symptoms that the analyzer must discover an action that must be performed either by a technician or by automated test equipment, resulting in an observation about the behavior of the board. A test might consist of applying a specified set of electrical certain circuit elements (e.g., tuning capacitors), circuit test and probing selected locations on the board to measure the voltage or view the electrical signal on an oscilloscope. Each has two or more possible results or outcomes or less than 4.5 volts); observe that we inputs, adjusting discretize it if the output (e.g.. voltage measurement is is greater than 4.5 volts continuous (e.g., voltage) by defining the range of continuous values associated with each outcome. For convenience, and without loss of generality, -5- all of our subsequent discussions assume has two possible outcomes which that every test depends on the defect test We in the board. we denote or as 1 . The outcome of a permit defects to have indeterminate outcomes for certain tests, e.g., for a certain defect, the voltage at a particular location or 1 depending on, say, the state of a tlip-flop device. We may be either also recognize that the analyzer might have only partial prior information regarding which defect produces what outcome for each test; however, we assume that the available information adequate to uniquely is identify each defect. To identify the true defect in a malfunctioning board, the analyzer uses a sequential testing procedure the board's failure between these . The process mode during functional testing) and a set of The analyzer successively defects. observes the outcome with a candidate set of defects (which depends on starts at (i.e., each stage, and eliminates (from the candidate defect the set contains comprehensive (i.e., can distinguish and applies various selects defects that cannot produce the observed outcomes. complete tests that all If tests, set) all the initial candidate defect set possible defects), and the set of available tests some combination of known test outcomes uniquely is is identifies each candidate defect), the procedure terminates with the true defect as the only remaining member of the candidate defect In practice, analyzers might not always follow a systematic procedure eliminate defects. Diagnosis some intuition, circuit set. is to select tests and often considered an "art", and analyzers rely largely on knowledge, memory, and experience as they perform diagnosis. Often, they do not formally maintain and update the set of candidate defects by comparing the defects' expected test outcomes with the observed outcomes. may candidate defects and their expected outcomes vary from technician to technician. A common not even be documented, and might "myopic approach for troubleshooting " consists of hypothesizing a particular defect, attempting to verify appropriate tests, revising the guess if Indeed, the set of it by performing the the tests disprove the first hypothesis, and so on. Alternatively, the analyzer might follow a predetermined test sequence represented, for instance, as a fault tree. A purely manual diagnosis when it is process can be both inefficient and inaccurate especially not supported by clear procedures and documentation. For instance, using a static fault tree can result in unnecessary approach that uses recent defect history tests to and unproductive search (relative to an guide the search). Likewise, diagnosis without adequate documentation and formal recording procedures can lead to replication of false identification. Also, these informal transfering knowledge let us first and approaches are not conducive to learning, to other analyzers, or generating appropriate information for quality improvement. To identify the desirable characteristics of a computer-based system analyzers, tests to assist explore the available opponunities to improve diagnosis productivity. -6 Diagnosis decision support requirements 2.3 Consider an intermediate stage several tests and observed the results of each questions, (1) all Given the specific (2) Does (3) If aimed at when diagnosis process in the the analyzer has applied The analyzer now test. faces four related deciding what to do next: outcomes of all previous tests, can we conclude that the board contains a (known) defect? the board contain a previously unknown more than one (known) defect remain defect? which available as candidates, test(s) can distinguish between these defects? (4) We What test to apply next? two questions will refer to the first as Test as Defect Identification and the remaining two , Selection. Consider the information needs and actions for the defect identification questions An . affirmative answer to either question takes the board out of the regular diagnosis iteration. In the first case, the action consists of repairing the identified defect, possibly after confirming the diagnosis. outcomes (i.e., the answer If no known defect causes to question 2 is affirmative), board contains a new defect, or inaccurate, or (iii) the (ii) the actual test the assumed test observed cumulative we have outcomes outcomes were measured test three possibilities: for the known incorrectly. the (i) defects are In all tiiree simations, the board requires extraordinary actions such as confirming previous test outcomes, consulting with design and process engineers, checking with suppliers, and updating current knowledge. If multiple (known) defects remain in the candidate seek to identify relevant tests, no such test, if that test test is available, the to first this process corresponds to determining, for has different expected outcomes for the candidate defects. analyzer must develop and apply one or more distinguish the defects. Otherwise, perform next. As we the test selection questions not yet performed, that can distinguish between the remaining possible defects. Conceptually, each available set, we must decide which among shall see later, this decision new If tests to the available tests to impacts the number of iterations needed complete the diagnosis. Computer suppon logical reasoning for the defect identification questions system that automatically compares the observed expected outcomes for outcomes. Similarly, relevant tests at might consist of a database or all test outcomes with the candidate defects, and eliminates defects with inconsistent to support test selection, a computer-based system can identify the each stage, and apply a systematic method to select the next effective, the diagnosis support test. To be system must meet cenain key requirements such as ease of 7- initializing features of a computer-based system to • Development a system that • We and updating, and quick response. effort: suppon list below these and other desirable electronic circuit diagnosis: the short product lifecycles of electronic products, Given becomes operational quickly and without The system must be easy Maintainability/ robustness: to update, preferably (e.g., by the users software expens engineers) often introduces delays, causing inconsistencies and system obsolescence. Another important requirement to the frequent design • require significant expense. themselves; relying on external resources to update the system and knowledge we and process changes is the system's ability to adapt gracefully that characterize the electronics industry. Compatibility: Technicians often view structured tools and methodologies as impediments to creativity; compatibility with current practices organization is, therefore, critical to the success ergonomics, ease of use, flexibility, and manual and the culture of the and effectiveness of the system. System ovemde options are some of the ways to improve compatibility. • Response Since each board requires several time/efficiency: candidate defect set, the system does not become the botdeneck • Effectiveness: must respond rapidly test iterations to to these questions so that diagnosis activity. The main purpose of a diagnosis support system time per board. Part of this prune the savings comes from eliminating of the routine bookkeeping and lookup functions. Two is to errors reduce the diagnosis and automating some other features can reduce diagnosis time: • reducing the (expected) number of tests through systematic search and principled test selection and sequencing; and, • providing good "coverage", i.e., when the system has stabilized, it must correctly diagnose a large percentage of the defects. Other desirable system features include prepare periodic summary abilities to: interface with existing databases, reports for process improvement, highlight inconsistencies and faciUtate learning, diagnose both digital and analog (e.g., radio frequency) circuits, and operate in multi-user environments. The next section uses these performance criteria to motivate the case-based method. 3. 3.1 The Case-based Approach Knowledge base for Circuit Diagnosis for diagnostic systems Circuit diagnosis requires three broad classes of knowledge — historic, heuristic, and fundamental knowledge. Historic and other prior information on the likelihood of defects can help the analyzer prioritize candidate defects, thus enabling an effective choice of sequence. For example, if a placement machine frequentiy misplaces a particular 8- test part, the analyzer might first examine that pan before pert'orming other knowledge, sometimes termed "shallow" knowledge, refers tests. to Heuristic or empirical an empincal association between observed symptoms and diagnostic conclusions. This category might include rules and procedures, as well as effective after shoncuts that the analyzer has found tricks or to be some troubleshooting expenence. Fundamental or "deep" knowledge implies an understanding of the underlying physics of circuit elements which we use to predict circuit response. Design and test engineers rely on such fundamental knowledge, often encoded in a circuit simulator, to design and troubleshoot electronic circuits. Depending on the primary source of diagnostic knowledge, we can distiguish between experience- based (empirical knowledge) and model-based (fundamental knowledge) systems to support circuit diagnosis. The model-based approach uses an analytical model or a circuit simulator to dynamically identify the potential causes for the observed symptoms during (i.e., the sequential testing process. Since diagnosis entails "backtracking" given ceruin observed symptoms or outputs, what values of circuit elements can produce these outputs), we need a translator (that characterize the output for a conven to traditional circuit design models given circuit configuration) into corresponding diagnostic models. Davis and Hamscher [1988] illustrate the three basic steps generation, hypothesis testing, and hypothesis discrimination — in — hypothesis model-based diagnosis. Gensereth [1984], deKleer and Williams [1987], and Hamscher [1988] describe various model-based systems a pproach is to diagnose digital circuits. The most common experience -based a rule-based (expen) system that relies on the experience of develop if-then rules that enable the system to human reason about the behavior of malfunctioning board (see Semmelbauer [1992] for an illustration of this experts to tiie approach). of the pioneering apphcations of the rule-based approach was for medical diagnosis see Harmon and King (e.g., [1985]). Unlike the field of medicine, however, electronic circuits and manufacturing processes vary widely, and radical changes. One In general, the literature the technologies undergo frequent and on knowledge-based diagnostic systems focusses on knowledge representation issues and inference mechanisms to identify the underlying source of the problem, and does not adequately address issues of optimizing the sequential search process. For electronic circuit diagnosis, both the certain advantages, but also framework that impose overcome By limitations. combines both approaches respective strengths and model-based and rule-based approaches offer in a considering a decision suppxsrt complemetary fashion, we can exploit the disadvantages. Model-based systems offer their the advantage of direcdy integrating design and diagnosis: when the designer changes the circuit is schematic (and adds relevant sub-models for new components), the diagnosis model automatically updated, permitting instantaneous troubleshooting capability without requiring any human experience with the new circuit -9- (Davis [1988]). However, because they are very computationally intensive, model-based diagnostic systems are not wellsuited for on-line applications. Furthermore, technical difficulties suitable diagnostic models Sugaya [1986], Tong for analog circuits (see, for et al. still remain building in example, Bennetts [1981], Kritz and [1989]) due to bi-directional signal propagation, and the impact of physical layout and tolerances on the performance of high speed radio-frequency circuits. Dealing with bridge defects and improper (open) connections also poses problems since the basic circuit structure itself changes (Priest and Welham [ On 1990]). the other hand, the rule-based approach can incorporate heuristic knowledge to deal with these complexities, and provide quick on-line response during diagnosis; the infrastructure of tools to implement a rule-based system also quite well-established. is expert systems might require long development times (Priest and especially for initial knowledge acquisition (e.g., However, Welham traditional [1990]), Newstead and Pettipher [1986]). Furthermore, since each component and connection on the board can cause malfunction, anticipating and encoding rules for every possible defect instance, the initial covered less than knowledge 20% of we that weeks of system operation. almost impossible. For acquired for one board to build a prototype system the defect types that all is were encountered during the Finally, a system that relies engineers to update the knowledge base (for instance, first five on software experts or knowledge when the circuit design or manufacturing process changes) can introduce expensive delays. To cope with the rapid pace of change in the electronics industry, we require a diagnostic system that can both adapt easily to the changing environment, and reduce we diagnosis time in production by providing quick on-line response. Next, case-based approach that is similar to rule-based systems in its describe a on-line operation, but can exploit the capabilities of model-based systems to adapt to changing circuit designs and new defects. Elements of case-based reasoning 3.2 The goal of our project was to develop and test a viable in a practical manufacturing environment. diagnostic support system that Two is important observations regarding practice motivated our proposed case-based approach: 1. The ubiquitous 80-20 law of all applies also to electronic circuit diagnosis, possible defects occur in over 80% i.e., of the defective boards. Figure less than 1, 20% which shows the cumulative relative frequencies of various defects (arranged in decreasing order of occurence) for our prototype system, clearly 20 characteristic, a illustrates this phenomenon. Given this 80- system that can operate with incomplete knowledge progressively , adding defects as they are encountered, is preferable to enumerating, a priori, all possible defects. This strategy has four important advantages: (a) the developmental effort and time is likely to be significantly lower; (b) focusing on the relatively few defects that have -10- occured in the past can significantly reduce the response time; (c) since it can tolerate incomplete knowledge and can be updated on-line, the system can easily adapt to changes human expens process and product design; and (d) by relying on in defects, the system diagnose new non-threatening and hence likely to be well-accepted. is experience (reported to in Our Section 5) indicates that the strategy of progressively adding defects as they are encountered provides over 90% new coverage within just a few weeks of operation. 2. A simple way performed and defect. from an expert analyzer to "learn" their respective outcomes The observed outcomes board characterize the new for all — system maintainability, and as a means to Case-based reasoning (CBR) originated An "case"). the early system, HYPO, operated system combines reasoning about the it a convenient to clever searching area of legal argument (hence, the term in the common law domain of trade-secret law; statutes (rules) with the precedents (cases) that to the current situation of the case base on a number of significant example, Rissland and Skalak [1988]). Applications of CBR Lee and Liu [1988] apply Tokumaru [1990] for in the attributes. to robotic assembly CBR to The by the researcher. Legal argument continues to dominate the most recent case-based reasoning limited. new communicate among expens. "closest" cases with respect to these attributes are selected for perusal for the tests knowledge representation concern them. The problem of finding the case(s) appropriate amounts — as the analyzer explores and identifies a the tests that the analyzer applied to diagnose the making defect, by recording the actions is literature (see, the fault diagnosis are cell diagnosis, and Ishida and present a generalized approach to case-based diagnosis using the frame theory. For electronic circuit diagnosis, the case-based approach offers the attractive features of compatibility and incremental, on-hne tolerate incomplete knowledge, and its knowledge acquisition; the system can coverage increases with time. We enhance the conventional case-based approach by adding features to incorporate: (i) robust knowledge acquisition: the system identifies incomplete and inconsistent (ii) knowledge (e.g., verification and updating steps distributed usage: due to measurement errors or design changes), and incorporates to correct these errors; by implementing the system in a distributed client-server environment with a shared database, several users can simultaneously access and use the system; and, (iii) effective test sequencing: the system reduces the number of tests by applying test selection heuristics that account for prior probabilities of different defects. In the electronic circuit diagnosis context, a experience, represented by its test case outcomes. (A is rule, an instance of a diagnostic by contrast, is a generalization about diagnostic experience gathered over time.) Each case corresponds to a particular defect. - 11 - The case base consists of the collection of instances corresponding to different defects Recall that each defect has either a deterministic or indeterminate observed in the past. outcome for every test; funher, the value of a deterministic 1) we or unknown. For a given defect, signature of We visualize the case base as a n x that defect. diagnosed case or defect. The the i.e., expected outcome of test j of expected outcomes for m as all tests available tests. in the case this for defect i row has base represents the signature of the a 0, 1, 1, U or indeterminate, or unknown. is 0, 1, knowledge are detected and diagnosed (incremental i depending on whether the We user to dynamically add rows (cases) and columns (tests) to the case base as 3.3 (0 or m matrix, with each row corresponding to a previously row i element of j outcome might be known Suppose we have n defect types and the defect type, ^r to its set r. permit the new defects acquisition). Diagnosing a defective board: Successive refinement through case matching Diagnosing a defective board corresponds performing and observing tests their to gathering enough information (by outcomes) about the board's signature, while simultaneously searching the case base, to identify a matching case. found, the board contains a new If no matching case is defect or an inaccurate case. In either situation, the case base must be updated to reflect this latest experience. We remark that the general case- based approach goes beyond case matching and database updating functions to possibly include reasoning and extrapolation. capabilities. Figure 2 shows support system (CDSS). the We Our implementation does not exploit these additional complete flow chart for our case-based first explian to update the case base (Section 3.4). how circuit diagnosis the system operates before describing how Section 4 descnbes enhancements to improve the system's test sequencing capabilities, and Section 5 discusses how to integrate this system with a model-based approach. At each stage of consisting of tests the search process, the system maintains: known all performed thus defects far, and (ii) whose (i) a candidate defect set signatures match the observed outcomes for the available test set containing all tests that all the have not yet been perfonned but whose outcome differentiates one or more candidate defects from the remaining defects in the analyzer chooses a outcome of next test). test candidate from In the basic version of the case- based system, the the available test set, performs this test, and observes the the test. (In Section When set. 4 we describe system enhancements the analyzer enters the observed outcome to optimally select the into the system, the candidate defect set and available test set are updated as follows: • remove from test differs for this test the candidate defect set all defects from the observed outcome. Note is unknown or indeterminate; 12- whose expected outcome that we retain all defects for the latest whose outcome • delete the test that whose outcome is performed from the available was just the same (or in the updated candidate defect If the initial case base is unknown or indeterminate) for all remove any test the remaining defects set. comprehensive (i.e., contains enough information regarding expected outcomes accurate, the process test set; also, must terminate with possible defect types, and has all to uniquely identify each defect) and the candidate defect set containing only the true defect. To illustrate this process, Figure consider the simple 5-component, series network shown Suppose we have a board whose abnormal output signal for 3. indicates malfunction in board are good); for i = one of 1,2 the 5 5, components defect i the functional test the intermediate connections (all refers to a malfunction in component on the We i. have four available tests that apply a stimulus at the input terminal, and measure the voltage each of the four intermediate connections (see Figure abnormal voltage, indicates malfunction of one of to An outcome 3). in at of 0, corresponding we upstream components. Thus, the can represent the signature for each defect as a vector containing 4 binary elements. Defect 1 has the signature shows {0000}, defect 2 has signature {I the case base containing the complete signatures for does not have any indeterminate outcomes, and (i.e., he test 3, 3 first applies test 4 and so on (i.e., component answer the defect At the that the technician (i.e., initially Table 5 defects. This 1 example assume complete information end of always selects the test 3 is defective). identification and test set at test selection each stage; the columns of questions of Section defects { 1, 2, i.e., this table 2. the third iteration, the candidate defect set contains a single defect (defect sequence eliminates from the available (for this then 5), Table 2 traces the successive states of the system, and the available just applied) at each step outcome sequence, conclusively identified. Suppose a board contains defect proving conclusively that component 3 on the board chosen tests in reverse measures the voltage between components 4 and until the defect is the candidate defect set 3), we all so on. no "unknown" outcomes). For simplicity, assume i.e., 000), and However, if we had test set first same outcome only one applied example) automatically eliminates 3} have the defective. For this example, the is test test 3 4 since test (the test that instead of test 4. all was its "0" remaining candidate "0" for test 4. This example motivates the following three imponant observations: 1. Exploiting prior information: Suppose past history or a priori information (e.g., problems) suggests that the likelihood of defect 3 defects. Since tests 2 and 3 are adequate knowledge about vendor quality is significantly higher than the other to isolate defect 3, the analyzer -13- might choose to perform these two tests first. would require only two For our sample board containing defect 3, this complete the diagnosis (compared to the iterations to strategy 3 iterations required for the static reverse-order sequence). For complex circuits, this savings in number of tests how can be significant. Section 4 discusses other prior knowledge to dynamically select an effective 2. to exploit historical and sequence. test Incomplete knowledg e: Our example assumes know the expected UU {0 U} outcome) (U is if U} 1 outcome of each we know the complete signature for each defect, 3. (the 1 element "U" denotes an unknown produce a "1" outcome for case base and our original reverse outcome not eliminate defect 1 same (shown as before in Table we have one outcome of test 2 for defect available test (test 1 (in this case, the expected 1) that we outcomes process and reduce the number of the partial signatures, particular, However, two defects (defects can eliminate defect will observe a "1" defect Also note tests. I's 1. We 1 at and must outcome, eliminating defect Thus, complete 3. for each defect can accelerate the pruning that, we can improve our knowledge we can change unknown). Therefore, is 1 order to conclusively prove that the board contains defect knowledge about 2). for test 2 during the third iteration eliminates defect 2 but does (since the necessarily perform test 1) in similarly, sequence, the set of candidate defects test the start of the fourth iteration, the candidate defect set contains and 1; between the 5 defects. for the first 3 iterations remain the 3), test Table 3 shows an "incomplete" case base containing adequate to distinguish observing a "1" we 2 and 3 are the only essential tests to uniquely tests i.e., i.e., every defect. However, the partial signature test for that all other defects isolates defect 3, knowledge this we know sufficient to isolate defect identify defect Using that even though we staned with only after this diagnostic exercise. expected outcome In for test 2 from "U" to "0" in the case base, potentially improving future performance. 3. Multiple defects: The signature of a defect represents the anticipated test outcomes assuming that the board contains only that defect. in What happens if a board contains multiple defects? Suppose, our example, the board contained defects 2, is still correct, i.e., replace component 3, component we must functioning of the board. 3 is 3 and 1. The However, defective. again apply the functional If the diagnosis, illustrated in Table we after test to verify defect 3 and confum proper board malfunctions, the diagnosis process must be repeated; in this example, the second round will detect defect Although our example used a very simple circuit, it has served to illustrate several generic concepts relating to effective diagnosis. Real circuits contain feedback loops, bidirectional signals, tests that - 14 1. many complexities such as probe the same location but with multiple measurements, stimuli, and component to these complex and so on. Our methodology applies even settings, circuits. knowledge and 3.4 Updating the case base: Incomplete inaccuracies We have already seen one opportunity performance: suppose we identify defect i to update the case base and improve subsequent end of the diagnosis process, and suppose at the the current case base contains only the partial signature for defect performed during the current diagnosis whose outcome unknown, we can replace outcome defect the unknown We updating step j known is to is augment defect i's if j was previously have a deterministic outcome for will refer to this updating step as signature not valid i test entry in the case base with the actual observed for the current board (assuming test i). for defect Then, for every i. augmentation the board contains other defects besides defect signature only after repairing it Note . i; that this hence, we can and verifying the proper functioning of the board. Our discussion covers all in Section 3.4 assumed that the case base possible defect types) and accurate. If is the case base comprehensive is (i.e., it incomplete or inaccurate, the matching and successive defect/test elimination process might not necessarily terminate with a single (truej -efect in the candidate defect (i) The process terminates because of observed test set. There are two the candidate defect set outcomes does not match with is empty the signature of possibilities: , i.e., the combination any known defect. This situation could arise either because: (a) the board contains a new defect (b) at least that one of the defect signatures recorded data entry errors, design changes, (c) the had not occured previously; in the case base is inaccurate (due to etc.); or, observation and measurement of the acaial test results was erroneous. In all three cases, the board requires "extraordinary" actions to identify the true symptoms and cause of the problem; these actions might entail detailed probing and experimentation by the analyzer as well as consultations with design and production staff, and component vendors. defect, its signature in the case base If the board new defect, we must add refer to this step as case addition for that defect, . found a new . On if engineers, known to contain a must be checked and corrected, refer to this ujxiating step as signature correction contains a is test necessary. the other hand, if the board case (row) corresponding to that defect. The new case contains the expected test who diagnosed defect, the defect or revised the signature, the corrective (repair) action required, the defect category, and who to notify (or how to use) for quality improvement. Although the existing tests suffice to distinguish this -15- We outcomes and possibly some auxiliary information such as the name of the the analyzer or engineer We new the to (ii) defect from the previously empty defect set known defects (the observed test provide adequate "proof for add one or more new The diagnosis process this defect), the can quickly isolate tests that outcomes that led to analyzer might wish this defect. terminates because the available test set is empty but the , candidate defect set contains more than one possible defect. This occurrence indicates erroneous test "Inadequate measurements, inaccurate defect signatures, or inadequate tests" refers to the situation between the candidate defects tests are required. perform that test or if its delete a test expected outcome indeterminate or unknown) for all is from test the outcomes are unknown the available test set same (same known in the or new when we value, the remaining candidate defects. Again, this stopping condition (with multiple candidate defects and no remaining extraordinary actions to verify the tests. the current tests cannot distinguish many either because we Recall that when tests) requires measurements, check the defect signatures stored test case base, complete the partial signatures of the remaining candidate defects, or add new tests that distinguish between these defects. We refer to this process as test addition. Figure 2 contains a flow chart summarizing the case matching and updating process. This process includes a defect verification step at the end of a successful diagnosis exercise. Verification entails confirming previous observed outcomes and applying any tests that remain in the available defect, a new test set; if one or more outcomes conflict with the hypothesized case must be added. This optional step is especially important after design changes and during production ramp-up when many new cases are added During normal operation, defects are automatically verified defect, the board passes the functional test. A to the case base. after repairing the identified if, board failing the defect verification to determine whether the original diagnosis test again requires explicit was erroneous or if the board contains multiple defects. In summary, robust, the case-based approach to support circuit diagnosis permits incremental, and distributed knowledge the case-based system knowledge symptoms defects. update. acquisition . Unhke other knowledge- based approaches, employs on-line and incremental, acquisition. All cases have the rather than batch-oriented, same format; a case contains information on that the user identifies to be relevant in defining Because of the standardized format, the case base the and distinguishing between is easy to understand and The system begins with incomplete knowledge and poor defect coverage, but each unsuccessful diagnosis initiates a process of case-building to ensure subsequent coverage of the same defect. Our experience suggests that the case coverage increases very rapidly during the first few weeks. By only maintaining the diagnostic knowledge that is needed, the system can provide quick resix)nse. Incompleteness and inconsistencies in the case 16 base are easy to detect: they result in unusual termination of the diagnosis process (empty candidate defect set or empty available — steps make The various correction and updating test set). signature augmentation, signature correction, case addition, and test addition the knowledge acquisition process robust and adaptive. The case-based system can be implemented using a shared relational database in a multi-user computing environment, thus providing wide access to analyzers at different locations as well as design, test and process engineers and production supervisors. This implementation also allows the case base to be integrated with other factory information systems, conveying current vendor, design, and process problems to the diagnosis operation, and feeding back defect information for quality improvement. The system can also keep track of the cumulative relative frequencies and trends of different defects; we next describe how relative frequencies to guide 4. and optimize the sequential to use these testing process. Optimal Test Selection By automating each the search and updating functions, the basic iteration of the diagnosis process. another means to significantiy reduce the Section 3 illustrates, the CDSS reduces the time for Decreasing the number of iterations provides diagnosis time per board. total number of iterations depends on As our example of the test sequence. We, therefore, require an effective test selection policy or rule: at each stage of the diagnosis process, the method helps us decide which thus We refer to far). available test to apply next (based on the observed outcomes minimizes the expected number of a test selection rule that the optimal test selection policy. This section presents a tests as dynamic programming formulation for the optimal test selection problems, and describes on-line heuristic rules that the basic CDSS can incorporate to reduce the expected number of tests. These rules rely on prior information concerning the likelihood of different defect types, estimated from historical data on defects (relative frequencies and trends), knowledge about specific problems at the vendor site changes, 4.1 or in the assembly process, inputs from design engineers on new engineering etc. Test selection example We first consider a simple example sequencing on the number of to better understand the impact of that defect, i.e., test outcome for all other defects (or vice versa). test (test 1 in Figure 3 is and prove the defect. Suppose a board contains tests required to one of n potential defects, and each defect has a single associated and sufficient to "prove" test structure i test that is has a "1" outcome for defect We i, and a "0" will refer to this type of test as n focused a focused test). Thus, the complete case base for n focused corresponds to a n x n identity matrix. Suppose n = 5, and the five possible defects the following probabilities of occurence: 0.1, 0.1, 0.2, 0.2, and 0.4. the tests in order of decreasing defect probabilities 17 both necessary tests have Intuitively, applying minimizes the expected number of tests. For our example, the "decreasing probability sequence" corresponds reverse index order tests, say Ej, using this strategy Ej On 1 ) until the defect is identified. to applying the tests in The expected number of is: = 1x0.4 + 2x0.2 + 3x0.2 + 4x0.1+5x0.1 = 2.3 tests per board. the other hand, for a static policy that applies tests in the forward sequence 1-2-3-4-5. the expected number of tests E2 E2 The optimal a 5-4-3-2- (i.e., 43% 1x0.1+2x0.1+3x0.2 + 4x0.2 + 5x0.4 = 3.7 tests per board. strategy saves, savings. Thus, optimal is: = on average, (A more skewed probability test for quality Because we assumed that optimal testing policy that easy to implement, each defect has a focused is static (i.e., tests in measurement, etc.) and eliminate only one defect tests jointly identify Joint tests reduce the test, our example has a simple (because of constraints on probing, at a analyzers rely on a combination of focused a.nd joint 3). reduce diagnosis cost, and decreasing order of defect probabilities. tests are infeasible for certain defects combined outcomes can to does not vary with the observed outcomes) and Since focused both necessary and sufficient, joint second sequence, improvement. sequence the i.e., to the distribution can give even greater savings.) sequencing has considerable potential provide quick feedback Figure board relative 1.4 tests per time (except tests. at the final step), Unlike a focused cannot isolate defects individually but their and prove the defect number of required under certain circumstances (depending on tests to their structure (e.g., tests diagnose 2 through 4 in all defects, The configuration problem" of deciding what combination of focused and joint paper, assuming instead that the available tests are predetermined. set contains joint tests, the test to previous 4.2 test outcomes; test selection optimal we test selection next formulate this tests this tradeoff in this When the available test policy might be dynamic, i.e., the decision problem. problem seeks an optimal or near-optimal tests between dynamic and regarding previous test selection Problem formulation and optimal solution minimizes the expected number of We distinguish include apply next depends on the current "state" of the system defined by the Test selection: The "test tests to poses challenging tradeoffs between the number of available and the expected number of iterations per board; we do not address on what and can, and the relative likelihood of various defects), reduce the expected number of iterations per board. in the available test set test that is test outcomes same sequence regardless of the or iterations to diagnose a malfunctioning board. static policies; a dynamic policy uses information to select the next test, test testing policy that while a static policy selects the outcomes. Dynamic policies perform bener (i.e., require fewer tests on average) since they use more information. The optimization of -18- they sequenrial search processes has been previously modeled (see, for example, Whinston and Moore [1986]), and applied, for instance, to computer Whinston [1988]). The search (Moore, Richmond, and file has also addressed optimal reliability literature sequencing test Binney issues for certain special circuit and test structures (e.g., Nachlas, Loney, and [1990], Butler and Lieberman [1984] study series systems with focused unlike our dynamic knowledge acquisition context, that all defect types and without provisions for their probabilities are unknown this However, tests). stream of literature often assumes known, and the requisite tests are available, or indeterminate outcomes. Furthermore, the knowledge- based diagnosis systems described in the literature do not appear to incorporate optimal search principles. Dynamic Programming Formulation 4.2.1 This section presents a standard dynamic programming formulation to clarify the ingredients and tradeoffs in test selection during circuit diagnosis. First, some We notation. m available types and contains use the indices We tests. = i = j 1,2,. ..,m to introduce represent the n defect assume, without loss of generality, that the case base we can possible defect types. Otherwise, all and 1,2,. ..,n we introduce a dummy defect type called "1" "unidentified" (which includes "no trouble found"), and a fictitious focused test with a outcome if the board has an unidentified defect, and a "0" defect types (we perform this fictitious test only the set of partial signature for combinations) the expected deterministic unambiguously identify to all other known However, the defects is known each defect must be accurate, and we must have enough information must known the analyzer for known candidate We permit incomplete knowledge of defect signatures. empty). (i.e., when outcome the true defect. outcomes We for enough assume also defect-test that the test observations and measurements are error-free. These assumptions ensure that the diagnosis process set containing is accurate, and always terminates "properly" with the candidate defect only the true defect. At any intermediate for the tests defects iteration of the sequential search process, we have performed are consistent with the observed outcomes) can distinguish between the remaining defacts); hence, outcomes completely capture our current knowledge about state vector representing this current knowledge. We specify the m is the number of initially available tests): three values: = Xj = 1 if test j's state vector X^ if test j was previously performed and outcome was has elements " 1 x^' ", = and N X: N = for all j if test j = state X. Initially, D(X^) = ( l,2,...,n) is the set -19- all previously observed We let X denote the state X as the following tiiej its element x. can take observed outcome was "0", initial Let D(X) and T(X) denote, and available of and available has not yet been applied. The l,2,...,m. respectively, the index sets of candidate defects tiie (i.e., die board. m-vector (where X: observed outcomes thus far determine the remaining candidate defects whose expected outcomes tests (that tiie tests corresponding possible defects, and to T(X^ = any { the set of all available tests. Starting with this initial state, the iterations of 1,2,. ..,m} is moving from one the diagnosis process correspond to terminal state X whose intermediate state X, and update the 1), we state select and apply a vector (change The expected number of tests p j > this section, we will contains defect i. j ) reach a contains only one defect. At any € T(X), observe N from X: and the assume denote the prior probability test we state to the next until its outcome O: (= or to O;). complete the diagnosis of a defective board depends on to the probability of different defects, remainder of D(X candidate defect set testing policy that that the during the previous month; alternatively, set P| board contains only one defect. Let equal to the relative frequency of defect we might probability based on qualitative information the of the diagnosis process) that the board (at the start For instance, we might we choose. For (e.g., i subjectively estimate this prior new operators on the problems line, at the vendors' facility). Since, by our definitions and assumptions, the n defect types are mutually exclusive and collectively exhaustive, the prior probabilities sum to 1, i.e., n Ip° 1=1 As we apply tests = (4.1) 1. ' and eliminate defects from the candidate defect set, we must update the defect probabilities to reflect the test outcomes. Let Pi(X) represent the conditional probability that the board contains defect defect i that probability, i given that we are currently in state D(X) must have zero X Any . does not belong to the candidate defect set and the conditional probabilities remaining defects must sum for all conditional to 1. Thus, p.(X) = ifi«« D(X), and 1 = I p^) heD(X) h p°/{ • At any intermediate (non-terminal) next test only depends on the current remaining must choose tests until the the test state; the j at minimum cost-to-go each subsequent state, if we e T(X) are currentiy at state X, the optimal test that minimizes the expected number of diagnosis process terminates. These feaaires suggest die C*(X) from we state optimal X test sequencing problem. as the expected select the next test "optimally". the index of the "optimal" test to apply this (4.2) exact sequence of previous tests that led to following dynamic programming formulation for the the ie D(X). stage of the diagnosis process, our choice of the the current state is irrelevant Therefore, selection policy if if we are in state X. test. -20 We define number of remaining tests if, Let j*(X) e T(X) represent Let us describe how to identify For each perform test test j € T(X), j Applying . C(X j) denote let test gives two possible outcomes O. = j different next state (obtained by changing respectively). Each of these two next from x^ states has can, therefore, express C(X,j) in terms of these purpose, T(X) let at state us define k(X,Oj), for Oj = X results in outcome or 1, and their represent the respective subsets of defects in current state X j and the additional information, indeterminate is are 0, test j; we omit we assume that S = • •' 16 probabilities, = 1 X outcome when it (these subsets and j for j e test in depend on simplicity). Z P:(X) + 0.5 * the Without j is unknown or is: (4.3) P:(X). i€D!uDU outcome for test j Z + 0.5* X is: applied at state (4.4) Pi(X). leDIuDU we can compute the cosi-to-go C(X,j) as follows: test j ) denote, respectively, the next states (with e T(X) or is I. The first number of tests that are performed), cosi-to-go from the two possible next states. cost-to-go C*(X) from test that X applied at state is minimum is 1, DU, and DI +rt(X,Oj=0)C*(Xu(Oj=0}) + 7r(X,Oj=l)C*(Xu{Oj=l)), outcome O: of function counts the X or to D(X) whose expected outcomes (recorded term state X as the achieves this C*(X) = j*(X) = a terminal state, i.e., additional tests; hence, the minimum Min { is test last j (recall that or then compute the j e T(X) ), 1) on the our cost two terms contain the optimal decision j*(X), the minimum j e i.e., and (4.6) (4.7) c(X,j). D(X) contains exactly one =0 We = (4.5) value of C(X,j) over aU available tests C(X,j): argmin T(X) J 6 minimum C*(X) minimum and the x. the constant I) (i.e., right-hand side of equation (4.5) represents the "cost" of applying If to a depends on the conditional expected outcomes. Let DO, DI, DI where Xu{O;=0) and Xu{Oj=l T(X); the X as the probability that applying test any defect whose outcome for Zp,(X) = j:(X,0,= 1) the each leading in state we if minimum cost-to-go. We subsequent minimum costs-to-go. For this leDO Similarly, the likelihood of a "1" when X equally likely to give a "0" or "1" outcome. Thus, 7r(X,0;= 0), the K(X,O:=0) C(X,j) N 1, state an associated arguments the or current value unknown, and indeterminate 1, probability that test j will give a "0" Using these its O;. This probability probabilities of the defects in D(X), the case base) for test from the current the cost-to- go defect, then we do not require any cost-to-go if -21 ID(X)I = 1, (4.8) where iSI number of elements represents the dynamic programming recursive equations Equations (4.6) and (4.8) are the in set S. to compute minimum the cost-to-go and the optimal decision for every state X. Dynamic Programming Procedure: • Initialization: For every terminal Initialize: • X, state Labeled_states <— C*(X) = set set of all 0. terminal states. Iterative step: For every state X whose next belong states all to Labeled_states, use equation (4.5) to compute C(X,j) for every test j e T(X); use equation (4.6) to compute C*(X); equation (5) gives the optimal decision j*(X) add Repeat At every X state X; to Labeled_states; iterative step until all states are labeled. iterative step of this solution procedure, whose next at state states Xu{Oi=0) and Xu{0.= l }, we can find at least for every next test j one "unlabeled" € T(X), are (otherwise, the state transition diagram must contain a directed cycle, which The dynamic programming formulation can (e.g., minimize the expected precedence constraints easily total testing "cost" (e.g., test j must precede is all state labeled impossible). accomodate other objective functions when tests test due j' have different costs) as well as to technological or operational constraints) and probabilistic test outcomes. We can represent the optimal testing policy algorithm as a "look-up" table; found by the dynamic programming this table specifies the best next test j*(X) for every possible I, state X. Let us illustrate how represent the state at the end of the k not a terminal test j* state. e T(K^) for the state vector by At setting the x^'-^ apply we this test, = is use the look-up table to identify the optimal next and observe element equal j and iteration of the diagnosis process, iteration (k+l), state x'', X suppose X the analyzer might use this table during diagnosis. Let to the its outcome 0.». observed outcome, \\ ifj^tj*, and Oj. ifj=j*. We then update i.e., (4.9) Eliminating the candidate defects that have an expected (known) outcome different from O.* gives the new, smaller candidate defect set D(X'''^b corresponding to the If D(X'''^^) repeat tiie contains only one defect, the diagnosis process above process for the new to create the decisition table. By state. is new state complete. Otherwise, The dynamic program can be solved storing this table in a "*" test selection functions. . we "off-line" computer database, the case-based diagnosis support system can automate the look-up and -22 X Although the dynamic programming procedure line, it is computationally intensive for complex circuits that have a large number of available tests. Furthermore, We, probabilities Pj change. next test easy to state and can be solved off- is at we must re-solve the problem every time the prior therefore, consider some on-line heuristic rules to select the each iteration of the diagnosis process. 4.3 Heuristic Decision Rules Easiest to implement are on-line rules that dynamically select the next test at each stage (instead of referring to a look-up table) based on simple computations using information about the current Intuitively, the state. on-hne rule must select a We list below the candidate defect set. three out of many test that can rapidly prune possible rules to achieve this objective. Highest failure rule: This rule selects the least test with the highest probability of failure, where probable outcome of a optimal strategy for our example test that in Section 4. 1 Recall that test patterned after the select the "focused" proves the defect by failing). permit indeterminate outcomes as well as incomplete knowledge of we defects regardless of the test's test unknown with the available test with the largest we We might enhance prefer a test for which the unknown or indeterminate outcomes remain in the candidate set actual outcome. Hence, to rapidly prune the candidate defect the remaining candidate defects). probabilities (e.g., A relatively ineffective since these defects is set, this rule selects defects with is define the rule: certain (deterministic) test outcomes. many rule where we always proves the most likely candidate defect (the Most known outcomes for mode. This test as its "failure" we sum or indeterminate outcomes is number of known outcomes this rule (for by incorporating defect of probabilities of all candidate the smallest). Smallest expected remaining defects rule: To rapidly home in smallest expected outcomes O: = the true defect, this rule selects the available test number of remaining and remaining defects unknown on if 1, we defects (which is the with the j sum, over the two of the probability of obtaining outcome O. times the number of observe outcome 0:). Observe that a test that has many or indeterminate outcomes (see previous rule) will likely have a large of remaining defects for either outcome. Again, we can modify this rule to number account for the "weight" or probability of the remaining defects. We might enhance these myopic rules to incorporate "look-ahead" features, consider the likely states two or more iterations hence. Alternatively, to we might consider "static" rules that priortize the tests before staning the diagnosis process; at 23 i.e., every stage, the available test with the highest priority applied next. Unlike the dynamic programming is procedure, these heuristic rules do not guarantee optimality, might require more on average than tests i.e., in the long run, they the optimal policy. Their relative effectiveness depends on the nature and probability distnbution of various defects, the (e.g., focussed versus joint To smdy tests), and the level of knowledge regarding the relative effectiveness of heuristic rules, extensive simulation study. The study applied test characteristics test outcomes. Semmelbauer [1992] conducted an selected heuristic rules to many random problem instances with varying characteristics such as the number of possible defect the number of results), available tests, the "density" of the case base and the shape of the defect probability distribution commonly occuring corresponds to a few many defects and (i.e., the fraction of (e.g., a skewed all defects). diagnosis environment, the simulation generates several problem instances heuristic rule. number of tests tests for each rule. distribution For each (i.e., defective required to complete the diagnosis using each The average number of tests over expected number of test "rare" defects, while an uniform distribution implies approximately equal likelihood for boards) and records the types, known A these random instances estimates the naive rule that randomly selects the next each stage provides a benchmark for comparison. We test at summarize some important conclusions from these simulations (for more details, see Semmelbauer [1992]): • Principled test sequencing can give significant savings: most known rules (such as the the • number of tests performed) relative (i.e., to Problem size probabilities random test 30% (in sequencing. as the density of the case base increases. A fewer unknown and indeterminate outcomes) eliminates more defects at each stage since • relatively simple heuristic outcomes rule) gave savings of approximately The number of tests performed decreases denser case base Even it contains more information. (number of defect types) and the skewness of do not appear to the distribution of defect have a significant impact on the relative performance of different heuristics. Next, we discuss some lessons we learned by implementing a prototype of the case-based circuit diagnosis system. 5. 5.1 Prototype Implementation and Future Directions Features of CDSS prototype We implemented and tested a prototype of the Circuit Diagnosis Support System (CDSS) in a production environment to diagnose an acaial pnnted circuit board. The primary purpose of the implementation was to prove the concept, and understand human issues associated with introducing computer support -24 tools in a predominantly manual The prototype was diagnosis environment. computers at the Omnis written in 5 for Macintosh mice, and keyboards as input devices. The Omnis program communicates the facility. This central database stores machine serving base, p)ermitting multi-user access via a local network. via an (ORACLE) on (Standard Query Language) interface with a relational database UNIX SE technician stations; these terminals use touch-screens, bar-code readers, SQL the central and manipulates the case The prototype system was not integrated with the factory information system, and used only a simple decision rule (the most known outcomes rule) to select the next test at each stage. Also, the prototype did not incorporate the signature augmentation feature described in Section [1992] describes the prototype implementation in 3. Semmelbauer greater detail. Observations and feedback from users over a five-week period led to several system improvements. The Figure 4, that final version of the system employs a user-friendly interface, shown displays matching cases (the candidate defect outcomes, and permits technicians One the candidate defect set. to assess the usefulness set), and previous tests in and of each available test in pruning of the important system features is an user-specified option to override the automatic test selection procedure; using this option, technicians can choose the next test manually approach, its from the list of available tests. The simplicity of the case-based transparent logic and operation, and the manual override option greatly facilitated the introduction The system was tested and acceptance of the system in the existing environment. on a double-sided board containing approximately 400 components, half analog (radio frequency) and half digital. Only part of the board was supported by the Over CDSS; the test circuit contains approximately the course of five weeks, 70 analog components. 467 boards were diagnosed using the prototype system; these boards had 87 different defect types (involving 54 components, multiple defect types). We estimate that the average from about 25 minutes p>er with CDSS is assistance. (This estimate is based on an informal observation for random cases to be searched. In practice, higher when we found weeks of case if potential disadvantage of the case-based the case base contains a very large that the number of response time was not adversely addition. For situations that require a significantly number of cases, we might consider limiting the number of cases that are stored; the case base reaches this limit, an old case (with the smallest current probability of occurence) must be discarded in One the degradation in performance affected even after five time to diagnose each board dropped board for purely manual diagnosis to approximately 10 minutes boards rather than a rigorous time study.) approach some of which had response time per the case base if in test iteration order to add a new case. This strategy trades off the gain against the additional effon needed to re-leam and update an old defect occurs again. 25- To evaluate the system's learning each week of the during a week that Figure 5 shows period. trial is how technicians felt Coverage is fifth defined as the proportion of defects occuring initially contained cases corresponding were most Ukely or significant (based week, coverage increased to 85% 90%. rate 5.2 knowledge can be quite inadequate makes it 17% relies on its management support. To dunng the first test circuit; week of The CDSS's rapid learning dynamic electronics assembly environment. exploiting potential requires its full rapidly develop full defect coverage, the system regular use by experienced technicians as they detect and diagnose many as with to use the system because they beUeve that they have the requires new defects. knowledge-based systems, experienced technicians are less likely However, phenomenon Figure 5 of the defect types, suggesting that for this application. especially attractive in today's As their experience). Interestingly, Managing the diagnosis function Although the CDSS is effective and easy to use, appropriate on its to defects that the coverage was achieved for the operation, the system could only diagnose about initial weeks of the system's coverage increased during the first five shows, after 3 weeks of using the CDSS, by the the coverage of the system during correctly diagnosed using the current information in the case base. The case base operation. we measured rate, some fundamental changes in the least to learn. way Overcoming this the diagnosis function is managed. Technicians have traditionally been managed as hourly production workers despite their important role in tiie quality improvement loop and tiieir high skill level. Emphasizing their contributions to defect prevention and process improvement, rather than viewing diagnosis as their sole function, can produce rapid progress towards worid-class manufacturing standards. This shift in emphasis from fault finding to proactive process improvement requires changing the technicians' performance evaluation metrics, providing appropriate incentives, sharing. initial 1. and creating an environment that is conducive to learning and information Based on our experience with the prototype system, we suggest steps to implement these changes and exploit the following the technicians' skills: Change the technician's job description and expectations: Technicians' first priority must be defect prevention and process improvement. They are among the most skilled workers on the line, and are often the fu-st to identify defects. Thus, they are better suited than the design or process engineers (who are typically less concerned with dayto-day operations) to respond quickly to production problems. 2. Measure quality as well as quantity: Emphasizing the quality of diagnosis and the consequent improvements and learning can create a productive culture. Traditional productivity metrics such as the number of boards diagnosed per month often increase the variability in diagnosis times by prompting technicians to work on the "easy" boards first. -26 3. Introduce technical supervision through master technician. Traditionally, technicians report to the production or line supervisor who often does not have detailed technical knowledge about circuit behavior. Appointing a master-technician to supervise the diagnosi^repair op)erarion and provide consulting help to the technicians can greatly improve productivity, especially in an environment containing multiple assembly lines. 4. Interact closely with product development. Closer cooperation with product development might be encouraged, for instance, through a rotation program or a career path firom technician to development engineer. We believe that electronics companies will be naturally driven towards these changes in order to cope with increasing product diversity and small-lot production. Future directions 5.3 Our prototype implementation can be features that tests to we described in Sections 3 and have more than two outcomes, expUcit feature, (iii) enhanced easily (ii) 4. to incorporate several These enhancements include: (i) system permitting providing defect validation and verification as an incorporating the automatic case base updating mechanisms (e.g., signature augmentation), (iv) integrating fully with existing factory control and information systems, (v) collecting actual test measurements (continuous values) and automatically inferring the corresponding test outcomes, (vi) permitting (vii) more than one minimizing the expected testing cost per board when tests have different costs, accounting for precedence constraints and sequence-dependency incorporating more sophisticated test (i.e., among (viii) tests, (ix) sequencing methods based on defect probabilities, permitting multiple signatures for the measurements defect per board, same defect, and (xi) (x) adjusting for error-prone permitting tests with false positive and false negative outcomes; see. for example, Nachlas et al. [1990]). These enhancements are possible even with current hardware, software, and development tools. The case-based approach offers two very promising possibilities automated probing in the future — integrating with a model-based approach, and adding capabilities. We have argued that, while the model-based approach offers the promise of seamless and instantaneous design-diagnosis integration, the method applications because of approach is to combine its is not suitable for real-time extensive computational requirements. One very promising the model-based and case-based approaches in a hybrid system that captures the automatic reasoning capabilities of diagnostic models with the quick response of a case base that stores only the most important and relevant defects. The model-based approach can complement the case-based system new test is added to the case base, we can in two ways: (i) whenever a new use the circuit model to verify the correctness of the partial signature (specified by the analyzer or obtained by recording the test which led to the new defect) and even augment base with model-based predictions of test defect or it (i.e., replace outcomes); and, -27- (ii) unknown values when outcomes in the the case-based case approach terminates with an empty candidate defect new hypothesis regarding the defect, set, the circuit and propose appropriate model can generate tests to isolate it. We can also apply the model-based system off-line to initialize the case base. Note that these schemes apply model only the circuit to resolve exceptions; hence, they do not degrade the average response time of the system during regular diagnosis. The hybrid approach, therefore, offers the advantage of a robust case-based system that permits user interaction and provides rapid on-line response, with the enhanced model-based capabilities quickly to changes in product design (using model-based reasoning). of the diagnosis function into development of its the case-based to react This decomposition short-term and long-term components permits independent and model-based subsystems; for example, we might first develop the case-based system using currendy available hardware and software, preparing the infrastructure for a future implementation of the model-based improves state-of-the-art to The second promising to the CDSS. As testing will the the approach practicable. direction concerns the addition of automated probing capabilities printed circuit assemblies and probing signal generators, make component when become become smaller and more dense, automatic a necessity. Most test equipment, such as power supplies, and frequency analyzers, are already controllable by a computer; however, currently most diagnosis operations use manual probing. With the emergence of new robotic technologies with precise and swift movement capabilities (with vision feedback control), automated flexible probing becomes feasible A case-based system controlling the automatic circuit in manufacturing and settings. probing and measurement equipment can reduce diagnosis time by performing routine and repetitive diagnosis functions without human intervention; analyzers would still be required to troubleshoot new defects and resolve inconclusive diagnoses. In conclusion, this paper has propxssed a viable approach to reduce diagnosis time and effort, and provide rapid feedback for quality improvement. based approach is its ability to A novel feature of our case- incorporate principled test selection methods. Although developed the dynamic programming formulation and heuristic we test selection rules in the context of off-line diagnosis, the underlying principles also apply to in-line (in-circuit and functional) testing. Computer assisted diagnosis repair strategy. is, of course, just one element of the overall testing and We have alluded to the many other decisions and tradeoffs to consider in formulating an effective testing strategy. These decisions include: • how to incorporate diagnosis and testability issues in the product design process? Williams and Parker [1983] present a comprehensive survey of issues and methods relating to design for testability; • where to locate in-line test stations, and what -28 tests to perform at each station? Pappu and Graves [1992] discuss this issue in a generic production line setting, and Chevalier [1992] considers alternative in-line testing strategies for printed circuit board assembly operations; • whether to perform detailed, diagnostic As we noted times and in Section 2, this tests in-line or off-line? decision depends on the tradeoff between higher cycle capacity requirements on the main assembly line versus greater off-line test effort; • how to "discretize" the For instance, the test continous outcomes of tests? measurements in analog circuits are continuous values; the engineer must specify ranges of values that correspond the test. When proposes a method to set the outcomes for to, test ' "PASS" or "FAIL say, measurements are error-prone. Chevalier [1992] the ranges based on the relative costs and probabilities of false acceptance and rejection; • what tests to include in the test set for off-line As we discussed in Section 4 the choice of joint versus focused expected number of iterations • when diagnosis? to implicitly assumed the board can be repaired. For (including the cost of work impacts the complete the diagnosis; to stop the testing process for a Our discussion tests board? that the defect some identified so that defects, the cost to diagnose and repair the board might exceed the value of the board. in process, etc.) boards contain multiple defects, a must be conclusively common policy is to discard any board that When goes through the diagnosis-repair loop more than a cenain prespecified number of times. Conversely, analyzers might sometimes terminate the diagnosis process prematurely (before conclusively proving a defect), and recommend repairing the most likely cause of the problem; and, • how Ou to screen boards that enter diagnosis? and Wein [1992] and Longtin et al. [1992] discuss screening strategies to discard semiconductor wafers with low yield in order to avoid wasteful usage of the subsequent detailed wafer-probing capacity. Similar screening strategies might alleviate diagnosis bottlenecks in printed circuit board assembly. As these issues illustrate, testing and diagnosis in printed circuit board assembly operations many important and challenging research as we have defined it, focusses on finding continues to offer opportunities. Finally, circuit board diagnosis, the source of the that the its board can be repaired. For process improvement, root cause, i.e., the we need problem so to trace the component vendor, board design, or processing problem to step responsible for introducing the defect; proactive process control requires a further step, namely, anticipating and preventing defects. Given massive volumes of data generated by the the placement, reflow, inspection, testing, diagnosis, and repair stations in today's high volume electronics assembly environment, we require automated systems that 29 systematically monitor and screen the signals, recognize patterns, identify the root causes, and initiate process control and improvement activities. Acknowledgments We are very grateful to Brian Santoro, Thomas : Babin, and Dennis Miller for initiating and providing constant support and encouragement. We thank the analyzers, especially Zakiuddin Mashood and Karl Browder, and members of the DOC team for sharing their insights on circuit diagnosis. project 30- this Table 1 : Complete Case base for 5-component, serial network Figure 1 1: Cumulative probabiiity of various defects 00% T CO CM Defect Number -T (in in to deer. prob. order) r~- CO Figure 2: Flow Chart for Case-based Initialize Case Base CDSS to include most common anticipated defects and their signatures I Initialize Candidate Defect Set (CDS) New Available Test Set (ATS) to Board be diagnosed I Select Next Test from ATS I Apply Next Test and observe outcome i. Update CDS: eliminate all defects whose expected outcome does not match observed outcome I Update ATS: remove latest test + all tests whose outcomes are the same for all remaining defects I No CDS contains > 1 defect? D ( ATS empty? Yes I f Add new tests j Add new case (defect) + any additional tests Perform defect verification remaining (apply tests) signature augmentation Figure 3: Input terminal Comp 2 J 1 Probe point for Defect Test i i Test 1 implies Proh>e point for Test 2 component i is Series network example » Comp 3 —j ^ Comp probe point for Probe point Test 3 defective, for i = for Test 4 1,2,. ..,5 measures the voltage between components i and (i+1), for i = 1,2,3, and 4. Figure 4: r Sample screens from CDSS prototype n 1— Figure 5: Cumulative coverage and number of cases added to CDSS 3 Week References Bennetts, R.G., 1981. Introduction to Digital Board Testing Crane, Russak . & Co. and G. J. Lieberman, 1984. "Inspection Policies for Fault Location". Operations Research, Vol. 32, No. 3, pp. 566-574. Butler, D. A., Chevalier, P., 1992. "Optimal Inspection for Circuit Board Assembly", Part II of Ph.D. thesis, Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts. Davis, R., 1984. "Diagnostic Reasoning Based on Structure and Behavior," Anificial Intelligence Vol. 24, pp. 347-410. . Davis, R., and W. C. Hamscher, 1988. "Model-Based Reasoning: Troubleshooting," A.I. Memo No. 1059, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts. J., and B.C. Williams, 1987. "Diagnosing Multiple Faults," Artificial Intelligence Vol. 32, pp.97- 130. de Kleer, . Frank, W., and B. Sauve, 1989. "Expert Systems for Manufacturing and Testing Applications," Electrical Communication Volume 63, Number 2. . Garcia-Rubio, M., 1988. "Key Problems and Solutions in Electronics Testing," Productivity and Ouality Improvements in Electronics Assembly J. A. Edosomwan and A. Ballakur (eds.), McGraw Hill Book Company. . Gensereth, M.R., 1984. "The Use of Design Descriptions in Automated Diagnosis, Artificial Intelligence Vol. 24, pp. 41 1-436. . W.C, 1988. "Model-Based Troubleshooting of Digital Systems," A.I. Technical Report No. 1074, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts. Hamscher, Harmon, P., and D. King, 1985. Artificial Wiley and Sons, Inc., New York. Intelligence in Business: Expert Systems John . Henneman, and Rouse, 1984. "Measures of human problem solving performance in fault diagnosis tasks," IEEE Transactions on Systems. Man. Cybernetics SMC 14, pp. 99. 112. — Dynamic and H. Tokumaru, 1990. "A framework for case-based diagnosis Frame and Reasoning on the Representation," Transactions of the Societv of Instrument Ishida, Y., and Control Engineers Japan. . Kritz, J., and H. Sugaya, 1986. "Knowledge-Based Testing and Diagnosis of Analog FTCS Digest of Papers 16th Annual International Symposium on Circuit Boards," . Fault-Tolerant Computing Systems, pp. 378-383. Lee, C.N., and P. Liu, 1988. "Case-based Reasoning for Robotic Assembly Cell Diagnosis," Proceedings of the SPIE The International Societv for Optical 241-246. Engineering SPIE Vol.1008, — . Rl- Longtin, M., L. M. Wein, and R. Welsch, 1992. "Sequential Screening in Semiconductor Manufacturing, II: Exploiting Spatial Dependence", Working paper, Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts. Maunder, CM., and R.E. Tulloss, 1990. The Test Access Port and Boundary-Scan Architecture IEEE Computer Society Press Tutorial. . Moore, J. C, W. B. Richmond, and A. B. Whinston, 1988. "A Decision Theoretic Approach to File Search", Computer Science in Economics and Management. Vol. I pp. , 3-19. J. C, and A. B. Whinston, 1986. "A Model of Decision-making with Sequential Information- Acquisition", Decision Support Systems, Vol. 2, pp. 285-307. Moore, J. A., S. R. Loney, and B. A. Binney, 1990. "Diagnostic-Strategy Selection for Series Systems ', IEEE Transactions on Reliability, Vol. 39, No. 3, pp. 273-279. Nachlas, Newstead, M.A., and R. Pettipher, 1986. "Knowledge Acquisition for Expert Systems," Electrical Communication Volume 60. . pp. 115-121. Noble, P.J.W., 1989. Printed Circuit Board Assembly Open University Press (Robotics . Series) J., and L. M. Wein, 1992. "Sequential Screening in Semiconductor Manufacturing, Exploiting Lx)t-to-lot Variability", Working paper, Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts. Ou, I: Pappu, S., and S. C. Graves, 1992. "A Dual-ascent Algorithm for Determining the Optimal Testing Strategy", unpublished manuscript, Massachusetts Institute of Technology, Cambridge, Massachusetts. Preist, C, and Circuits," B. Welham, 1990. "Modelling Bridge Faults for Diagnosis in Electronic Working Notes of First International Workshop on Principles of Diagnosis (unpublished), Stanford University, Stanford, California. Reiter, R., 1987. "A Theory of Diagnosis from First Principles," Artificial Intelligence . Vol. 32, pp. 57-95. Rissland, E.L., and D.B. Skalak, 1988. "Case-based reasoning in a rule-governed domain," Proceedings - Fifth Conference on Artificial Intelligence Applications. IEEE. Rouse, W.B., and N.M. Morris, 1985. "Review and Evaluation of Empirical Research Troubleshooting," Journal of t he Human Factors Societv Vol. 27, No. 5. in . Rouse, W.B. and R.M. Hunt, 1986. "Human Problem Solving in Fault Diagnosis Tasks." Research Note 86-33, U.S.Army Research Institute for the Behavioral and Social Sciences. Semmelbauer, T., 1992. "A Case-based Approach for Diagnosing Failed Printed Circuit Boards in Manufacturing", Masters' thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts. Soederberg, E. M., 1992. "Using Defect Data to Characterize Assembly Process Capabilities and Constraints for Printed Circuit thesis, Board Design for Assembly", Masters' Massachusetts Institute of Technology, Cambridge, Massachusetts. -R2- Tong, D.W., E. Walther, and K.C. Zalondek, 1989. "Diagnosing an Analog Feedback System Using Model-Based Reasoning," Artificial Intelligent Systems in Government. Proceedings of the IEEE. Memo Trice, A., and R. Davis, 1989. "Consensus Knowledge Acquisition," A.I. No. 1 183, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts. Wenstrup, D. J., 1991. "Material Identification and Tracking in Manufacturing", Masters' thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts. Williams, T. W., and K. P. Parker, 1983. "Design for Testability-A Survey". Proceedings of the IEEE, Vol. 7 1 No. 1 pp. 98- 1 1 2. , , Model Based Reasoning to Diagnosis of Faults in Navigation Equipment," Master's Thesis, Air Force Institute of Technology, Wright-Patterson Air Force Base, Ohio. Yost, R.E., 1989. "Application of Intertial -R3- / {J u J o Date Due AU6. t 6 1993 MIT LIBRARIES ' mil nil nil 3 ' 111' II loao ^077^b^3 3