From: AAAI Technical Report SS-94-04. Compilation copyright © 1994, AAAI (www.aaai.org). All rights reserved.

System Reliability and Risk Assessment: A Quantitative Extension of IDEF Methodologies

Andrew Kusiak, Intelligent Systems Laboratory, Department of Industrial Engineering
Nick Larson, Intelligent Systems Laboratory, Department of Industrial Engineering
The University of Iowa, Iowa City, Iowa 52242-1527

1. Introduction

Evaluating system reliability requires modeling the interaction of resources, information, and material within the system. Such a model must consider quantitative data describing the reliability of each element of the system, as well as logical data describing the relationship between individual components. For example, a manufacturing system may assemble products X, Y, and Z on machines M1, M2, and M3, respectively, and package the products on a fourth machine, M4. Therefore, three different relationships exist between the components of the system, one for each product. If the reliability of each machine differs, the system reliability will differ depending on the product. Similarly, the reliability of the system as a whole will be affected by the production levels of products X, Y, and Z. Given the example above, with only three products and four machines, it becomes apparent that determining system reliability requires a significant amount of data and a structured modeling methodology. Furthermore, to obtain an accurate assessment of system reliability, it is necessary to include additional data describing information and material components of the system, such as inspection procedures, assembly specifications, parts, and subassemblies.

System reliability describes the likelihood of success or failure in the operation of a system. Risk assessment is a technique for identifying scenarios that lead to problems in a system, determining the likelihood of each scenario, and evaluating the consequence of each scenario, i.e., the problem.
Although quantitative approaches to risk assessment employ the principles of system reliability, the identification, quantification, and evaluation of risk is a more comprehensive modeling activity. Therefore, risk assessment projects are well suited for the tools developed for process modeling and analysis.

In 1978, the United States Air Force selected SADT (Structured Analysis and Design Technique) as the language to support the Integrated Computer Aided Manufacturing (ICAM) program. SADT activity modeling was adopted by the ICAM program and revised by SofTech, Inc. to develop the ICAM Definition Methodology (IDEF0). Ross (1985) states that "thousands of people from hundreds of organizations working on more than one hundred major projects" proceeded to use the methodology for system definition and design, as well as project management. IDEF0 introduced manufacturing to techniques that were developed for computer system and software engineering applications. Additional IDEF techniques were developed for information analysis (IDEF1), dynamic analysis (IDEF2), and process modeling (IDEF3).

This paper presents procedures for integrating system reliability and risk assessment techniques with IDEF0 and IDEF3 modeling. The paper is motivated by the need to increase the value of IDEF0 and IDEF3 models by incorporating quantitative data, thus extending model use to applications such as risk assessment. By extending the power of existing IDEF models, process modeling will become more attractive to management and evolve as a powerful tool for reengineering design and manufacturing systems.

2. Definitions of System Reliability and Risk Assessment

Reliability may be defined as the ability of an item (product, system, etc.) to operate under designated operating conditions for a designated period of time or number of cycles (Modarres, 1993). An item's reliability is often measured by the probability it will perform without failure given a set of conditions.
The following expression is a probabilistic representation for reliability.

$$R(t) = P(T > t \mid c_1, c_2, \ldots) \tag{1}$$

In (1), $t$ is the period of time for the item's operation, $T$ is the time to failure of the item, $R(t)$ is the reliability of the item, and $c_1, c_2, \ldots$ are the conditions under which the item is operating. The variable $t$ is often referred to as the mission time. In practice, $T$ is a random variable representing the time-to-failure of the item, and $c_1, c_2, \ldots$ are implicitly considered. Furthermore, $f(t)$ may represent the probability density function of the random variable $T$. The probability that the item fails prior to time $t$ is defined in (2).

$$P(T < t) = \int_0^t f(\theta)\,d\theta = F(t), \quad \text{for } t > 0 \tag{2}$$

Since $F(t)$ denotes the probability the item will fail prior to time $t$, it is formally the unreliability of the item. Thus, the reliability of the item is determined by (3).

$$R(t) = 1 - F(t) = \int_t^\infty f(\theta)\,d\theta \tag{3}$$

A system is a collection of entities (i.e., information and material) and resources (i.e., machines and workers) which interact to perform a set of activities in a given process. Successful completion of the process is dependent upon proper completion of the individual activities in the process. Therefore, it is necessary to model the relationship between various items (entities and resources), as well as the reliability of individual items, to assess the reliability of the system.

Complex manufacturing systems produce many different products through many different sequences of manufacturing activities. The system reliability is a function of the activities performed and, thus, is product dependent. Therefore, if the production volume of a product exhibiting low system reliability is increased, the reliability of the entire manufacturing system will decrease. At this point, the necessity for evaluating system reliability in a manufacturing setting becomes obvious.

Many of the common techniques for modeling system reliability are difficult to apply to complex manufacturing systems with multiple product types.
Therefore, the principles of such tools are more useful when applied to modeling schemes developed for manufacturing systems, such as IDEF0 and IDEF3. Also, the task of evaluating system reliability in a manufacturing setting is more attractive if performed as a component of a risk assessment study.

Risk is a measure of the probability and severity of adverse effects (Lowrance, 1976). Several types of risk associated with project planning and software development have been cited in the literature; see Ang and Gay (1993) and Chittister and Haimes (1993). The following manufacturing risks are generalized from those cited in various engineering disciplines.

1. Requirements risk. The concept of what the product is intended to accomplish is not accurate.
2. Technical risk. The product does not adhere to the requirements set forth by its design.
3. Schedule risk. The product will not be completed by the deadline set forth by production planning.
4. Cost risk. The production cost will overrun its budget.
5. Network risk. The mechanism for linking various production activities will not perform as intended.

Risk assessment is a process that attempts to answer three questions: (1) What can go wrong? (2) What is the likelihood that it will go wrong? (3) What are the consequences? (Kaplan and Garrick, 1981). Based on these questions, (4) is a quantitative definition of risk, where $S_i$ is a scenario of events that leads to a problem, $P_i$ is the likelihood of scenario $i$, and $C_i$ is the consequence of scenario $i$.

$$R = \{S_i, P_i, C_i\}, \quad i = 1, 2, \ldots, n \tag{4}$$

This section has provided definitions of system reliability and risk assessment. Section 4 discusses techniques for determining system reliability and integrating risk assessment and IDEF models. Section 5 discusses issues related to risk assessment, such as developing quantitative risk models based on IDEF0 and IDEF3.

3. Fundamentals of IDEF0 and IDEF3

IDEF0 was developed for modeling a wide variety of systems which use hardware, software, and people to perform activities (U.S. Air Force, 1981).
An IDEF0 model consists of three components: diagrams, text, and a glossary, all cross-referenced to each other. The box and arrow diagrams are the major components of the model. In a diagram, a box represents a function and an arrow represents an interface. A box is assigned an active verb phrase to represent the function. An interface may be an input, an output, a control, or a mechanism, and is assigned a descriptive noun phrase. Inputs (I) enter the box from the left, are transformed by the function, and exit the box to the right as an output (O). A control (C) enters the top of the box and influences or determines the function performed. A mechanism (M) is a tool or resource which performs the function. The interfaces are generally referred to as the ICOMs (see Figure 1).

Figure 1. IDEF0 function box and interface arrows (a box labeled FUNCTION with arrows labeled Input (I), Control (C), Output (O), and Mechanism (M))

Each diagram has between three and six function boxes placed on a diagonal. The boxes each have a specific node number and are connected by all relevant interfaces. Each box on the diagram may be decomposed into a lower level of detail. This feature restricts the amount of information that may be contained in the model on a single level. The resulting diagrams form a hierarchy of information which is summarized in a node tree.

IDEF0 provides a structured representation of the functions, information, and objects which are interrelated in a manufacturing system. IDEF3 was created specifically to model the sequence of activities performed in a manufacturing system. An IDEF3 model enables an expert to communicate the process flow of a system through defining a sequence of activities and the relationships between those activities.

Figure 2. IDEF3 process flow diagram (Mayer et al., 1992) (activity labels: Evaluate, Reject proposal, Negotiate, Accept, Award contract)

There are two basic components of the IDEF3 process description language, the process flow description and the object state transition network description.
The two components are cross-referenced to build IDEF3 diagrams (Mayer et al., 1992).

The IDEF3 process flow description is made up of units of behavior (UOBs), links, and junction boxes. A UOB represents a function or activity occurring in the process. For example, assemble parts, perform inspection, or evaluate proposal are all activities which may be represented as UOBs in a process model. Relationships between UOBs are modeled with three types of links: precedence links, relational links, and object flow links. Precedence links express simple temporal precedence between UOBs. Relational links highlight the existence of a relationship between two or more UOBs; however, no temporal constraint is implied. Object flow links provide a mechanism for capturing object-related constraints between UOBs and carry the same temporal semantics as a precedence link.

The logic of branching within a process is modeled using junctions. Several classifications are used to define junction boxes. Junctions are classified according to logical semantics as and (&), or (O), and exclusive or (X). Multiple process paths are classified as fan-in or fan-out, corresponding to converging and diverging paths, respectively. The relative timing of process paths that converge or diverge at a junction is classified as synchronous or asynchronous. An example of an IDEF3 process flow diagram is shown in Figure 2 (Mayer et al., 1992).

4. Integrating System Reliability Techniques and IDEF Models

As stated in Section 2, system reliability tools may be quite useful when applied to modeling schemes developed for manufacturing systems, such as IDEF0 and IDEF3. In this section, several system reliability modeling techniques are integrated with IDEF0 and IDEF3. For a detailed discussion of each technique, see Modarres (1993).

4.1 Reliability Block Diagrams

Reliability block diagrams model the effect of component failure on system performance by capturing the physical arrangement of the system.
Typical system configurations include series systems, parallel systems, standby redundant systems, shared load systems, complex parallel-series systems, and complex nonparallel-series systems (Modarres, 1993). Additional system configurations may be identified in various applications; however, most manufacturing systems may be accurately described using those listed above. Figure 3 shows reliability block diagrams for series and parallel systems and Figure 4 illustrates complex systems. The reliability of complex systems may be calculated using various analytical methods; however, such methods become computationally intensive as the number of components increases (Shooman, 1990).

The system configurations illustrated in Figures 3 and 4 may also be used to describe IDEF0 and IDEF3 models. Ang and Gay (1993) discuss extensions to IDEF0 models which enable project risk assessment. Several project situations are described which may be generalized to the system configurations described above. The extensions are efficient for including quantitative data in an IDEF0 model, such as a probability of occurrence. However, due to the decomposition principle of IDEF0, it is difficult to identify complex system configurations in the model.

The concepts of reliability block diagrams are more easily adapted to IDEF3 models. Consider Figure 2; the five activities in the model are arranged in a complex parallel-series configuration. Unlike IDEF0 models, the number of activities (i.e., functions) in IDEF3 models is not restricted to six per level. Although IDEF3 allows for elaboration on a particular UOB (i.e., activity), the entire process flow may be constructed on a single level. This representation is more suitable for IDEF applications of risk assessment which are based on the principles of reliability block diagrams.

Figure 3. Series (a) and parallel (b) system configurations

In IDEF3, series systems (or the series components of a complex system) are identified by UOBs connected by precedence and/or object flow links.
A series of UOBs may not contain a junction of any type; however, a junction box may begin or terminate a series. Furthermore, a UOB is independent if it is not connected to another UOB by a relational link. The reliability of a series of independent UOBs is defined by (5), where $R_s(t)$ is the reliability of the system, the system contains $N$ UOBs, and $R_i(t)$ is the reliability of the $i$th UOB.

$$R_s(t) = R_1(t) \times R_2(t) \times \cdots \times R_N(t) = \prod_{i=1}^{N} R_i(t) \tag{5}$$

Parallel systems are defined in IDEF3 using junction boxes. Each UOB immediately following an and (&) junction box will be performed in parallel. Therefore, the reliability of the parallel system following an & junction box is determined by (5) as well. However, only one UOB immediately following an exclusive or (X) junction box is performed. Thus, the system reliability for parallel UOBs following an X junction box is defined by (6), where the system contains $N$ UOBs, $R_i(t)$ is the reliability of the $i$th UOB, and $P_i$ is the probability of occurrence of the $i$th UOB ($P_1 + P_2 + \cdots + P_N = 1$).

$$R_s(t) = (P_1 \times R_1(t)) + (P_2 \times R_2(t)) + \cdots + (P_N \times R_N(t)) = \sum_{i=1}^{N} (P_i \times R_i(t)) \tag{6}$$

Figure 4. Complex parallel-series (a) and nonparallel-series (b) systems

The system reliability for parallel UOBs following an or (O) junction box is more difficult to determine. At least one, and as many as all, of the UOBs following an O junction box may be executed. Unlike the & and X cases, the number of different UOBs performed in parallel is unknown. However, if the system contains $N$ UOBs, $R_i(t)$ is the reliability of the $i$th UOB, and $P_i$ is the probability of occurrence of the $i$th UOB ($P_i < 1$), a lower bound on the system reliability is given by (7). This describes the worst case, where all UOBs are executed and subject to failure. If the reliability is to be evaluated for a known set of $M$ UOBs in the parallel system ($M < N$), (5) may be used. Figure 5 illustrates parallel system UOBs for &, X, and O junction boxes in IDEF3.

$$R_s(t) = (P_1 \times R_1(t)) \times (P_2 \times R_2(t)) \times \cdots \times (P_N \times R_N(t)) = \prod_{i=1}^{N} (P_i \times R_i(t)) \tag{7}$$

Figure 5. Parallel systems modeled using IDEF3 notation for and (a), exclusive or (b), and or (c) junction boxes

Two or more UOBs connected by relational links may be considered a single UOB when calculating system reliability in IDEF3 models. In Figure 2, UOBs 3 and 4 are connected by a relational link, implying an interaction with UOB 4 if UOB 3 is executed. In this example, successfully executing UOB 3 will also require successful completion of UOB 4 (i.e., the reliability of UOB 3 becomes $R_3(t) \times R_4(t)$). However, UOB 4 does not require execution of UOB 3 and, therefore, is not dependent upon its success. The direction of the relational link determines the reliability of the connected UOBs. In general, if UOB $i$ is connected to UOB $(i + 1)$ with a relational link in the direction of UOB $(i + 1)$, the success or failure of UOB $i$ will determine success or failure of UOB $(i + 1)$. In IDEF3 applications of system reliability modeling, relational links may be avoided by combining related activities into a single UOB.

4.2 Path Set and Cut Set Methods

Path set and cut set methods were developed to determine the reliability of complex systems described by reliability block diagrams. However, the principles of these methods are very useful in the analysis of IDEF3 models. A path set is a set of units (i.e., activities, functions, UOBs) that form a connection between input and output when traversed in the direction of the arrows (Modarres, 1993). For each path through an IDEF3 model, a minimal path set will exist containing only the UOBs on the path. Once again, consider Figure 2. There are three minimal path sets in the IDEF3 model: $P_1 = (1, 2)$, $P_2 = (1, 3, 5)$, and $P_3 = (1, 4, 5)$. Each path set represents an event that would successfully accomplish the objective of the system if each of the UOBs on the path executes successfully. Therefore, the union of all $m$ path sets defines the set of all successful completions of the system. The probability of this union represents the reliability of the system, as shown in (8).

$$R_s(t) = \mathrm{Prob}(P_1 \cup P_2 \cup \cdots \cup P_m) \tag{8}$$

Unfortunately, evaluating (8) as a simple sum requires that the path sets ($P_i$) be disjoint and, in practice, this is seldom true. An upper bound on the system reliability may be determined by assuming that the path sets are disjoint, as in (9). However, for reliability values greater than 0.9 for the mission time, as in most practical applications, (9) does not yield a useful bound (Modarres, 1993).

$$R_s(t) \le \mathrm{Prob}(P_1) + \mathrm{Prob}(P_2) + \cdots + \mathrm{Prob}(P_m) \tag{9}$$

A cut set is a set of units (i.e., activities, functions, UOBs) that interrupt all possible connections between the input and output points in the diagram (Modarres, 1993). In IDEF3 process flow models, the minimal cut set is the smallest set of UOBs which prevent flow from input to output. Failure of all UOBs in the minimal cut set results in system failure. The minimal cut sets in Figure 2 are $C_1 = (1)$, $C_2 = (2, 3, 4)$, and $C_3 = (5)$. If the model has $n$ minimal cut sets and $C_i$ represents the event that all UOBs in the cut set fail prior to the mission time $t$, the system reliability is obtained from (10). Since the probability that all the UOBs fail in at least one of the cut sets is the probability that the system fails, this value is subtracted from 1 to obtain the reliability of the system.

$$R_s(t) = 1 - \mathrm{Prob}(C_1 \cup C_2 \cup \cdots \cup C_n) \tag{10}$$

As in (8), the union in (10) is not usually disjoint; thus, (11) gives the lower bound for system reliability. Since the probability of failure is used to evaluate the cut sets and, in practice, these values are much lower than reliabilities, the lower bound in (11) is a better representation of system reliability.

$$R_s(t) \ge 1 - [\mathrm{Prob}(C_1) + \mathrm{Prob}(C_2) + \cdots + \mathrm{Prob}(C_n)] \tag{11}$$

The principles of path sets may be adapted to evaluate decision-making processes modeled with IDEF3. Each minimal path set identified in an IDEF3 model corresponds to a set of decisions in the operation of the system.
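As a numerical check on (5), (6), (9), and (11), the following minimal Python sketch evaluates a small hypothetical process rather than Figure 2. The UOB names A, B, and C, their reliability values, the path and cut sets, and all function names are assumptions made for illustration, and UOB failures are assumed to be independent. The exact value of the union in (8) is obtained by inclusion-exclusion, which is only practical for a handful of path sets.

```python
from itertools import combinations
from math import isclose, prod


def series_reliability(reliabilities):
    """Equation (5): product of independent UOB reliabilities."""
    return prod(reliabilities)


def xor_junction_reliability(probabilities, reliabilities):
    """Equation (6): occurrence-weighted sum over UOBs after an X junction."""
    if not isclose(sum(probabilities), 1.0):
        raise ValueError("occurrence probabilities must sum to 1")
    return sum(p * r for p, r in zip(probabilities, reliabilities))


def path_set_upper_bound(path_sets, rel):
    """Equation (9): sum of the path set success probabilities."""
    return sum(series_reliability(rel[u] for u in path) for path in path_sets)


def cut_set_lower_bound(cut_sets, rel):
    """Equation (11): 1 minus the sum of the cut set failure probabilities."""
    return 1.0 - sum(prod(1.0 - rel[u] for u in cut) for cut in cut_sets)


def exact_reliability(path_sets, rel):
    """Equation (8) evaluated by inclusion-exclusion over the path sets."""
    total = 0.0
    for k in range(1, len(path_sets) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for combo in combinations(path_sets, k):
            # All paths in the combination succeed when every UOB on them succeeds.
            units = sorted(set().union(*combo))
            total += sign * prod(rel[u] for u in units)
    return total


# Hypothetical data: A is followed by two alternative routes through B or C.
REL = {"A": 0.95, "B": 0.90, "C": 0.90}
PATH_SETS = [("A", "B"), ("A", "C")]
CUT_SETS = [("A",), ("B", "C")]

upper = path_set_upper_bound(PATH_SETS, REL)
lower = cut_set_lower_bound(CUT_SETS, REL)
exact = exact_reliability(PATH_SETS, REL)

print(f"lower bound (11): {lower:.4f}")
print(f"exact value (8):  {exact:.4f}")
print(f"upper bound (9):  {upper:.4f}")
```

For these assumed values the cut set bound is about 0.940 against an exact value of about 0.9405, while the path set sum is 1.71. A sum above 1 carries no information, which is consistent with the remark that (9) is not useful when reliabilities exceed 0.9.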
Junction boxes identify points where decisions are made within a system. The decision set corresponding to path set $P_3 = (1, 4, 5)$ is {do not reject proposal, accept proposal}. Two decisions are made in the process, one corresponding to each diverging (or fan-out) junction box. The first decision is to "not reject the proposal" and the second is to "accept the proposal." Therefore, the reliability of the decision set may be easily determined by calculating the probability of successfully completing all UOBs in the corresponding path set.

Cut sets may be used to expose critical activities in the system. UOBs in the intersection of all or many of the cut sets may be considered critical to the operation of the system. Furthermore, a cut set with a high probability of failure (i.e., there are few UOBs in the cut set and each has a high probability of failure) may be considered a critical group of UOBs.

Applying system reliability techniques to IDEF3 models requires incorporating quantitative data regarding probability of occurrence of UOBs and probability of failure for each UOB. Reliability values for each of the ICOMs in IDEF0 models may be used to obtain probability of failure data for the corresponding UOB of an IDEF3 model. Diverging (or fan-out) junction boxes in IDEF3 may also contain probability of occurrence values for each UOB that immediately follows the junction box.

5. Risk Assessment in IDEF Models

Evaluating risk in IDEF models requires identifying scenarios, determining the likelihood of these scenarios, and estimating the consequences. This set of objectives is often called the "risk triplet." The first two components of the risk triplet are related to the techniques for determining system reliability discussed in Section 4. Path sets may be used to identify scenarios in the system. Each scenario is qualitatively evaluated to identify possible problems that may result. The likelihood of each scenario is determined by the probability of occurrence of the UOBs in the path set.

Determining the consequences of the scenario requires estimating the impact on a set of performance measures. Typical performance measures in manufacturing systems are in-process inventory levels, lead times, set-up times, scrap, rework, and resource utilization. When possible, consequences related to different performance measures should be converted to a single unit of measurement, such as dollar loss. The total expected risk in the system is calculated by (12), where $P_i$ is the probability of scenario $i$, and $C_i$ is the consequence, or cost, of the scenario.

$$R = \sum_{i=1}^{n} (P_i \times C_i) \tag{12}$$

It must be realized that the values for $P_i$ and $C_i$ are based on approximations and assumptions and, thus, possess a high level of uncertainty. Formal methods for treating uncertainty analysis are discussed in the literature (Morgan and Henrion, 1990). Upon calculating the expected risk in the system, two courses of action may be taken: (1) explore alternative management decisions to avoid risk (i.e., decrease the likelihood of high risk scenarios), and (2) reengineer processes to mitigate the consequences.

IDEF0 and IDEF3 were developed to provide a mechanism for evaluating the performance of complex manufacturing systems. System reliability and risk assessment are applications which provide a useful opportunity for extending the power of these methodologies.

Acknowledgment

This research has been partially supported by grant No. DAAE07-93-C-R080 from the U.S. Army Tank Automotive Command.

References

Ang, C. L. and R. K. L. Gay (1993). "IDEF0 modeling for project risk assessment," Computers in Industry, 22, pp. 31-45.
Chittister, C. and Y. Y. Haimes (1993). "Risk associated with software development: A holistic framework for assessment and management," IEEE Transactions on Systems, Man, and Cybernetics, 23(3), pp. 710-723.
Kaplan, S. and B. J. Garrick (1981). "On the quantitative definition of risk," Risk Analysis, 1(1).
Lowrance, W. W. (1976). Of Acceptable Risk, William Kaufmann, Los Altos, CA.
Mayer, R. J., T. P. Cullinane, P. S. deWitte, W. B. Knappenberger, B. Perakath and M. S. Wells (1992). Information Integration for Concurrent Engineering (IICE) IDEF3 Process Description Capture Method Report, Armstrong Laboratory, Wright-Patterson AFB, Ohio 45433, AL-TR-1992-0057.
Modarres, M. (1993). What Every Engineer Should Know About Reliability and Risk Analysis, Marcel Dekker, Inc., New York, NY.
Morgan, M. G. and M. Henrion (1990). Uncertainty: A Guide to Dealing with Uncertainty in Quantitative Risk and Policy Analysis, Cambridge University Press, Cambridge, U.K.
Ross, D. T. (1985). "Applications and extensions of SADT," Computer, April, pp. 25-34.
Shooman, M. L. (1990). Probabilistic Reliability: An Engineering Approach, 2nd Ed., Krieger, Melbourne, FL.
U.S. Air Force (1981). Integrated Computer Aided Manufacturing (ICAM) Architecture Part II, Volume IV - Functional Modeling Manual (IDEF0), Air Force Materials Laboratory, Wright-Patterson AFB, Ohio 45433, AFWAL-TR-81-4023.
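The expected-risk sum in (12) can be illustrated with a minimal Python sketch. The scenario names, probabilities, and dollar consequences below are hypothetical values chosen only for illustration, and the `Scenario` class and function names are likewise assumptions rather than part of any IDEF tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One entry of the risk triplet: scenario S_i, likelihood P_i, consequence C_i."""
    name: str
    probability: float   # P_i, likelihood of the scenario
    consequence: float   # C_i, expressed in a single unit such as dollar loss


def expected_risk(scenarios):
    """Equation (12): sum of P_i * C_i over all scenarios."""
    return sum(s.probability * s.consequence for s in scenarios)


def rank_by_contribution(scenarios):
    """Order scenarios by their share of the expected risk, largest first."""
    return sorted(scenarios,
                  key=lambda s: s.probability * s.consequence,
                  reverse=True)


# Hypothetical scenarios with consequences converted to dollars.
SCENARIOS = [
    Scenario("late delivery", 0.10, 20000.0),
    Scenario("scrapped batch", 0.05, 8000.0),
    Scenario("rework", 0.20, 1500.0),
]

total = expected_risk(SCENARIOS)
ranking = rank_by_contribution(SCENARIOS)

print(f"total expected risk: {total:.2f}")
for s in ranking:
    print(f"{s.name}: {s.probability * s.consequence:.2f}")
```

Ranking scenarios by their contribution is one simple way to decide where to look first when exploring alternative decisions or reengineering a process. Because the inputs are estimates, the resulting figures are indicative only.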