Condition Based Maintenance of Critical Machinery Assets: An Intelligent Architecture Dr. George Vachtsevanos Georgia Institute of Technology School of Electrical and Computer Engineering Atlanta GA 30332-0250 (404) 894-6252 Voice (404) 894-7583 Fax gjv@ece.gatech.edu http://icsl.marc.gatech.edu Presented at the Workshop on Automated Machinery Maintenance The University of Texas at Arlington July 17, 2003 Topical Outline 1. 2. 3. 4. 5. Introduction – What is Condition Based Maintenance? Elements of a CBM Architecture Example Demonstration An Intelligent Agent Based CBM Paradigm Future R&D Directions/Concluding Comments. Condition-Based Maintenance The Opportunity Condition Based Maintenance (CBM) promises to deliver improved maintainability and operational availability of naval systems while reducing life-cycle costs The Challenge Prognostics is the Achilles heel of CBM systems - predicting the time to failure of critical machines requires new and innovative methodologies that will effectively integrate diagnostic results with maintenance scheduling practices Condition Based Maintenance Objective – Determine the “optimum” time to perform maintenance Problem Definition – A scheduling problem – schedule maintenance timing to meet specified objective criteria under certain constraints Condition Based Maintenance Major Objective – Extend system life cycle as much as possible without endangering its integrity Enabling Technologies – Various Optimization Tools – Genetic Algorithms – Evolutionary Computing A Maintenance Management Architecture Time-Directed Tasks • Trend Data • Logs Maintenance Schedule Real-time Diagnostics / Prognostics and Trend Analysis • Technical Doc Ref • Preplanned Work Corrective Tasks • Emergent Work Case Library Other Process Management Component (ERP) Work Order Backlog • Material Required • Labor Required • Work Procedures • Actions Taken • Conditions Found • Cost Collector Enabling Technologies Genetic Algorithms for Optimum Maintenance Scheduling Case-Based Reasoning and Induction Cost-Benefit Analysis Studies CBM Performance Assessment Objective: – To assess the technical and economic feasibility of various prognostic algorithms Technical Measures: – Accuracy, Speed, Complexity, Scalability Overall Performance Measure: – w1Accuracy + w2Complexity + w3Speed + … (wi - weighting factors) PM1 PM2 PM3 Performance Algorithm #1 * * * Assessment Matrix: Algorithm #2 * * * Algorithm #3 * * * Prognostics Objective – Determine time window over which maintenance must be performed without compromising the system’s operational integrity Prognostics Enabling Technologies – Multi-Step Adaptive Kalman Filtering – Auto-Regressive Moving Average Models – Stochastic Auto-Regressive Integrated Moving Average Models (ARIMA) – Forecasting by Pattern and Cluster Search – Variants Analysis – Parameter Estimation Methods – Others Prognostics Enabling Technologies (cont’d) – AI Techniques Case-Based Reasoning Intelligent Decision-Based Models Min-Max Graphs – Petri Nets – Soft Computing Tools Neural Nets Fuzzy Systems Neuro-Fuzzy PEDS Software System Architecture (Stand-alone) Hardware •Plant •Sensors •DAQ CBM Causal factors Scenario Generator main. schedule FAHP Causal Adjustments 1. GUI 2. Data Preprocessing 3. Mode Estimator / Usage Pattern Identification Event Dispatch Central DB failure dimension Database Management 5. Feature 4. Feature Extraction Extraction 5. Classifier (Fuzzy) 6. Virtual Sensor (WNN) DWNN Classifier (WNN) Case-based Diagnostic Reasoner CPNN The Navy Centrifugal Chiller Chiller Failure Modes •Condenser Tube Fouling •Condenser Water Control Valve Failure •Tube Leakage •Decreased Sea Water Flow •SW in/out temp. •SW flow •Cond. press. •Cond. PD press. •Cond. liquid out temp. Compressor Pre-rotation Vane •Compressor Stall & Surge •Shaft Seal Leakage •Oil Level High/Low •Aux. Pump Fail •Oil Cooler Fail •PRV/VGD Mechanical Failure •Comp. suct. press./temp. •Comp. disch. press./temp. •Comp. oil press./flow (at required points) •Comp. bearing oil temp •Comp. suct. super-heat •Shaft seal interface temp. •PRV Position Condenser Evaporator •Target Flow Meter Failure •Decreased Chilled Water Flow •Evaporator Tube Freezing •CW in/out temp./flow •Eva. temp./press. •Eva. PD press. •Liquid line temp. •(Refrigerant weight) •Non Condensable Gas in Refrigerant •Contaminated Refrigerant •Refrigerant Charge High •Refrigerant Charge Low Timing sequence(1)-No fault Detected Start next cycle Data Collection Mode Identification Feature Extraction (Extract features For Diagnostics only) FuzzyDS WNN No Fault Detected Prognostic routines will not run t0 t1 t2 t 1. The timing sequence is managed by the Task Manager 2. Algorithm modules are started by FEATURE READY events 3. Each diagnostic module decides upon the presence or absence of a fault 4. The diagnostic modules report their conclusion to the database. 5. Each diagnostic module runs its routine and responds back to the task manager. 6. Task manager receives the events and decides which module or algorithm should be started. 7. The diagnostic decision (or No fault) is displayed on the GUI;GUI receives result from database. 8. All prognostic routines are initiated when a fault has been detected. Timing sequence(2)—Fault Confirmed Start confirmation Continue confirmation Fault confirmed Start prognosis Data Collection Mode Identification Collect data for prognosis Feature Extraction Extracts features for prognosis FuzzyDS WNN DWNN CPNN Fault Detected By FuzzyDS or WNN t0 Fault confirmed t1 t2 t3 t4 t Timing sequence(3)—Fault not confirmed Start confirmation Continue confirmation Start normal cycle again Data Collection Feature Extraction FuzzyDS WNN Fault Detected By FuzzyDS or WNN t0 No fault detected here t1 t2 t3 t4 t Software Design Developing platform: Microsoft Visual C++ , Visual Basic, SQL server 2000, Access 2000. Software running under Windows NT platform Component based open system architecture. All the system components are implemented as Microsoft COM objects Event-based distributed communication. Capable of transmitting events at different priorities. Provides database for storage of collected data,configuration information, diagnostic and prognostic results. Conceptual Model of PEDS — 3-Tier Client-Server Architecture User Services Prognostic Services Persistent Services Diagnostics Operator User Interface ICAS Database Interface Feature Extraction Administrator User Interface Tier 1 Prognostics Tier 2 Features Repository Interface Tier 3 Software Diagram Data sampling module Task Manager Time Out? Start Sampling Data Sampling Data Event N Enough Data? y Save to Database Diagnosis Start Feature Extractor N Wait for event Feature extractor Get Data From Database Events Feature Ready? y Event Prognosis Wait for event Wait for event Get Features From Database Get Features From Database Do Diagnosis Do Prognosis Save results to Database Save results to Database N Time Out? y Do Extraction Start Diagnosing Save Features to Database Start Prognosing Database relationships Mode Diagram Example Testbed: AC-Plant Mode Identification • Modes are characterized by the dynamics, set-points, and controller. • Modes switch due to events. Fuzzy Petri Net (Mode Changes Due to Events) Mode Decision Sensors Dynamics Classifier (Mode Due to Dynamics) Current Mode Fuzzy Petri Net other modes Normal Mode Overload Mode other modes Events Characterized by Membership Functions 53 44 If Chilled Water Inlet Temperature is above 53 degrees and Chilled Water Outlet Temperature is above 44 degrees then Switch to Overload Mode Fuzzy Petri Net Simulation Sensors Fuzzy Petri net marking mode marking SENSORS MODE CHANGES Pre-rotation vane position Chilled water inlet temperature Chilled water outlet temperature Normal Load Mode Full Load Mode Overload Mode Feature Selection and Extraction Sensors PreProcessing Data Feature Extraction Highpass Filtered horizontal and vertical scans - absolute value Windowed highpass filtered results - absolute value 10 10 8 8 6 6 4 4 2 2 50 100 150 200 250 300 2.5 0 50 100 150 200 250 300 2.5 2 2 1.5 1.5 1 1 0.5 0.5 0 Prediction of Fault Evolution Fault Classification 20 40 60 80 100 120 140 160 180 0 20 40 60 80 100 120 140 160 180 Motivation: Data driven diagnostic/prognostic algorithms require for fault detection and classification a feature vector whose elements comprise the “best” fault signature indicators Intelligent distinguishability and identifiability metrics must be defined for selecting the best features Time and frequency domain analysis techniques must be employed to extract the selected features Overall Procedures for Feature Selection and Diagnostic Rule Generation PEDS Database Raw Database Featurebase Feature Vector Table Diagnostic Rulebase Feature Preparation Preprocessing Feature Extraction Rough Set* Feature Selection Rough Set Rule Generation * Rough set theory is a popular data mining methodology, which provides mathematical methods to remove redundancies and to discover hidden relationships in a database. Feature Preparation Heuristic Feature Pre-selection Fault Symptoms from ONR Report Featurebase Available Measurements from York Test Fault Mode1 Time Feature 1 Feature 2 York Test Database Raw Data Preprocessing* Feature Extraction Feature Candidates Decision t1 20.5 11.5 0 t2 23.5 9.5 1 * Preprocessing includes removing unreasonable objects from the database and assigning the operational status. Feature Extraction Architecture Maintainance Actions Rough Set Data MIner Feature Selector Feature Vectors Rule Generator Historical FeatureValues Featurebase Feature Vector Table Feature Table Diagnostic Rules Diagnostic Rule Table Classifier Designer Operator Sensor Suite Feature Vectors Raw data Data Calibration Preprocessing Classifier Parameters Diagnostic Rules Feature Extractor On-line Feature Values Diagnostic Results Alarms,/ Reports Diagnostor GUI Database Prognostor Historical Feature Value Diagnostic/Prognostic Module Prognostic Results Figure 1-2. Industrial Diagnostic/Prognostic Framework based on Data Mining Feature Selection. The Diagnostic Module 2.5 2 1.5 High-frequency failure modes (engine stall, etc.): The Wavelet Neural Net Approach 1 0.5 0 -0.5 -1 -1.5 -2 -2.5 A Two-Prong Approach 0 100 200 300 400 500 13.8 13.6 13.4 Low-frequency events (Temperature, RPM sensor, etc.): The Fuzzy Logic Approach 13.2 13 12.8 12.6 12.4 12.2 0 50 100 150 200 250 Sensor Data Features raw data from sensor Denoised signal 13.8 3 13.6 13.4 normalized level 2 12.6 Preprocessing and Feature Extraction 0 -1 13 12.8 12.4 12.2 0 50 100 150 200 250 300 400 500 time 0.2 0.15 0.1 -2 slope normalized level 1 13.2 0.05 -3 0 0 100 200 300 400 500 time -0.05 0 100 200 time Failure Templates Fuzzify Features Fuzzy Rule Base (1) If symptom A is high & symptom B is low then failure mode is F1 (2) ... Inference Engine (Defuzzify) Failure Mode Fuzzy Logic Diagnostic Architecture Features Fuzzy Logic Classification Dempster-Shafer Theory of Evidence Construct Mass Functions from Possibilities Combine Evidence Calculate Degree of Certainty Degree of Certainty Fuzzification Rulebase Fuzzy Inference Engine Defuzzification Threshold for Fault Declaration Fault Declaration Prognostics y y Two Prognostic Scenarios: Unsupervised Two Prognostic Supervised t Value-At-Any Time t t y y Outcomes: Time-To-Failure t Prognostics Supervised F(n) Fault Dimensions Failure Rates Trending Information Temperature Predictor DWNN Fault Growth (DWNN) F(n+h) U(n) Component ID Unsupervised ~ F (n h ) ( F (n ), U ( n )) Delay h F(n) F (n h ) DWNN ( F (n ), F (n ) F (n T1 ) F (n 1) F ( n T2 ) , ,U (n )) T1 T2 DWNN U(n) F(n+h) The Prognostic Module QUESTION: Once an impending failure is detected and identified, how can we predict the time window during which maintenance must be performed? Sensor Data Prognostic Module Diagnostic Module T t CBM (T=?) APPROACH: • Employ a recurrent neuro-fuzzy model to predict time window T • Update prediction continuously as more information becomes available from the diagnostic module Wavelet Neural Network (WNN) Hidden Layer 1 x1 y1 2 x2 cjk 1 M-1 xN Input Layer M yK Output Layer clin1 clin K Direct Link y [ A1 ,b1 ( x) A2 ,b2 ( x) AM ,bM ( x)]C [ x1 ]Clin Dynamic Wavelet Neural Network (DWNN) Y(t) Y(t-1) Y(t-M) U(t) z-1 z-2 z-M WNN Y(t+1) U(t-1) U(t-N) Y(t+1) = WNN(Y(t), …,Y(t-M), U(t), …, U(t-n)) On Virtual Sensors Many failure modes are difficult or impossible to monitor Question: How do we build a “fault meter”? Answer: Virtual Sensor The Notion: Use available sensor data to map known measurements to a “fault measurement” Potential Problem Areas: How do we train the neural net? Laboratory or controlled experiments required Dynamic Virtual Sensor Vibration Signals Power Spectrum Data Acoustic Data Dynamic Virtual Sensor Temperature Component ID (DWNN) Fault Growth (fault dimensions location, etc) Implementation of The Prognostic System Hardware (on-line) O b j e c t Raw Signal tri-axial vibrometer Signal Acquisition Hardware CPU or DSP Processor Software (on-line and off-line) Signal Preprocessing Feature Extraction WNN DWNN Prognosticator R e s u l t Application Examples A defective bearing with a crack causes the machine to vibrate abnormally Vibrations can be caught with accelerometers which translate mechanical movement into electrical signals Bearing crack faults may be prognosed by examining and predicting their vibration signals An Experimental Setup - Cracks on the races or balls - Particles in the lubricant - Gaps in between moving parts, etc A Sample Database An exemplary database for training the prognostic predictor Signals Features Fault Dimensions Time Stamps Stext1.dat 10.25 102.05 12:23:48 Stext2.dat 15.63 115.21 12:26:23 Stext3.dat 24.84 152.34 12:29:55 Stext4.dat 35.76 190.45 12:32:61 Bearing Fault Diagnosis Time Domain - defective bearing 5 0.3 4 0.2 3 0.1 2 Amplitude Amplitude Time Domain - good bearing 0.4 0 -0.1 0 -1 -0.3 -2 -0.4 -3 0 200 400 600 800 -4 1000 Time features = [0.3960 0.1348] 1 -0.2 -0.5 For the good bearing, For the defective bearing, 0 200 400 600 800 1000 Time Spectrum Domain - good bearing Spectrum Domain - defective bearing 0.14 10 [0 1] = WNN([0.3960 0.1348]) ===> The bearing is good! 9 0.12 8 7 0.08 Amplitude Amplitude 0.1 0.06 0.04 6 5 [1 0] = WNN([4.9120 9.2182]) ===> The bearing is defective! 4 3 2 0.02 1 0 features = [4.9120 9.2182] 0 20 40 60 80 Spectrum 100 120 140 0 0 20 40 60 80 Spectrum 100 120 140 Bearing Fault Prognosis 12 Power Spectrum Area 10 Failure Condition 8 6 4 TTF 2 0 0 20 40 60 Time Window 80 6 Power Spectrum Area 5 Original 4 TTF = 19 time units DWNN Output 3 2 1 0 0 20 40 60 Time Window 80 100 100 120 Current time 6 WNN Output 3 2 1 Prediction Time 0 20 40 60 Time Window 80 Prediction up to 98 time windows using the trained WNN 100 Predicted time to failure Failure Condition 10 Real Data 4 0 12 Power Spectrum Area Power Spectrum Area 5 Current time Finish time 8 6 Time-to-failure 4 2 0 0 20 40 60 Time Window 80 100 Prediction of time-to-failure using the trained WNN time-to-failure = 38 time windows 120 Prognosis in the frequency domain: Prognosis using the trained DWNN Training of the DWNN with the PSD growth data 450 350 400 300 Original 350 DWNN Output 300 200 Power Spectra Power Spectra 250 150 100 250 200 150 100 50 50 0 0 -50 0 100 200 300 400 500 Frequency Window 600 700 800 -50 0 200 400 600 800 Frequency Window 1000 1200 Vibration Data with Growth and Its max PSDs Training of DWNN Using GA Gene tic Optimiza tio n 0.0 3 8 0.0 3 6 C o st functio na l 0.0 3 4 0.0 3 2 0 .0 3 0.0 2 8 0.0 2 6 0.0 2 4 0 50 1 00 Gene ratio n 1 50 20 0 DWNN-Produced MaxPsd profiles Original MaxPSD profiles for the three axes 500 500 450 400 400 350 300 300 250 200 200 100 150 100 0 50 0 0 20 40 60 80 100 -100 0 20 40 60 80 100 DWNN-Predicted defect sizes 500 2000 400 1500 Width Width Original and DWNN-Produced defect sizes 300 200 500 100 0 0 20 40 60 80 100 300 1500 200 1000 Depth Depth 0 1000 100 20 40 60 80 100 120 140 160 0 20 40 60 80 100 120 140 160 500 0 0 -100 0 0 20 40 60 80 100 -500 Mounting Bolt Faults Original Vibration Signal 0.4 0.2 Voltage (mv) 0 -0.2 -0.4 -0.6 -0.8 -1 0 200 400 600 800 1000 Time (ms x 10) 1200 1400 1600 PSD Peaks of Windowed Vibration Signal 0.35 0.3 PSD Peak Values 0.25 0.2 0.15 0.1 0.05 0 0 20 40 60 Window Tags 80 100 Problem Definition Challenges in “Technology Prediction” – Long term prediction (i.e. 30 years) is facing growing uncertainty – Current trends can be misleading – External causal factors are numerous and difficult to quantify – Data is sparse, difficult to extract a trend Confidence Prediction Neural Network Time Series Prediction Architecture multi-step prediction using a novel neural network with a probability distribution function estimator technique time-series data Data Pre-Processing filtering, buffering, etc. (optional) iterated forecasting 1 Forecasting with prediction uncertainty representation 3 Causal Adjustment impacts of •-budgets •-economic factors •-political factors •-etc 2 Uncertainty management via learning reduce uncertainty in the prediction by eliminating unlikely outputs 4 simulated major events that might occur along prediction horizon Scenario Construction The Confidence Prediction Neural Network (CPNN) For CPNN, each node assigns a weight (degree of confidence) for an input X and a candidate output Yi. Final output is the weighted sum of all candidate outputs. In addition to the final output, the confidence distribution of that output Numerator Denominator Confidence distribution approximator Summation layer Pattern layer output can be computed as Input layer CD( X , Y ) (Y Yi ) 1 1 C ( X , Yi ) exp[ ] 2 (2 ) CD l i 1 2 CD l 2 CPNN Prognostic Results Without reinforcement learning 6 5 4 historical data prediction 3 real failure time 2 0.9 0.8 0.7 0.6 0.5 0.4 1 0.3 0.2 0.1 0 95 96 97 98 99 100 101 dist of prognostic failure time 0 0 20 40 60 80 100 120 Prognostic Results With reinforcement learning 6 5 4 3 2 0.9 0.8 0.7 0.6 0.5 1 0.4 0.3 0.2 0.1 0 0 0 20 40 60 80 96 100 97 98 99 100 120 101 New Challenges for CBM/PHM Systems Diagnostic and prognostic systems have so far been – designed in an ad hoc manner – static - in terms of performance – passive - in that they only respond to events and never initiate actions – centralized - so that either the knowledge-base, control, or model is centralized even for distributed frameworks – tightly coupled - resulting in poor error-tolerant frameworks – non-scalable – non-portable - since they are very system specific, and – require expert personnel to upgrade/update Centralized Control & KB Architectures Sensors UUT Events Preprocessing Knowledge Base Diagnostic Algorithm Sensors UUT Data-mining Events Sensors UUT Events Feature Extraction Control Prognostic Algorithm A Generic Central Control and Knowledge Base Framework Diagnosis Prognosis Distributed Control & KB Architectures Sensors Knowledge Fusion UUT Events Diagnostic Algorithms Central Control Diagnosis PrognosticAlgorithms Prognosis Central Knowledge Base Distributed Control and Knowledge Base Framework Hardwa Hardware r e Model-Based Software Architecture System (chiller, gas turbine, etc.) DAQ/CPU ICAS Shipboard Systems / Components Preliminary Diagnostics Intelligent Selection Layer Diagnostic Algorithms Decision Support Layer Prognostic Algorithms Online Static and Dynamic Case Library Offline Interface Layer Multiagent System Intelligent Agent Object Oriented Hybrid System Models Software Repository Sensors Designer Statistics Optimization Performance Assessment Module Offline Prescription Maintenance Plan Online Online Maintainer Community of Agents – Multiagent System The System Modeling Approach I. Concept of Hybrid System The System Modeling Approach (continued…) II. The Object-Oriented Hybrid Model Architecture Object-Oriented Modeling - PhysicsBased or Physical Modeling System Concept - An Example JSF Propulsion System Components • Components • Interconnections/ Couplings Clutch Lock Actuator #6 GB Brg & Spanner Nut Geardrive Shaft Clutch Pack Actuator 1 2 #3 #4 #6 Semi-empirical models Deterministic-stochastic models Finite-element models Carbon Plates Lock Clutch Spline Output Shaft Clutch Input Shaft Drive shaft Interface Physical Modeling in Prognosis The concept of “Virtual Sensor”: Measurable Quantities Problem: Solution: Physical Model Fault Dimensions Difficult or impossible to monitor in an operational state fault dimensions. Train a physical model (semi-empirical, physics-based) to map measurable quantities (vibration, temperature, etc.) into fault dimensions (crack length, wear etc. Control Architecture for Intelligent Agent Paradigm level 2 Interface Control level 1 Decision Support Control level 0 Intelligent Selection Control Figure 2: Subsumption architecture for Intelligent Agent. Important Features of Agenthood Cooperation Autonomy Adaptation • Adaptation: • Learning, Knowledge-Discovery • Communication • Self-Organization • Cooperation: • Agent Communication Language • Cooperate with other Software Agents/User Agents • Form Agent Communities - MAS • Autonomy: • Proactive and Reactive • Goal-directed • Multi-threaded The IA Paradigm: Agent-Oriented Programming (AOP), Agent-Oriented Design (AOD), Agent-Oriented Software Engineering (AOSE) Intelligence of Intelligent Agents A simple reflex agent A reflex agent with internal states Sensors What actions I should do now Effectors State An agent with explicit goals Sensors How the world evolves What the world is like What my actions do What actions I should do now State How the world evolves What the world is like What my actions do What the world will be after I do action A Effectors What actions I should do now An agent with utility State How the world evolves What my actions do Varying degrees of intelligence Recognize goals and intentions React to unexpected situations in a robust manner Focus of Research: Concepts of Learning, SelfOrganization, and Active Diagnosis Effectors Utility Sensors What the world is like What the world will be after I do action A How happy I will be in this state What actions I should do now Effectors ENVIRONMENT Goals Sensors ENVIRONMENT Conditionaction rules ENVIRONMENT Conditionaction rules ENVIRONMENT What the world is like Learning Issues & Approaches Issues in Learning: Incremental Learning Ability Order Independence Unknown Attribute Handling Noisy Data Tolerance Source Combination Open Questions: – What should be learned? – When to learn? – How to learn? Sensors Critic feedback changes Learning Element Performance Element knowledge Learning goals Problem Generator Effectors Figure 5.1 A general model of learning agent [50]. Possible Solutions: – Learn diagnostic rules via Episodic Information (Experiences) – Learn when Episodic Information is not enough to handle current scenario – Aggregate learning episodes using some concept learning technique ENVIRONMENT – – – – – Performance Standard Case-Based Reasoning & Learning CBR - an episodic memory of past experiences CBR - initial cases by examples CBR Methodology: Indexing (generate indices for classification and categorization) Retrieval (retrieve the best past cases from the memory) Adaptation (modify old solution to conform to new situation) Testing (did the proposed solution work) Learning (explain failed & store successful solutions) Case Library Failure Mode i Case # … 1 2 3 S1 Symptoms S2 … Sm Tests Prescription The Dynamic Case-Based Reasoning Architecture Sensory data Feature interpretation (static, dynamic, composite) Case indexing AS path Analytical Models and algorithms Indexing path selection Indexing rules PD path Phase matching evaluator Case retrieval Case similarity calculation Case memory inactive active Propagation evaluator Case adaptation Model-based reasoner New case constructor Test/evaluation Remembrance calculation Failure driven learning Model base Structure of Static and Dynamic Case Library tfailure Tests Case #j Time to Failure TF1 TF2 TF3 … … Dynamic Case Library Failure Mode i Conditions time y1 y 2 … y m t1 t2 t3 0 Case Library Prescription Tests … Static Case Library Failure Mode i Symptoms Case # S1 S2 … Sm 1 2 3 Concept Learning Example 1 Has_A, 1/3 Has_C, 1/3 Has_B, 1/3 attrB: m attrA: a Has_A1, 1/2 Object A attrC: 16 Has_A2, 1/2 attrA2: 2 attrA1: a Example 2 Object A Memory Aggregate (MA) approach: – systematic learning from examples and counter-examples – incremental and order independent – combines sources – counter examples handled similarly by building an MA called C_MA MA based on Examples 1&2 Object A Has_A, 1/2 Has_B, 2/5 Has_B, 1/2 attrA: a Has_A1, 1/2 attrA1:b Has_C, 1/5 Has_A, 2/5 attrB: m Has_A2, 1/2 attrA2: 2 Has_B1, 1/1 attrB1: y attrA: a,2/2 attrB: m2/2 attrC: 16,1/1 Has_A1, 2/4 attrA1:b,1/2 a, 1/2 Has_A2, 1/2 Has_B1, 1/1 attrA2: 2/2 attrB1: y,1/1 Concept CBR Learning ‘Concepts’ as Case Library – Storing Memory Aggregates that are sufficiently different from each other as unique cases Memory Aggregates are also used for Case Retrieval, Adaptation, Testing, and Learning For Diagnostic Framework, Concept CBR can learn: – concept of “Normal Mode(s)” of operation given the sensor and event readings as attributes – concept of unique “Fault Modes” given that the modes differ sufficiently in their signatures Multiagent Systems (MAS) An MAS is a loosely coupled network of problem solvers that interact to solve problems that are beyond the individual capabilities or knowledge. Characteristics of MAS: – each agent has incomplete information; – there is no system global control; – data is decentralized; and – computation is asynchronous. Multiagent Software Engineering (MaSE) A Multiagent Diagnostic & Prognostic Framework Global Perspective: • Analysis and knowledge-distribution at multiple levels of abstraction. • The MAS framework becomes part of a Decision Support System. Levels Of Abstraction Decision Support Framework Inventory Control Framework Multiagent Diagnostic and Prognostic System Decision Support Inventory Control Sub-Domain Level: Multiagent Diagnostic and Prognostic System Decision Support Inventory Control System Level: Multiagent Diagnostic and Prognostic System Sub-system Level: Multiagent Diagnostic and Prognostic System Component Level: Multiagent Diagnostic and Prognostic System Domain Level: Abstraction Multiagent Diagnostic & Prognostic Framework Decision Support Decision Support Inventory Control Engineering a Multiagent System - I 1.1 Generate the best and safe Health Estimate of the Domain Layer Goal Hierarchy Diagram – to specify goals and sub-goals 2.1 Get failures from local knowledge Sequence Diagram – to establish possible roles and how communicating agents will achieve them 3.1 Perform local diagnostic and prognostic tests 2.2 Explain failures of other agents that are related 3.2 Ask others to help explain local failures Induced Agent Classes – mapping from roles to agent definitions – aggregation of roles/agent classes according to available resources and similarity of roles 2.3 Ascertain risk and reliability associated w current diagnosis 4.1 Organize postings for failures ExplanationSeeker(k) MessageOrganizer(i) 5.1 Send messagesPostMsg(src) 5.2 Receive and to agents organize messages [Explain(src,,failure,time, del_t)] 5.3 Remove messages ExplanationProvider(m) 5.4 Update messages 5.5 Priori according PostMsgConfirmation [Explain(src,tag,failure,time,del_t)] FindAndCopyMsgs(src) 6.1 Tag messages to make them unique [Explain(*,*,failures…, ,)] 6.2 Search messages PostMsg(src) … [Explain(src,tag,failure,time,del_t)] [SelfCheck(time, del_t) Figure 3.5 Goal Hierarchy Diagram for a Diagnostic and Prognostic RemoveMsg [Explain(,tag, , ,)] UpdateMsg [Explain(,tag,failure…,,)] FindAndCopyM [Explain(,,*,,)] PostMsg(src)… Engineering Multiagent Systems - II Mapping Roles to Agent Classes – Diagnostic Agents – Sensor Agents – Black Board Agents – Risk & Reliability Assessment Agents Active Diagnosis Extends the offline ideas of “Probing” or “Testing” It is biased to monitor normal conditions Active Diagnosis Monitors consistency among data Active Diagnosis of DES - A Design Time Approach – the system itself is not diagnosable – design a controller called “Diagnostic Controller” that will make the system diagnosable Active Diagnosis Possibilities: – Inline with Intelligent Agent paradigm – Collaboration in Multiagent Systems can be directed to achieve Active Diagnosis Active vs. Passive Diagnosis Passive Diagnosis: Diagnoser FSM that monitors events and sensors to generate diagnosis. A Diagnosable Plant generates a language from which unobservable failure conditions can be uniquely inferred by the Diagnoser FSM. Unobservable Failures Plant (Chiller/Pump & Valve) Controller Sensors Diagnoser FSM Observable Events Design-Time Active Diagnosis: Observable State Diagnosis Unobservable Failures Design a controller that will make an otherwise “non-diagnosable” plant generate a language that is diagnosable. Plant (Chiller/Pump & Valve) Controller Diagnose r Diagnoser FSM Observable Events Diagnosis Sensors Observable State Active Diagnosis - Agent Perspective Given an anomalous situation, Diagnostic Agent Plans, Learns, and Coordinates. – Learning takes place between distributed agents that share their Observable Events experiences Network of – Coordination helps search, retrieval, Diagnostic Agents adapting activities – Planning is required to determine if learning and coordination is possible in the given expected time-to-failure condition “Run-time” Active Diagnosis – non-intrusive – autonomous and rational Unobservable Failures Plant (Chiller/Pump & Valve) Controller Diagnostic Agent Diagnosis Alarms/DB Diagnostic Agent Planning Coordination Learning Sensor Sensor Agents Agents Observable State Performance Measures (How to Compare and ) Measures Precision for Prognosis a measure of the narrowness of an interval in which the remaining life falls Reliability how the system responds to individual component failures Extensibility or Scalability how the system can be extended if new components are added Robustness how the system tolerates uncertainty Reuse or Portability how easy or hard it is to use this system in another problem domain Accuracy how an agent improves true positives and true negatives as a result of learning, self-organization, and active diagnosis Entropy a measure of how the system learns and organizes over time. Decreasing entropy signifies increasing order in a multi-agent system, resulting in more accurate and timely diagnoses Network Activity how much network related activity results if the framework is implemented for distributed systems Innovative Thrusts: Intelligent Agent-based Distributed PHM Architecture Error-tolerant, flexible, and scalable diagnostic/prognostic framework Automated Prescription: – What is wrong and how to I fix it? – When do I fix it? – How much confidence do I have that the system will not fail during the execution of a mission? Object-oriented modeling framework/physics-based models Case-based reasoning paradigm - archive case studies - reason about new situations Open Systems Architecture - OSA CBM and ICAS compatible Active Diagnosis / Prognosis notions Performance Metrics / Performance based PHM modules The Candidate Application Domains Shipboard Processes - gas turbines, AC plants, elevators, main fire systems, etc. Propulsion / Drivetrain Components - clutch, Seals, pumps, bearings, blades, etc. Power Systems - generation, shipboard distribution Radar tracing and other communicationrelated systems Other Contributions Development of a new methodology for the design of a diagnostic and prognostic framework for large-scale distributed systems Development of Run-time Active Diagnosis approach / extension to Prognosis Development of Learning Strategies for diagnostic / prognostic problem domain Development of performance measures for the comparison of centralized vs. distributed diagnostic and prognostic frameworks Implementation Issues Embedded Distributed Diagnostic Platform (EDDP) Hardware: – Modular I/O (e.g. NI’s FieldPoint System, or MAX-IO). – Embedded PC (e.g. MPC - Matchbox PC of TIQIT or MAX-PC of Strategic-Test). – Network (e.g. Ethernet, PROFIBUS, CAN). Software: – Windows CE, Linux, QNX, VxWorks, or OsX operating systems. – Embedded databases (like Polyhedra). A Possible Agent Node An Operator Interface (LCD Display) A Small PC (MPC, MAX-PC) Network (Ethernet, CAN, Profibus) Distributed I/O System (FieldPoint) Sensors Sensors Sensors CBM Performance Assessment Objective: – To assess the technical and economic feasibility of various prognostic algorithms Technical Measures: – Accuracy, Speed, Complexity, Scalability Overall Performance Measure: – w1Accuracy + w2Complexity + w3Speed + … (wi - weighting factors) PM1 PM2 PM3 Algorithm #1 * * * Algorithm #2 * * * Algorithm #3 * * * Performance Assessment Matrix: CBM Performance Assessment Target Measure: PM yr (n f ) y p (n f ) yr (ns ) y p (ns ) Output y(n) Real yr(n) Behavior Measure: Predicted yp(n) nf PM w(i) yr (i) y p (i) i ns tpf Discrete time n Mean E{t pf and Variance Measures: 1 1 V {t } [t } t (i ) N N N N i 1 pf pf i 1 2 ( i ) E { t }] pf pf Complexity/Cost-benefit Analysis Complexity Measure t p td computation time complexity E E t time to failure pf Cost/Benefit Analysis – – – – frequency of maintenance downtime for maintenance dollar cost etc. Overall Performance Overall Performance = w1accuracy + w2complexity + w3cost + …. Cost/Benefit Analysis Establish Baseline Condition - estimate cost of breakdown or time-based preventive maintenance from maintenance logs A good percentage of Breakdown Maintenance costs may be counted as CBM benefits If preventive maintenance is practiced, estimate how many of these maintenance events may be avoided. The cost of such avoided maintenance events is counted as benefit to CBM. Cost/Benefit Analysis (cont’d) Intangible benefits - Assign severity index to impact of BM on system operations Estimate the projected cost of CBM, i.e. $ cost of instrumentation, computing, etc. Aggregate life-cycle costs and benefits from the information detailed above CINCLANTFLT Study Question: “What is the value of prognostics?” Summary of findings: (1) Notional Development and Implementation for Predictive CBM Based on CINFCLANTFLT I&D Maintenance Cost Savings (2) Assumptions – – – – – CINCLANTFLT Annual $2.6B [FY96$] I&D Maintenance Cost Fully Integrated CBM yields 30% reduction Full Realization Occurs in 2017, S&T sunk cost included Full Implementation Costs 1% of Asset Acquisition Cost IT 21 or Equivalent in place Prior to CBM Technology (3) Financial Factors – Inflation rate: – Investment Hurdle Rate: – Technology Maintenance Cost: (4) Financial Metrics: 4% 10% 10% Installed Cost 15 year 20 year Concluding Remarks CBM/PHM are relatively new technologies sufficient historical data are not available CBM benefits currently based on avoided costs Cost of on-board embedded diagnostics primarily associated with computing requirements Advances in prognostic technologies (embedded diagnostics, distributed architectures, etc.) and lower hardware costs (sensors, computing, interfacing, etc.) promise