Anomalous Events in Non-Destructive Inspection Data
18 Dec 2012
Jeremy S. Knopp, AFRL/RXCA, Air Force Research Laboratory
Integrity Service Excellence

1 Disclaimer
• The views expressed in this presentation are those of the author and do not reflect the official policy or position of the United States Air Force, the Department of Defense, or the United States Government.

2 Outline
• Historical Perspective of the Aircraft Structural Integrity Program (ASIP)
• Probability of Detection (POD)
• Nondestructive Evaluation System Reliability Assessment Handbook (MIL-HDBK-1823A) Revision
• Research Objectives to Improve State-of-the-Art POD Evaluation

3 Aircraft Management Strategies
• Safe Life – No Periodic Inspection Required
  – Fly a certain number of hours and retire.
  – Considers the effects of cyclic loading on the airframe with a full-scale fatigue test. For example, testing to 40,000 hours ensures a safe life of 10,000 hours.
  – Used by the US Navy.
• Damage Tolerance Assessment (DTA) – Periodic Inspection to Detect Damage
  – Fly and inspect; reassess time to next inspection based on fatigue crack growth analysis, usage, and results of inspection.
  – Assumes imperfections are present in the early stages of aircraft service.
  – REQUIRES RELIABLE AND VALIDATED NDI.
  – Used by the US Air Force.
• Condition-based Maintenance (CBM) – Periodic Inspection and/or Onboard Monitoring to Characterize Damage
  – Perform repairs only when needed; will minimize maintenance costs.
  – Requires damage characterization, not just detection.
  – Desired by the US Air Force to maximize availability of assets while minimizing sustainment costs.
• Condition-based Maintenance Plus (CBM+) – Periodic Inspection to Characterize Damage
  – CBM plus prognosis to estimate capability and remaining life for optimal maintenance scheduling.
4 The USAF Aircraft Structural Integrity Program (ASIP)
• Provides the engineering discipline and management framework …
  – associated with establishing and maintaining structural safety …
  – in the most cost-effective manner …
  – through a set of defined inspections, repairs, modifications, and retirement actions.
• Based on a preventative maintenance strategy that starts in acquisition and continues until retirement.

5 "Wright" Approach to Structural Integrity
• Approach used by the Wright brothers beginning in 1903.
• Essentially the same approach used by the USAF for over 50 years.
• They performed stress analysis and conducted static tests far in excess of the loads expected in flight.
• A safety factor was applied to the forces that maintained static equilibrium with the aircraft's weight.

6 B-47 Experience, 1958
• Air Force Strategic Air Command lost two B-47 bombers on the same day!
• Metal fatigue caused the wings on two aircraft to fail catastrophically in flight.
• Standard static tests and an abbreviated flight load survey had proved the structure would support at least 150% of its design limit load.
• This gave no assurance that the structure would survive smaller cyclic loads in actual flight.

7 ASIP Initiated
• The Aircraft Structural Integrity Program (ASIP) was initiated on 12 Jun 1958 with 3 primary objectives:
  – Control structural fatigue in the aircraft fleet.
  – Develop methods to accurately predict service life.
  – Establish design and testing methods to avoid structural problems in future aircraft systems.
• Led to the "safe-life" approach.
  – Probabilistic approach to establishing the aircraft service life capability.
  – Safe life established by conducting a full-scale airframe fatigue test and dividing the number of successfully tested simulated flight hours by a scatter factor (usually 4).

8 F-111 Experience, 1969
• Wing separation at ~100 hours (safe-life qualified for 4,000 hours). The crack initiated from a manufacturing defect.
• A two-phase recovery program was initiated.
• Phase 1 (allow operations at 80% of designed capability)
  – Material crack growth data collected to develop a flaw growth model.
  – Cold proof test to demonstrate that critical-size flaws were not present in critical forgings.
  – Improved NDI for use in reinspection.
• Phase 2 (allow operations at 100% of designed capability)
  – Incorporated NDI during production.
  – Used fracture mechanics to determine inspection intervals.

9 Damage Tolerance Update, 1974
• In response to the F-111 mishap, ASIP incorporated Damage Tolerance requirements.
  – The objective was to prevent airframe failures resulting from the safe-life approach.
• ASIP provides 3 options to satisfy the damage tolerance requirement:
  – Slow crack growth (most common option)
  – Fail-safe multiple load path
  – Fail-safe crack-arrest
• Primary basis for the aircraft structure maintenance program for the last 30+ years.
  – Inspection requirements based on initial flaw assumptions (slow crack growth) and NDI capability.
• Today, the inspection burden is increasing due to the age of the fleet!
  – NDE research is needed to reduce the future maintenance burden.

10 Evolution of Structural Integrity Approaches
[Timeline figure, 1950–2020, of the ASIP approach (prevent structural failures cost-effectively): Prevent Static Load Failures → Prevent Fatigue Failures → Protect for Potential Damage → Risk Assessment/Management (MIL-STD-1530C).]
• Each change was made to enhance our ability to protect structural integrity (prevent structural failures).
• Today, preventing structural failures requires anticipating events that ensure continuing airworthiness, reliability, availability, and cost-effectiveness.

11 USAF Structural Reliability
• USAF aircraft losses since 1971:
  – 18 due to a structural failure
  – 19 due to a structural failure that was caused by maintenance, pilot error, flight control failures, etc.
• The next chart plots the overall USAF aircraft loss rate from 1947–2002 and the structures contribution since 1971.
  – Overall loss rate calculated for each year (total losses per year / total fleet flight hours per year).
  – Loss rate due to structures is cumulative, since many years passed without losses due to structural failure.

12 USAF Structural Reliability
[Chart: USAF aircraft loss rate (destroyed aircraft), number of aircraft losses / flight hours on a log scale (1.E-03 to 1.E-08), 1940–2010; curves for all causes, structures-related losses (37), and primary structural losses (18).]
Babish, "USAF ASIP: Protecting Safety for 50 Years", Aircraft Structural Integrity Program Conference (2008)

13 Rare Events
• Nov 2, 2007 – Loss of an F-15C airplane, 0 casualties.
• The aircraft was operated within limits.
• The mishap occurred due to a fatigue failure in a forward-fuselage single load path.
• The hot spot was missed during design and testing and aggravated by a rogue flaw.
• NDI can be used to prevent fracture at this hot spot.

14 Reliability of NDT
• Probability of Detection¹
• Given a population of cracks of size 'a'
  – geometry, material, orientation, location, …
• Given a defined inspection system
• POD(a) = probability that selected cracks of size 'a' from the population will be detected
  – POD(a) = proportion of all size-'a' cracks from the population that would be detected
¹ A. P. Berens, NDE Reliability Data Analysis. In American Society for Metals Handbook Vol 17, Nondestructive Evaluation and Quality Control, pp. 689-701. ASM International, 1989.

15 Reliability of NDT
• POD curve
• Two parameters (μ and σ)
• σ describes the slope of the curve. A steep curve is ideal.
• μ is a50.
[POD curve figure: POD vs flaw size (0–5 mm), with a50, a90, and a90/95 (or aNDE) marked.]

16 Inspection Intervals
[Figure: ASIP damage tolerance inspection intervals; crack size a vs. equivalent (standard spectrum) or flight hours, showing a0, aNDE, acr-miss, and aCR with inspections at T1, T2, T3 and failure at Tf.]
• Inspections occur at 1/2 the time it takes for a crack to grow from initial size to failure, e.g., T2 = 0.5*(T3 - T1).

17 Reliability in NDT
• What is aNDE?
• aNDE is the "reliably" detected crack size for the applied inspection system.
• Traditionally, the reliably detected size has been considered to be the a90 or a90/95 crack size from the estimate of the NDE system POD(a).
• Variations of this can be investigated.
[POD curve figure: POD vs flaw size (0–5 mm), with a50, a90, and a90/95 (or aNDE) marked.]

18 Reliability of NDE
• Development of POD was a very important contribution to quantifying the performance of NDE.
• Necessary for an effective ASIP program. The Damage Tolerance approach requires validated NDE capability.
• Quantifying the largest flaw that can be missed is important.
• Capability of detecting small flaws is less important.
• First serious investigation – Packman et al., 1967¹
  – Four NDI methods (X-ray, dye penetrant, magnetic particle, and ultrasonics)
¹ P.F. Packman et al., The applicability of a fracture mechanics – nondestructive testing design criterion. Technical Report AFML-TR-68-32, Air Force Materials Laboratory, USA, May 1968.

19 Reliability of NDT
• Rummel et al., 1974¹ – NASA Space Shuttle Program
  – Five NDI methods (X-ray, fluorescent penetrant, eddy current, acoustic emission, and ultrasonics)
• Lewis et al., 1978² (a.k.a. "Have Cracks Will Travel")
  – Major US Air Force program to determine reliability.
  – Perhaps the largest program of this kind in history.
  – Disappointing results concerning NDI capability.
• Both studies inspired more advanced statistical analysis.
¹ W.D.
Rummel et al., The detection of fatigue cracks by nondestructive testing methods. Technical Report NASA CR 2369, NASA Martin Marietta Aerospace, USA, Feb 1974.
² W.H. Lewis et al., Reliability of nondestructive inspection – final report. Technical Report SA-ALC/MME 76-6-38-1, San Antonio Air Logistics Center, USA, Dec 1978.

20 Statistical Analysis – POD
• Two types of data collected:
  – "Hit/Miss" – binary data in terms of whether or not a flaw is found.
  – "â vs a" – continuous response data has more information (â = signal magnitude, a = size).
• Statistical rigor introduced in the USAF study conducted by Berens and Hovey in 1981.¹
  – Previous analysis methods grouped "hit/miss" data into bins and used binomial statistics to evaluate POD.
  – Berens and Hovey introduced a mathematical model based on the log-logistic cumulative distribution function to evaluate POD. This is still standard practice.
¹ A.P. Berens and P.W. Hovey, "Evaluation of NDE Reliability Characterization," AFWAL-TR-81-4160, Vol 1, Air Force Wright Aeronautical Laboratories, Wright-Patterson Air Force Base, Dec 1981.

21 Statistical Analysis – POD
• Hit/Miss analysis¹
  – Sometimes only detection information is available (e.g., penetrant testing). Can also be used if the constant-variance assumption is violated.
  – The model assumes POD is a function of flaw size: POD(a) = Φ(β0 + β1·log(a))
  – For the logit model (logistic): Φ(z) = exp(z) / (1 + exp(z))
  – For the probit model (lognormal): Φ(z) is the standard normal cumulative distribution function.
  – Maximum likelihood estimates of β0 and β1.
¹ A. P. Berens, NDE Reliability Data Analysis. In American Society for Metals Handbook Vol 17, Nondestructive Evaluation and Quality Control, pp. 689-701. ASM International, 1989.

22 Statistical Analysis – POD
• Hit/Miss analysis
  – Unchanged since Berens and Hovey except for confidence bound calculations.
  – Confidence bound calculations are not available in any commercial software package.
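The hit/miss model above – POD(a) = Φ(β0 + β1·log a) with a logit link, fit by maximum likelihood – can be sketched in a few lines. This is a minimal illustration on synthetic data, not the mh1823 software; the data-generating parameters are invented for the demo.

```python
import numpy as np

def fit_hit_miss_pod(a, hits, n_iter=25):
    """Maximum-likelihood fit of POD(a) = logistic(b0 + b1*log(a)) to
    binary hit/miss data, via Newton / iteratively reweighted least squares."""
    X = np.column_stack([np.ones_like(a), np.log(a)])
    y = np.asarray(hits, dtype=float)
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p) + 1e-9              # IRLS weights (jitter for stability)
        beta += np.linalg.solve((X * w[:, None]).T @ X, X.T @ (y - p))
    return beta

def a_q(beta, q):
    """Crack size at which the fitted POD equals q."""
    z = np.log(q / (1.0 - q))                 # logit(q)
    return float(np.exp((z - beta[0]) / beta[1]))

# Synthetic demonstration data (hypothetical values, for illustration only)
rng = np.random.default_rng(0)
a = rng.uniform(0.02, 0.3, 400)               # crack sizes, inches
p_true = 1.0 / (1.0 + np.exp(-(8.0 + 3.0 * np.log(a))))
hits = rng.random(400) < p_true
beta = fit_hit_miss_pod(a, hits)
print("a50 = %.4f in, a90 = %.4f in" % (a_q(beta, 0.5), a_q(beta, 0.9)))
```

With real inspection data the likelihood-ratio confidence bounds of MIL-HDBK-1823A would be layered on top; the point here is only the model fit itself.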
  – The traditional Wald method for confidence bound calculation is anti-conservative with hit/miss data.
  – The likelihood ratio method for confidence bound calculation is used in the revised MIL-HDBK-1823A. This is a very complicated calculation. See Annis and Knopp for details.¹
¹ C. Annis and J.S. Knopp, "Comparing the Effectiveness of a90/95 Calculations", Rev. Prog. Quant. Nondestruct. Eval., Vol 26B, pp. 1767–1774, 2007.

23 Statistical Analysis – POD
• Hit/Miss analysis – example¹
[Figure: hit/miss POD curve for EXAMPLE 3 hm.xls (mh1823); link function = logit, μ̂ = 0.1156, σ̂ = 0.025147, n hits = 92, n total = 134; a50 = 0.1156, a90 = 0.1709, a90/95 = 0.1974 (inches); inset compares the log-likelihood-ratio bound with the Cheng & Iles approximation.]
¹ MIL-HDBK-1823A, Non-Destructive Evaluation System Reliability Assessment (2009).

24 Statistical Analysis – POD
• "â vs a" analysis (â = signal strength, a = flaw size)¹
  – The magnitude of the signal contains information.
  – More information results in more statistical confidence, which ultimately reduces sample size requirements.
  – Again, the regression model assumes POD is a function of flaw size: POD(a) = Φ((log(a) − μ) / σ),
    where μ = (log(âthreshold) − β0) / β1 and σ = σε / β1 (σε² is the regression variance).
  – Censored regression is almost always involved, so a commercial package such as SAS or S-Plus is necessary.
¹ MIL-HDBK-1823A, Non-Destructive Evaluation System Reliability Assessment (2009).

25 Statistical Analysis – POD
• â vs a analysis
• Basically a linear model.
• Delta method used to generate confidence intervals on the POD curve.
• Wald confidence intervals sufficient.
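Ignoring censoring (which real data almost always requires), the â vs a relations above reduce to an ordinary regression plus a decision threshold. A hedged sketch with invented demo numbers:

```python
import numpy as np
from math import erf, sqrt

def pod_params_from_ahat(log_a, log_ahat, log_threshold):
    """mu and sigma of POD(a) = Phi((log a - mu)/sigma) from the linear fit
    log(ahat) = b0 + b1*log(a): mu = (log(threshold) - b0)/b1, sigma = s/b1.
    Uncensored sketch of the handbook relations shown above."""
    b1, b0 = np.polyfit(log_a, log_ahat, 1)
    s = (log_ahat - (b0 + b1 * log_a)).std(ddof=2)   # regression std dev
    return (log_threshold - b0) / b1, s / b1

def pod(log_a, mu, sigma):
    """Standard normal CDF evaluated at (log a - mu)/sigma."""
    return 0.5 * (1.0 + erf((log_a - mu) / (sigma * sqrt(2.0))))

# Hypothetical demo: log(ahat) = -2 + 1.0*log(a) + noise, threshold ahat = 0.5
rng = np.random.default_rng(1)
log_a = np.log(rng.uniform(0.5, 5.0, 200))           # sizes in mm
log_ahat = -2.0 + 1.0 * log_a + rng.normal(0.0, 0.3, 200)
mu, sigma = pod_params_from_ahat(log_a, log_ahat, np.log(0.5))
a50, a90 = np.exp(mu), np.exp(mu + 1.28155 * sigma)  # Phi^-1(0.90) = 1.28155
print("a50 = %.2f mm, a90 = %.2f mm" % (a50, a90))
```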
[Figure: â vs a example for EXAMPLE 1 â vs a.xls (mh1823); response â vs size a (mils, log scale) with censored points, giving a50 = 8.8, a90 = 12.69, a90/95 = 13.68, P(false call) = 0.11.]

27 MIL-HDBK-1823A

28 MIL-HDBK-1823A Summary
• Completed in 2007; released in 2009
• 132 pages
• All new figures (65)
• Approximately 70% new text
• Based on best practices for NDE and statistical analysis
• 100% new software available
  – â vs. a
  – hit/miss

29 MIL-HDBK-1823A Support Website
• Download the Handbook
• Request the mh1823 POD software
http://mh1823.com/mh1823

30 Addressing Deficiencies (1)
• Concern exists about performing a POD calculation on poor data sets.
  – Poor data sets can be defined as:
    • Limited in sample size
    • Data that does not follow typical POD model fits
  – This is a problem when the wrong model is used for statistical inference.
  – Worst-case scenario: a fictitious a90/95 may be obtained.
• One possible remedy is a '4 parameter model', proposed by Moore and Spencer in 1999:
  POD(a) = α + (β − α) / (1 + exp[−π(ln a − ln μ) / (σ√3)])
  – α: false call rate
  – β: 1 − random missed flaw rate
  – σ: curve steepness
  – μ: flaw size median (50% POD)
• However, parameter estimation is difficult using classical statistical methods.
• It is likely that such methods also require large data sets. (Very little work performed to date.)
[Background diagram: NDE data flow – distributed sensor data (i,j,k,l) → signal processing / feature extraction → feature vector (i,j,kl') → classification / decision criteria → damage state measures â(i,j,m) → maintenance action (i,m), call (i,m).]

31 Addressing Deficiencies (2)
• Markov-Chain Monte Carlo (MCMC) offers a flexible method to use sampling to calculate confidence bounds.
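A sketch of the four-parameter model just described (logistic kernel assumed; any π/√3 scaling is folded into σ here), showing why a fictitious a90/95 disappears once the upper asymptote β sits below 0.90. All numbers are invented for illustration.

```python
import numpy as np

def pod4(a, alpha, beta, mu, sigma):
    """Four-parameter POD: lower asymptote alpha (false-call rate), upper
    asymptote beta (1 - random missed-flaw rate), median mu, steepness sigma."""
    base = 1.0 / (1.0 + np.exp(-(np.log(a) - np.log(mu)) / sigma))
    return alpha + (beta - alpha) * base

def a90_4param(alpha, beta, mu, sigma):
    """Size at POD = 0.90, or None when the curve never reaches 0.90."""
    t = (0.90 - alpha) / (beta - alpha)
    if not 0.0 < t < 1.0:
        return None                        # a90 'does not exist'
    return float(mu * np.exp(sigma * np.log(t / (1.0 - t))))

print(pod4(2.0, 0.05, 0.95, 2.0, 0.5))     # midpoint (alpha+beta)/2 at a = mu
print(a90_4param(0.05, 0.95, 2.0, 0.5))    # exists: beta > 0.90
print(a90_4param(0.05, 0.85, 2.0, 0.5))    # None: beta < 0.90, as in Data Set #2
```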
• A Bayesian approach with non-informative priors can be used to select:
  – Model function: logit or probit
  – Model form (parameters): 2-, 3-, and 4-parameter models
• Upper asymptote = 1 − P(random missed call) = β
• Lower asymptote = P(false call) = α
[Figure: 4-parameter POD curve, POD(a) = α + (β − α) / (1 + exp[−π(ln a − ln μ) / (σ√3)]), vs crack length a, showing the lower asymptote (PFC), the median a50, and the upper asymptote (1 − PRMC).]

32 Bayesian Approach
p(λ | y) = p(y | λ) p(λ) / p(y)
• Prior p(λ): physics-based model or expert opinion.
• Normalizing constant p(y): useful in model selection.
• Likelihood p(y | λ): forward model and measurement data.
• Posterior p(λ | y): integration of information from the model and experimental data.
• y: data; λ: parameter(s).

33 Bayes Factors for Model Selection
• Compare two models M2 and M1 using the Bayes factor:
  BF21 = Marginal likelihood(M2) / Marginal likelihood(M1) = P(y | M2) / P(y | M1)
• Candidate models M1, M2, M3 → parameter estimation (θ̂1, θ̂2, θ̂3) → model comparison via P(y | M1), P(y | M2), P(y | M3) and Bayes factors BF21, BF32.

  BF       2log(BF)   Strength of evidence
  <1       <0         Negative (supports M0)
  1~3      0~2        Barely worth mentioning
  3~20     2~6        Positive
  20~150   6~10       Strong
  >150     >10        Very strong
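A minimal random-walk Metropolis sampler for the two-parameter hit/miss model with flat priors: a toy version of the MCMC idea above, run on synthetic data with invented parameters, not the actual analysis code.

```python
import numpy as np

def log_like(theta, a, hits):
    """Bernoulli log-likelihood for POD(a) = logistic(b0 + b1*log(a))."""
    z = theta[0] + theta[1] * np.log(a)
    return float(np.sum(hits * z - np.log1p(np.exp(z))))

def metropolis(a, hits, n=6000, step=0.3, seed=1):
    """Random-walk Metropolis with flat (non-informative) priors; returns
    post-burn-in draws of (b0, b1)."""
    rng = np.random.default_rng(seed)
    theta = np.array([0.0, 1.0])
    ll = log_like(theta, a, hits)
    draws = []
    for _ in range(n):
        prop = theta + step * rng.standard_normal(2)
        ll_prop = log_like(prop, a, hits)
        if np.log(rng.random()) < ll_prop - ll:   # Metropolis accept step
            theta, ll = prop, ll_prop
        draws.append(theta.copy())
    return np.array(draws[n // 2:])               # discard burn-in half

# Synthetic hit/miss data (hypothetical parameters, for illustration)
rng = np.random.default_rng(0)
a = rng.uniform(0.02, 0.3, 300)
hits = rng.random(300) < 1.0 / (1.0 + np.exp(-(8.0 + 3.0 * np.log(a))))
post = metropolis(a, hits)
# Posterior draws of a90 give sampling-based bounds directly
a90 = np.exp((np.log(9.0) - post[:, 0]) / post[:, 1])
print("posterior median a90 = %.3f" % np.median(a90))
```

An upper credible bound on a90 is then just a percentile of the `a90` draws, which is exactly the flexibility the slide claims for MCMC.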
(Bayes factor interpretation scale from "Bayes Factors", Kass and Raftery, 1995.)

33 Difficult Data Set #1
• NTIAC A9002(3)L
[Scatter plot: response vs size (0–18), with a cluster of low responses across all sizes. "What's going on here?"]
NTIAC, Nondestructive Evaluation (NDE) Capabilities Data Book, 3rd ed., NTIAC DB-97-02, Nondestructive Testing Information Analysis Center, November 1997.

34 Difficult Data Set #1
• Example of using the wrong model.
[Figure: probability of detection (%) vs actual crack length (inch, 0–0.75). Data set: A9002(3)L. Test object: aluminum 2219 stringer-stiffened panels. Condition: after etch. Method: eddy current, raster scan with tooling aid.]
NTIAC, Nondestructive Evaluation (NDE) Capabilities Data Book, 3rd ed., NTIAC DB-97-02, Nondestructive Testing Information Analysis Center, November 1997.

35 Difficult Data Set #1
• 2-parameter logit/probit
• Appears to show a90 and a90/95 values.
[Paired plots: â vs a (mm, 0–20) with 2-parameter logit and probit fits.]

36 Difficult Data Set #1
• 3-parameter lower logit/probit
• Again, it appears as if there are a90 and a90/95 values.
[Paired plots: â vs a (mm, 0–20) with 3-parameter lower-asymptote logit and probit fits.]

37 Difficult Data Set #1
• 3-parameter upper logit/probit
[Paired plots: â vs a (mm, 0–20) with 3-parameter upper-asymptote logit and probit fits.]

38 Difficult Data Set #1
• Case study: 4-parameter probit
[Posterior histograms for the intercept, slope, lower asymptote, and upper asymptote, with the fitted 4-parameter POD curve vs a (mm, 0–5).]
39 Difficult Data Set #1
• The 4-parameter logit is most likely.
[Plot: â vs a (mm, 0–5) with the 4-parameter logit fit.]

40 Difficult Data Set #1
• Summary of Results

  Model                            intercept   slope     lower    upper    ML        a90      a90/95
  2 parameter Logit                -1.6645     1.7257                      3.70E-97  9.4148   12.555
  2 parameter Probit               -0.8476     0.9242                      9.08E-98  10.0156  13.4993
  3 parameter lower bound Logit    -1.8501     1.7485    0.0898            7.29E-98
  3 parameter lower bound Probit   -1.0195     0.9616    0.1098            1.02E-98
  3 parameter upper bound Logit    -5.4408     5.5486             0.8478   3.27E-93
  3 parameter upper bound Probit   -2.788      2.9377             0.8443   1.64E-93
  4 parameter Logit                -13.7647    12.2874   0.175    0.8307   7.24E-92
  4 parameter Probit               -9.8542     8.674     0.1864   0.8282   2.49E-92

41 Difficult Data Set #2
• Example of using the wrong model.
[Figure: probability of detection (%) vs actual crack length (inch, 0–0.75). Data set: D8001(3)L. Test object: aluminum 2219 stringer-stiffened panels. Condition: as machined. "What's going on here?"]
• Note: the mh1823 software produces numerous warnings.
NTIAC, Nondestructive Evaluation (NDE) Capabilities Data Book, 3rd ed., NTIAC DB-97-02, Nondestructive Testing Information Analysis Center, November 1997.

42 Difficult Data Set #2
• 2-parameter logit/probit
• It appears that a90 and a90/95 values exist.
[Paired plots: â vs a (mm, 0–20) with 2-parameter logit and probit fits.]

43 Difficult Data Set #2
• 4-parameter probit
• The a90 and a90/95 values don't exist (POD never reaches 0.90).
[Plot: â vs a (mm, 0–10) with the 4-parameter probit fit.]

44 Difficult Data Set #2
• Which model is correct?
• Log marginal likelihoods and Bayes factors:

  Model type                log ML (logit)  log ML (probit)  logit/probit  /2-parameter (logit)  /2-parameter (probit)
  2-parameter               –200.16         –201.63          1.47          ———                   ———
  3-parameter lower bound   –203.86         –203.49          –0.37         –3.7                  –1.86
  3-parameter upper bound   –189.30         –189.00          –0.30         10.86                 12.63
  4-parameter               –188.89         –185.12          –3.76         11.27                 16.51

45 Small Data Set
• A great example where the last procedure fails.
• Small data sets do not cause any warnings with standard software.

46 Small Data Set
• 4-parameter model
[Plot: â vs a (inches, 0–1) with the 4-parameter fit.]

47 Small Data Set
• Summary for the small data set:

  Maus                            intercept  slope    lower   upper   ML        a90     a90/95
  2 parameter Logit               27.1517    12.6708                  8.87E-04  0.1405  0.1829
  2 parameter Probit              22.6165    10.577                   2.60E-03  0.1329  0.159
  3 parameter lower bound Logit   24.0452    11.6731  0.0719          9.67E-06
  3 parameter lower bound Probit  24.5728    12.1283  0.0705          9.16E-06
  3 parameter upper bound Logit   27.6386    12.5028          0.7967  2.29E-04
  3 parameter upper bound Probit  22.0074    9.8899           0.7917  4.87E-05
  4 parameter Logit               24.8792    11.3063  0.0706  0.7926  1.97E-05
  4 parameter Probit              23.7871    10.6664  0.0711  0.7781  3.18E-06

  IMTT                            intercept  slope    lower   upper   ML        a90     a90/95
  2 parameter Logit               25.5467    9.8023                   3.30E-03  0.0941  0.1248
  2 parameter Probit              19.667     7.4983                   4.40E-03  0.0873  0.1139
  3 parameter lower bound Logit   28.5743    11.3208  0.1273          2.07E-04
  3 parameter lower bound Probit  23.2202    9.2585   0.1391          1.58E-04
  3 parameter upper bound Logit   24.5688    9.2608           0.9055  6.34E-06
  3 parameter upper bound Probit  20.405     7.6861           0.9041  3.10E-05
  4 parameter Logit               25.3354    9.9529   0.1263  0.9067  2.17E-05
  4 parameter Probit              26.4209    11.153   0.1679  0.8884  5.51E-07
  4.51E+01  8.17E+02  1.52E+02  7.99E+03

48 Conclusion
• It sometimes appears (and is desirable) that there is a systematic procedure that will automatically determine the best model, but this actually isn't the case.
• Bayes factors provide a useful approach to evaluate the best model.
• However, an example with a small data set showed that even the Bayes factor procedure can lead one to a wrong conclusion.
  – It doesn't tell you to stop and not perform an analysis.
  – Need to look at the data and perform 'diagnostics'.
• Bottom line – procedures don't replace statisticians.

49 Model-Assisted POD
• C-5 Wing Splice Fatigue Crack Specimens:
  – Two-layer specimens are 14" long and 2" wide, 0.156" top layer, 0.100" bottom layer.
  – 90% of fasteners were titanium, 10% of fasteners were steel.
  – Fatigue cracks positioned at the 6 and 12 o'clock positions.
  – Crack length ranged from 0.027"–0.169" (2nd layer).
  – Vary: location of cracks – at both 1st and 2nd layer (1st layer – corner crack; 2nd layer – corner crack).
• AFRL/UDRI acquired data (Hughes, Dukate, Martin).
[Specimen scan images A1-16C0 showing 0.110" and 0.107" cracks.]

50 MAPOD
• Perform simulated studies; compare with experimental results.
• Bayesian methods can assist in determining the best model.
[Plots: measurement response (V) vs crack length (in, 0–0.2), model vs experiment, for A) 1st layer – faying surface – corner cracks and B) 2nd layer – faying surface – corner / through cracks.]

51 MAPOD
• Demonstration of model-assisted probability of detection (MAPOD): experimental comparison with full model-assisted POD, 2nd layer – faying surface – corner & through cracks.
[Plot: POD vs crack length (in, 0–0.25), experimental POD vs full model-assisted POD.]
• Successes:
  – First demonstration of MAPOD in the literature for a structural problem.
  – Eddy current models were able to simulate eddy current inspection of 2nd-layer fatigue cracks around fastener holes.
Knopp, Aldrin, Lindgren, and Annis, "Investigation of a model-assisted approach to probability of detection evaluation", Review of Progress in Quantitative Nondestructive Evaluation, (2007).

52 Heteroscedasticity
• â vs a analysis – key references:
  – Berens, A.P. and P.W. Hovey, "Flaw Detection Reliability Criteria, Volume I – Methods and Results," AFWAL-TR-84-4022, Air Force Wright Aeronautical Laboratories, Wright-Patterson Air Force Base, April 1984. (â vs a analysis is always more advantageous than hit/miss because much more information is available, but hit/miss is used much more in practice.)
  – Berens, A.P., NDE Reliability Data Analysis, American Society for Metals Handbook, Nondestructive Evaluation and Quality Control, Vol 17, pp. 689-701, ASM International, 1989. (Classic reference on the subject, still standard today.)
  – MIL-HDBK-1823 (1999). (Guidance for POD studies based on the methods described by Berens and Hovey.)
• Box-Cox transformations:
  – Kutner, Nachtsheim, Neter, and Li, "Applied Linear Statistical Models", (2005).

53 Heteroscedasticity
• â vs a analysis assumes homoscedasticity; if that assumption is violated, one must resort to hit/miss analysis. This was the case for an early MAPOD study (Knopp et al. 2007).
• A Box-Cox transformation can remedy this problem.
[Plot: signal response â vs crack size (mm, 0–5), showing variance growing with crack size.]

54 Heteroscedasticity
• Box-Cox transformation according to Kutner et al.
• Note: not to be used for nonlinear relations.
• Box-Cox identifies transformations from a family of power transformations.
• The form is: â' = â^λ
• Some common transformations:
  λ = 2    →  â' = â²
  λ = 0.5  →  â' = √â
  λ = 0    →  â' = loge â
  λ = −0.5 →  â' = 1/√â
  λ = −1   →  â' = 1/â

55 Heteroscedasticity
• New regression model with power transform: âi^λ = β0 + β1·ai + εi
• λ needs to be estimated.
• Box-Cox uses maximum likelihood; I use Excel's Solver to do a numerical search for potential λ values.
• Standardize the observations so that the magnitude of the error sum of squares does not depend on the value of λ:
  gi = (âi^λ − 1) / (λ·c^(λ−1)),  λ ≠ 0
  gi = c·ln(âi),                  λ = 0
  c = (∏ âi)^(1/n)
• c is the geometric mean of the observations.
• The next step is to regress g on a for a given λ and calculate the SSE.

56 Heteroscedasticity
• The value of λ that minimizes the SSE is the best transformation.
• This procedure is only a guide, and a high level of precision is not necessary.
• For this data set, λ = 0.45.
[Plots: â vs crack size (mm, 0–5) before and after transformation with λ = 0.45.]

57 Heteroscedasticity
• Box-Cox – POD curve associated with λ = 0.45.
[Plot: probability of detection, POD | a, vs size a (mm, 0–4), with a50, a90, and a90/95 marked.]

58 Heteroscedasticity
• Box-Cox transformation – square-root transform.
[Plots: response â vs crack size (mm, 0–5) before and after the square-root (λ = 0.5) transform.]

59 Heteroscedasticity
• POD result for the square-root transform.
[Plot: probability of detection, POD | a, vs size a (mm, 0–4), with a50, a90, and a90/95 marked.]

60 Summary
• Box-Cox enables â vs a analysis for data sets where the variance is not constant but has some relationship with the independent variable, such as crack size.
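The λ search described above (done with Excel's Solver in the slides) is easy to reproduce with a coarse grid; the standardized gi keeps the SSE comparable across λ values. The demo data are synthetic, constructed so the true transform sits near λ = 0.5.

```python
import numpy as np

def boxcox_sse(ahat, a, lam):
    """SSE from regressing the standardized Box-Cox transform g on a:
    g = (ahat^lam - 1)/(lam * c^(lam-1)) for lam != 0, g = c*ln(ahat) for
    lam = 0, with c the geometric mean of the observations (Kutner et al.)."""
    c = np.exp(np.mean(np.log(ahat)))            # geometric mean
    if abs(lam) < 1e-12:
        g = c * np.log(ahat)
    else:
        g = (ahat ** lam - 1.0) / (lam * c ** (lam - 1.0))
    X = np.column_stack([np.ones_like(a), a])
    coef, *_ = np.linalg.lstsq(X, g, rcond=None)
    r = g - X @ coef
    return float(r @ r)

def best_lambda(ahat, a):
    """Grid search for the SSE-minimizing lambda (precision is not critical)."""
    grid = np.linspace(-1.0, 1.0, 81)
    return float(min(grid, key=lambda lam: boxcox_sse(ahat, a, lam)))

# Synthetic data: sqrt(ahat) is linear in a with constant noise,
# so the best transform should land near lambda = 0.5
rng = np.random.default_rng(4)
a = rng.uniform(0.5, 5.0, 300)
ahat = (0.10 + 0.05 * a + rng.normal(0.0, 0.01, 300)) ** 2
print("lambda =", best_lambda(ahat, a))
```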
  analysis method   λ     left censor  detection threshold  false calls  a90 (mm)  a90/95 (mm)  a90–a90/95 % difference
  1st order linear  0.45  0.13         0.23                 0            2.176     2.327        6.9%
  1st order linear  0.5   0.14         0.195                1            2.102     2.257        7.3%
  1st order linear  0.5   0.195        0.195                1            2.269     2.53         11.5%
  2nd order linear  0.5   0.14         0.195                1            2.277     2.472        8.5%
  2nd order linear  0.5   0.195        0.195                1            2.197     2.428        10.5%
  hit/miss          1                  0.187                1            1.72      2.04         18.6%
  hit/miss          1                  0.162                11           1.498     1.907        27.3%

61 Physics-Inspired Models
• MAPOD – the idea is to use simulation to reduce the time and cost of POD studies.
• Properly integrating simulation and experiment is an enormous task.
• An intermediate step is to use models to inspire the functional form of the regression model.

62 Physics-Inspired Models – literature
• R.B. Thompson and W.Q. Meeker, "Assessing the POD of Hard-Alpha Inclusions from Field Data", Review of Progress in QNDE, Vol. 26, AIP, pp. 1759-1766, (2007). (Example where kink regression is used to distinguish between Rayleigh scattering at small flaw sizes and regular scattering at larger sizes.)
[Figure from http://www.tc.faa.gov/its/worldpac/techrpt/ar0763.pdf]

63 Physics-Inspired Models
• Simulation and experiment.
• Visual inspection reveals that a 2nd-order linear model may fit the data better than the standard â vs a analysis.
• Evidence beyond visual: the p-value for a² is 0.001, and the adjusted R-square value increases slightly with the inclusion of a².
[Plots: simulation and experiment responses vs crack size (mm, 0–5) with the quadratic model fit.]

64 Physics-Inspired Models – parallel work
• Recent unpublished work by Li, Nakagawa, Larson, and Meeker. http://www.stat.iastate.edu/preprint/articles/2011-05.pdf

65 Summary
• The physics model hopefully provides the functional form of the response, and this knowledge can be used in the initial DOE for a POD study.
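The "evidence beyond visual" check above – does the a² term earn its keep? – can be sketched by comparing adjusted R² for first- and second-order fits. The curved demo data and coefficients are invented for illustration.

```python
import numpy as np

def adj_r2(y, X):
    """Ordinary least squares fit and adjusted R-squared for design matrix X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    n, p = X.shape
    return 1.0 - (sse / (n - p)) / (sst / (n - 1))

# Hypothetical response with mild curvature, as in the slides
rng = np.random.default_rng(2)
a = rng.uniform(0.5, 4.5, 120)
y = 0.002 + 0.01 * a + 0.004 * a ** 2 + rng.normal(0.0, 0.005, 120)
X1 = np.column_stack([np.ones_like(a), a])            # 1st-order model
X2 = np.column_stack([np.ones_like(a), a, a ** 2])    # 2nd-order model
print("adj R2: 1st order %.4f, 2nd order %.4f" % (adj_r2(y, X1), adj_r2(y, X2)))
```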
• The physics-inspired model concept is a first step in using physics models for making inference on reliability.
• Confidence bound calculation on models more complicated than â vs a is an open problem, especially transforming to the probability of detection curve.

66 Bootstrap Methods
• Confidence bound calculations are complicated and only available for hit/miss and â vs a analysis.
• More complicated models require a new method.
• Bootstrap methods are simple and flexible enough to provide confidence bounds for a wide variety of models.

67 Bootstrap Methods – literature
• Efron, B., and Tibshirani, R. J., An Introduction to the Bootstrap, Chapman & Hall, New York, NY, 1993.
• C.C. McCulloch and J. Murphy, "Local Regression Modeling for Accurate Analysis of Probability of Detection Data", Mat. Eval., Vol. 60, no. 12, pp. 1438-1143, (2002). (A rare example of bootstrapping used in an NDE context.)
• Amarchinta, Tarpey, and Grandhi, "Probabilistic Confidence Bound Framework for Residual Stress Field Predictions", 12th AIAA Non-Deterministic Approaches Conference, AIAA-2010-2519, Orlando, FL, (2010).

68 Bootstrap Methods
• The bootstrap procedure is simply to sample with replacement and generate a POD curve each time.
• Sort all of the a90 values in ascending order and look at the value at the 95th percentile to determine a90/95.
• Example for the previous transformed data set with λ = 0.5:

                     a90       a90/95
  Wald Method        2.102 mm  2.257 mm
  Bootstrap 1,000    2.096 mm  2.281 mm
  Bootstrap 10,000   2.099 mm  2.299 mm
  Bootstrap 100,000  2.099 mm  2.297 mm

69 Summary
• Bootstrapping is beautiful.
• 1,000 samples are probably sufficient, but 100,000 isn't that difficult.
• Some interesting formal work could be done to look at the influence of censoring, which is probably beyond the scope of this work.
• Results seem to indicate the 2nd-order model (which I think is the best) is the most conservative.
• Further investigation of censoring planned.
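The bootstrap recipe above – resample pairs with replacement, refit, take the 95th percentile of the a90 values – in miniature, using an uncensored fit and invented demo data:

```python
import numpy as np

def a90_from_fit(log_a, log_ahat, log_th):
    """a90 implied by an uncensored a-hat vs a fit (Phi^-1(0.90) = 1.28155)."""
    b1, b0 = np.polyfit(log_a, log_ahat, 1)
    s = (log_ahat - (b0 + b1 * log_a)).std(ddof=2)
    return float(np.exp((log_th - b0) / b1 + 1.28155 * s / b1))

def bootstrap_a90_95(log_a, log_ahat, log_th, n_boot=1000, seed=3):
    """Resample the (a, ahat) pairs with replacement, refit each time,
    and take the 95th percentile of the resulting a90 values."""
    rng = np.random.default_rng(seed)
    n = len(log_a)
    a90s = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)            # one resample of the pairs
        a90s.append(a90_from_fit(log_a[idx], log_ahat[idx], log_th))
    return float(np.quantile(a90s, 0.95))

# Hypothetical demo data, same structure as the earlier a-hat vs a sketches
rng = np.random.default_rng(2)
log_a = np.log(rng.uniform(0.5, 5.0, 150))
log_ahat = -2.0 + log_a + rng.normal(0.0, 0.3, 150)
point = a90_from_fit(log_a, log_ahat, np.log(0.5))
upper = bootstrap_a90_95(log_a, log_ahat, np.log(0.5))
print("a90 = %.3f mm, bootstrap a90/95 = %.3f mm" % (point, upper))
```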
  analysis method   λ     left censor  detection threshold  false calls  a90 (mm)  a90/95 (mm)  a90–a90/95 % difference
  1st order linear  0.45  0.13         0.23                 0            2.176     2.327        6.9%
  1st order linear  0.5   0.14         0.195                1            2.102     2.257        7.3%
  1st order linear  0.5   0.195        0.195                1            2.269     2.53         11.5%
  2nd order linear  0.5   0.14         0.195                1            2.277     2.472        8.5%
  2nd order linear  0.5   0.195        0.195                1            2.197     2.428        10.5%
  hit/miss          1                  0.187                1            1.72      2.04         18.6%
  hit/miss          1                  0.162                11           1.498     1.907        27.3%

70 Summary
• Hit/miss analysis – MCMC.
• â vs a analysis – unchanged.
• Higher-order / complex models – bootstrapping.
• The methods presented for putting confidence bounds on POD estimates are not elegant by any stretch of the imagination, but they are incredibly robust and useful.
• Much work needs to be done via simulation to move these methods into practice.
• UQ – progress made on uncertainty propagation.
• UQ – Bayesian calibration techniques being investigated.

71 Efficient Uncertainty Propagation
• Deterministic simulations are very time consuming.
• NDE problems require stochastic simulation if the models are to truly impact analysis of inspections.
• Need modern uncertainty quantification methods to address this problem.
[Diagram: random inputs ξ1, ξ2, …, ξn → Eddy Current NDE Model [Stochastic] → Z̃]

72 Efficient Uncertainty Propagation
• Uncertainty propagation methods:
  – Monte Carlo
  – Latin Hypercube (sampling methods)
  – FORM/SORM
  – Full Factorial Numerical Integration
  – Univariate Dimension Reduction
  – Karhunen–Loève Expansion / ANOVA (high-dimension problems)
  – Polynomial Chaos Expansion (intrusive)
  – Probabilistic Collocation Method (non-intrusive)

73 Uncertainty Propagation
• Motivation: model evaluations are computationally expensive. There is a need for more efficient methods than Monte Carlo.
• Input parameters with variation:
  – Probe dimensions (liftoff / tilt)
  – Flaw characteristics (depth, length, shape)
[Diagram: inputs X1 ~ Normal, …, Xn ~ Uniform → Eddy Current NDE Model [Deterministic] → Z̃; with random inputs, the stochastic model output Z̃ ~ ?]
• Objective: efficiently propagate uncertain inputs through “black box” models and predict output probability density functions (non-intrusive approach).
• Approach: surrogate models based on Polynomial Chaos Expansions meet this need.
74

Uncertainty Propagation
Uncertainty propagation for parametric NDE characterization problems:
• Probabilistic Collocation Method (PCM) approximates the model response with a polynomial function of the uncertain parameters:
    Ẑ = f(x) ≈ Σ_{i=1}^{N} c_i Ψ_i(x)
• This reduced form model can then be used with traditional uncertainty analysis approaches, such as Monte Carlo.
Extensions of generalized polynomial chaos (gPC) to high-dimensional (2D, 3D) damage characterization problems:
• Karhunen–Loève expansion
• Analysis of variance (ANOVA)
• Smolyak sparse grids

  Characterization problem                                Number of parameters
  Critical flaw size                                      1
  Key damage and measurement states
    (e.g. crack length, probe liftoff)                    >1
  Parameterized flaw localization and sizing              >>1
  Full 3D damage and material state characterization      N
75

Uncertainty Propagation and High Dimensional Model Representation
Approach (1): Karhunen–Loève Expansion
• Address stochastic input variable reduction when the number of random variables (N) is large.
• Apply the Karhunen–Loève expansion to map the random variables into a lower-dimensional random space (N′).
Eddy current example:
• A correlation function (covariance model C(x, x′)) defines the random conductivity map σ(x) (coil over crystallites/grains, σ = 2.2×10⁶ S/m).
• Set the choice of grid length to
  – achieve model convergence and
  – eliminate insignificant eigenvalues for the reduced order conductivity map.
• The expansion in the reduced random variables ξ1, …, ξN′ is
    σ(x) ≈ Σ_{n=1}^{N′} √λ_n ξ_n φ_n(x),
  giving a reduced order conductivity map with N′ random variables.
76

Uncertainty Propagation and High Dimensional Model Representation
Approach (2): Analysis of Variance (ANOVA) Expansion
• Provides a surrogate to represent a high dimensional set of parameters.
• Analogous to the ANOVA decomposition in statistics.
• Locally represent the model output through an expansion at an anchor point in ξ-space.
  – Requires an inverse problem.
  – Replace the random surface with an equivalent 'homogeneous' surface.
[Diagram: (1) Identify unique sources of variance — conductivity map with N random variables σ(x), defined by covariance model C(x, x′), reduced by the Karhunen–Loève expansion (N ≫ N′) to a reduced order conductivity map with N′ random variables ξ1, …, ξN′; (2) Identify significant factors in the model — M random variables ξ1, …, ξM, ANOVA expansion Z(ξ), N′ ≫ M]
77

Uncertainty Propagation and High Dimensional Model Representation
Approach (2): Analysis of Variance Expansion + Smolyak Sparse Grids
• Significant computational expense for high-dimensional integrals.
• Can leverage sparse grids based on the Smolyak construction [Smolyak, 1963; Xiu, 2010; Gao and Hesthaven, 2010].
  – Provides weighted solutions at specific nodes and adds them to reduce the number of necessary solutions.
  – Sparse grid collocation provides a subset of the full tensor grid for higher dimensional problems.
  – The approach can also be applied to gPC/PCM.
[Figure: sparse grid vs. full tensor product grid; flow diagram repeated from the previous slide]
78

All Models Are Wrong
• “All models are wrong, and to suppose that inputs should always be set to their ‘true’ values when these are ‘known’ is to invest the model with too much credibility in practice. Treating a model more pragmatically, as having inputs that we can ‘tweak’ empirically, can increase its value and predictive power.” (Kennedy & O’Hagan, 2002)
• Eddy current liftoff is a particularly good example of this.
79

Bayesian Analysis
• Bayesian Model Averaging (BMA) – used when experts provide competing models for the same system.
• Bayesian calibration is the most promising technical option for integrating experimental data and simulation in a rigorous way that accounts for all sources of uncertainty.
• The Kennedy / O’Hagan paper of 2001 inspired many efforts in this direction. (BTW, it was rejected by the Journal of the American Statistical Association. Now published in the Journal of the Royal Statistical Society: Series B, it has been referenced 620 times. Add that to the number of references to the unpublished technical report that was rejected by JASA, and you get a large number.)
• Many efforts ongoing in the UQ community.
80

Bayesian Calibration
• What uncertainty needs to be quantified to go from the simulator to reality?
  – Input
  – Propagation from input to output (hopefully done in the previous section, but notice no uncertainty is actually quantified in this part)
  – Code
  – Discrepancy
81

Bayesian Calibration
• Terminology
  – Model: set of equations that describes some real world phenomena.
  – Simulator: executes the model with computer code.
  – Calibration parameters: θ
  – Controlled input variables: x
82

Bayesian Calibration
• Simulator: y = f(x, θ)
• Observations: observations = reality(control variables) + ε, where ε is the observation error.
• Reality doesn’t depend on the calibration parameters.
• Typically you see: observations = f(x, θ) + ε
• This is wrong, mainly because it doesn’t account for uncertainty in θ, and the ε’s are not independent.
• Bayesian methods are used to learn about the uncertainty in θ.
* Paraphrasing discussion with Tony O’Hagan
83

Bayesian Calibration - literature
• Kennedy, M. C., and O’Hagan, A., “Bayesian calibration of computer models,” J. R. Statist. Soc. B, Vol. 63, pp. 425–464, (2001).
• Park, I., Amarachinta, H. K., and Grandhi, R. V., “A Bayesian approach to quantification of model uncertainty,” Reliability Engineering and System Safety, Vol. 95, pp. 777–785, (2010).
84

Summary
• Hit/miss analysis – MCMC
• â vs a analysis – unchanged
• Higher order / complex models – bootstrapping
• The methods presented for putting confidence bounds on POD curves are not elegant by any stretch of the imagination, but they are incredibly robust and useful.
• Much work needs to be done via simulation to move these methods into practice.
• UQ – progress made on uncertainty propagation.
• UQ – Bayesian calibration techniques being investigated.
85
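The calibration setup on the preceding slides can be sketched with a one-parameter toy problem. Everything here is invented for illustration (the simulator, the prior, the noise level), and the model discrepancy term δ(x) from the Kennedy–O'Hagan formulation is omitted for brevity; the point is simply that Bayesian updating yields a posterior distribution over θ rather than a single "tuned" value.

```python
# Minimal sketch of Bayesian calibration of a single parameter theta by
# gridding the posterior. The simulator form, prior, and noise level are
# invented; the Kennedy-O'Hagan discrepancy term delta(x) is omitted.
import numpy as np

rng = np.random.default_rng(2)

def simulator(x, theta):
    """Cheap stand-in for the simulator y = f(x, theta) (hypothetical form)."""
    return theta * x + 0.1 * x**2

# Synthetic observations: reality(x) + observation error epsilon.
x_obs = np.linspace(0.0, 2.0, 10)
y_obs = simulator(x_obs, 1.5) + rng.normal(0.0, 0.05, x_obs.size)

# Grid posterior: p(theta | y) proportional to p(y | theta) * p(theta),
# with a Normal(1, 1) prior on theta and known noise sigma = 0.05.
theta_grid = np.linspace(0.0, 3.0, 601)
sigma = 0.05
log_post = np.array([
    -0.5 * np.sum((y_obs - simulator(x_obs, t)) ** 2) / sigma**2
    - 0.5 * (t - 1.0) ** 2
    for t in theta_grid
])

dt = theta_grid[1] - theta_grid[0]
post = np.exp(log_post - log_post.max())
post /= post.sum() * dt                      # normalize to a density on the grid

theta_mean = (theta_grid * post).sum() * dt
theta_sd = np.sqrt((((theta_grid - theta_mean) ** 2) * post).sum() * dt)
print(f"posterior mean = {theta_mean:.3f}, sd = {theta_sd:.3f}")
```

A grid works only because θ is one-dimensional here; realistic calibrations use MCMC over (θ, δ) with a Gaussian process prior on the discrepancy, as in the Kennedy and O'Hagan (2001) paper cited above.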