Detection of Precursors to Aviation Safety Incidents due to Human Factors
I. Melnyk, P. Yadav, M. Steinbach, J. Srivastava, V. Kumar and A. Banerjee
Department of Computer Science & Engineering
University of Minnesota, Minneapolis
ICDM Workshop on Domain Driven Data Mining
December 7, 2013

Overview
• Introduction
• Related Work
• Proposed Approach
• HMM vs HSMM
• Experimental Results
• Conclusion

Introduction
• Fatal accidents and onboard fatalities 2003-2012 [Boeing Report ’12]

Introduction
• Estimated two-fold increase in air traffic by 2025 [Sheridan ’06]
  – Congestion in air and airports
  – Load on pilots and traffic controllers
  – Greater chance to make error
• Our objective: detect precursors to aviation safety incidents due to human factors
  – Analysis and modeling of pilot actions
  – Data generation and evaluation

Related Work: Aviation Safety
• Hidden Markov Models [Srivastava ’05]
  – Observations modeled using N-dim binary vector (switches in cockpit)
  – Cluster data to get smaller class of observations; build HMM over reduced data
• Clustering approach [Budalakoti et al. ’09]
  – Cluster pilot action sequences using k-medoids based on nLCS
  – Rank order sequences to identify anomalous sequences
• One-class SVM [Das et al. ’10]
  – Detects anomalies in both continuous and discrete sequences
  – Employed multiple kernel learning: LCS for discrete, SAX for continuous
• Dynamic Bayesian Networks [Saada et al. ’12]
  – Hidden nodes: pilot actions; observable nodes: aircraft sensors
  – Detects pilot errors in the past given current instrument data

Problem Formulation
• Given database of normal pilot action sequences
  – Actions come from a finite alphabet
  – Examples: “raise landing gear”, “lower flaps”, “decrease throttle”, etc.
• Construct model of a normal sequence from data
• Assign anomaly score to a test sequence
  – Entire sequence is anomalous (offline anomaly detection)
  – Specific action is anomalous (online anomaly detection)
  – Examples: “unusual order of actions”, “forgotten action”, etc.

Analysis of Pilot Actions
• Flight phases
  – Example: landing phase pilot actions
• Simplification: ignore time duration between actions
[Figure: pilot actions over time during landing. Stages: descent, touch down, braking on runway. Annotations: action ID, stage duration, time between actions, null action. Axis: Time.]

Hidden Markov Model (HMM)
• Hidden states
  – Stages of aircraft operation
  – Examples: initial descent, touch down, braking, etc.
• Observations
  – Pilot actions
  – Example (initial descent): reduce throttle, lower flaps, lower landing gear, etc.
• Model parameters
  – Probability distributions: transition, observation, prior
• Drawback
  – Geometric state-duration distribution encourages fast state switching
  – Inability to model arbitrary state durations

Hidden Semi-Markov Model (HSMM)
• Additional hidden variable
  – State duration
  – Forces a hidden state to last for the number of time steps given by its duration
• Model parameters
  – Probability distributions: duration, transition, observation, initial

HSMM: Model Parameters Estimation
• Estimate probability distributions (conditional probability tables)
  – Duration
  – Transition
  – Observation
  – Initial distributions
• Use database of normal pilot action sequences
• Select parameters which maximize likelihood of data
  – Non-convex problem without closed-form solution
  – Use Expectation Maximization (EM) [Dempster et al. ’77]
  – Similar to Baum-Welch algorithm for HMM [Baum et al. ’70]

Anomaly Detection Methodology
• Detect if a test sequence is anomalous
  – Entire sequence is anomalous (offline anomaly detection): normalized joint log likelihood
  – Specific action is anomalous (online anomaly detection): conditional probability
• Computational complexity
  – Computation uses Junction Tree algorithm for inference
  – Cost depends on sequence length, number of hidden states, and maximum state duration
  – For comparison, HMM cost depends on sequence length and number of hidden states

Results: Synthetic Data
• Compared HMM and HSMM to detect duration anomalies
• Data:
            Normal   Anomalous
  Training  200      0
  Testing   25       25

Flight Simulator
• FlightGear flight simulator
  – Landing flight phase
  – Cessna 172 Skyhawk landing at Half Moon Bay, CA airport
• Simulator setup
  – Aircraft controlled using keyboard
  – Keystrokes interpreted as pilot actions
  – 12 commands to control aircraft
Figure 4: Landing of aircraft in FlightGear simulator.

Pilot Actions for Landing
[Plot: Pilot's Actions Sequence. x-axis: Timestamp (0 to 200); y-axis: Action ID (0 to 12).]
Figure 5: Sequence of pilot’s actions under normal operating conditions while landing Cessna aircraft.
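
The offline and online scores named on the Anomaly Detection Methodology slide can be illustrated with a minimal sketch. It makes several assumptions that are not in the slides: it uses a plain discrete HMM rather than the HSMM, it normalizes the log likelihood by sequence length, and the toy parameters (`prior`, `trans`, `emit`) and the function name `forward_scores` are hypothetical. The exact formulas used in the talk may differ.

```python
import math

def forward_scores(obs, prior, trans, emit):
    """Scaled forward pass for a discrete HMM (illustrative only).

    Returns (per_step, normalized_ll):
      per_step[t]   = p(o_t | o_1..o_{t-1}), an online score per action
      normalized_ll = log p(o_1..o_T) / T,   an offline score per sequence
    Dividing by T is an assumed normalization.
    """
    if not obs:
        raise ValueError("observation sequence must be non-empty")
    n = len(prior)
    per_step = []
    alpha = None  # filtered state distribution after each step
    for t, o in enumerate(obs):
        if t == 0:
            unnorm = [prior[i] * emit[i][o] for i in range(n)]
        else:
            unnorm = [
                sum(alpha[j] * trans[j][i] for j in range(n)) * emit[i][o]
                for i in range(n)
            ]
        c = sum(unnorm)  # predictive probability of the observed action
        per_step.append(c)
        if c == 0.0:
            # The action is impossible under the model; stop early.
            return per_step, float("-inf")
        alpha = [u / c for u in unnorm]
    log_lik = sum(math.log(c) for c in per_step)
    return per_step, log_lik / len(obs)

# Hypothetical toy model: 2 hidden stages, 3 possible actions (IDs 0..2).
prior = [1.0, 0.0]
trans = [[0.8, 0.2],
         [0.0, 1.0]]
emit = [[0.8, 0.1, 0.1],
        [0.1, 0.1, 0.8]]

normal = [0, 0, 0, 2, 2]      # action order the toy model expects
anomalous = [2, 0, 2, 0, 1]   # unusual order of actions

normal_steps, normal_ll = forward_scores(normal, prior, trans, emit)
anom_steps, anom_ll = forward_scores(anomalous, prior, trans, emit)
```

A low `normalized_ll` flags the whole sequence, while a low entry in `per_step` flags a specific action as it arrives, which is the distinction the slide draws between offline and online detection.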

Flight Simulator Data
• Generated data
                    Normal   Anomalous
  Type              -        1    2    3    4    5
  # of sequences    110      10   10   10   10   10
• Types of anomalies
  – 1: Throttle kept constant, flaps are not lowered; rest is normal
  – 2: No initial throttle increase; rest is normal
  – 3: Flaps are not lowered; rest is normal
  – 4: At the end of flight brakes are not applied; rest is normal
  – 5: Pilot overshoots runway and lands beyond it; rest is normal

AUC Results: HSMM vs HMM
• AUC based on 11-fold cross-validation
  – 110 normal sequences split into 11 parts
  – 10 parts used for training
  – 1 part + 50 anomalous sequences used for testing
• Initialization was fixed across runs
• Performance metric: area under ROC curve (AUC)

Effect of Initialization: HSMM vs HMM
• Dependency of AUC on initialization
  – Selected 10 random model initializations
• Split of dataset was fixed across runs
  – Training: 100 normal sequences
  – Testing: 10 normal + 50 anomalous sequences
• Performance metric: area under ROC curve (AUC)

Offline vs Online Anomaly Detection
• Anomaly scoring for offline and online detection
• Detected anomalies in streaming data
  – 1: No initial throttle increase
  – 2: Incorrect usage of rudder
  – 3: Mistakenly used elevator control after touch down

HSMM vs Other Methods
• Baseline methods [Chandola et al. ’08]
  – EFSA: Extended Finite State Automata
  – t-STIDE: Threshold-based sequence time delay embedding
  – WIND: Window-based anomaly detection
• Setup
  – Training: 100 normal sequences; testing: 60 sequences
  – Sliding window length (history length) was varied from 1 to 20
  – Selected model with best AUC on training set; evaluated on test set
• AUC by anomaly type:
            Type 1   Type 2   Type 3   Type 4   Type 5
  HSMM      1.00     0.97     0.77     0.88     1.00
  HMM       0.87     0.60     0.71     0.95     0.99
  WIND      0.90     0.60     0.70     0.87     1.00
  EFSA      0.85     0.67     0.67     0.91     0.98
  t-STIDE   0.84     0.67     0.68     0.92     1.00

Summary
• Proposed framework to model discrete pilot actions
• HMM
  – Hidden states: stages of aircraft operation
  – Observations: pilot actions
  – Drawback: inability to model arbitrary state durations
• HSMM
  – Introduces additional hidden variable to model state durations
• Evaluated model performance
  – Synthetic data
  – Flight simulator data
  – Compared HSMM to HMM and other anomaly detection algorithms

Thank you!
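
Backup: the results slides report area under the ROC curve (AUC). As a rough sketch of how such a number can be computed from anomaly scores, the function below uses the pairwise (Mann-Whitney) form of AUC. The function name `auc` and the convention that a higher score means "more anomalous" (for example, a negated normalized log likelihood) are assumptions for illustration, not details taken from the talk.

```python
def auc(normal_scores, anomalous_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen anomalous sequence
    receives a higher anomaly score than a randomly chosen normal one,
    with ties counted as one half.
    Assumed convention: higher score = more anomalous.
    """
    if not normal_scores or not anomalous_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for a in anomalous_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(anomalous_scores) * len(normal_scores))

# Hypothetical scores for illustration only (not data from the talk).
perfect = auc([0.1, 0.2], [0.3, 0.4])   # every anomaly outranks every normal
chance = auc([0.1, 0.4], [0.3, 0.2])    # half of the pairs are ordered correctly
```

With 10 normal and 50 anomalous test sequences, as in the setup above, this compares 500 pairs per fold; the quadratic loop is fine at that size.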