Process Control for Risk Identification and Anomaly Detection

Ben Schulte and James H. Lambert*

*Corresponding author, lambert@virginia.edu; PO Box 400747; Center for Risk Management of Engineering Systems and Department of Systems and Information Engineering, University of Virginia; Charlottesville, VA 22904; (434) 982-2072; fax (434) 924-0865

Abstract

Recently, the Department of Homeland Security (DHS) streamlined or created Information Sharing & Analysis Centers (ISACs) in order to reduce the vulnerability of our nation’s critical infrastructure systems to a debilitating terrorist attack. The fourteen ISACs monitor threats to our food and water supply, to our nation’s economic backbone, and to core industrial infrastructure networks such as telecommunications and energy supply. More ISACs are currently being developed, and the DHS continues to encourage more industry participants. In short, ISACs are a major priority of the DHS. As the number of centers and of corporate and public partners increases, the amount of analysis required grows exponentially. Lambert and Sarda proposed that this influx of data be monitored in real time using statistical process control (SPC). Rather than the traditional SPC setting of a manufacturing line, they modeled an infrastructure network as a system of components and proposed measuring the interactions among these components over time. As in SPC for manufacturing, they argued that a monitor could set upper and lower thresholds for acceptable interaction rates within the monitored system. When those thresholds are exceeded in a manufacturing environment, the process stops and the offending cause is pinpointed and fixed. A similar breach of these thresholds in infrastructure SPC could pinpoint a potential terrorist threat or abnormal event and alert authorities to it. When setting up SPC in a manufacturing environment, the choice of process variables is a critical design decision.
This paper extends the process variables proposed by Sarda, interaction entropy and interaction informativity, and proposes new process variables that measure indirect interactions between system components. These process variables are then illustrated by analyzing several annual reports of disturbances, load reductions, and unusual occurrences on the bulk electric systems of the electric utilities in North America. We plot the process variables against this database with a software tool we developed expressly for this purpose.

Key words

Risk identification, scenario analysis, statistical process control, failure modes and effects analysis, information entropy, systems engineering, anomaly detection

Introduction

Relevant literature

Visual inspection in industrial manufacturing
Konig, A.; Windirsch, P.; Gasteier, M.; Glesner, M.
Micro, IEEE, Volume 15, Issue 3, June 1995, Pages 26-31
Maybe some manufacturing leads. - B

Robustness of the Markov-chain model for cyber-attack detection
Nong Ye; Yebin Zhang; Borror, C.M.
Reliability, IEEE Transactions on, Volume 53, Issue 1, March 2004, Pages 116-123
Overall, this study provides some support for the idea that the Markov-chain technique might not be as robust as other intrusion-detection techniques, such as the chi-square distance test technique [35]?

Network and service anomaly detection in multi-service transaction-based electronic commerce wide area networks
Ho, L.; Papavassiliou, S.
Computers and Communications, 2000. Proceedings. ISCC 2000. Fifth IEEE Symposium on, 3-6 July 2000, Pages 291-296

An unsupervised anomaly detection patterns learning algorithm
Yingjie Yang; Fanyuan Ma
Communication Technology Proceedings, 2003. ICCT 2003.
International Conference on, Volume 1, 9-11 April 2003, Pages 400-402

Combining negative selection and classification techniques for anomaly detection
Gonzalez, F.; Dasgupta, D.; Kozma, R.
Evolutionary Computation, 2002. CEC '02. Proceedings of the 2002 Congress on, Volume 1, 12-17 May 2002, Pages 705-710

Fuzzy clustering for intrusion detection
Shah, H.; Undercoffer, J.; Joshi, A.
Fuzzy Systems, 2003. FUZZ '03. The 12th IEEE International Conference on, Volume 2, 25-28 May 2003, Pages 1274-1278

A formal framework for positive and negative detection schemes
Esponda, F.; Forrest, S.; Helman, P.
Systems, Man and Cybernetics, Part B, IEEE Transactions on, Volume 34, Issue 1, Feb. 2004, Pages 357-373

The SRI IDES statistical anomaly detector
Javitz, H.S.; Valdes, A.
Research in Security and Privacy, 1991. Proceedings., 1991 IEEE Computer Society Symposium on, 20-22 May 1991, Pages 316-326

Entropy: a new definition and its applications
Pal, N.R.; Pal, S.K.
Systems, Man and Cybernetics, IEEE Transactions on, Volume 21, Issue 5, Sept.-Oct. 1991, Pages 1260-1270

On The Quantitative Definition of Risk
Kaplan, Stanley; Garrick, B.J.
Society for Risk Analysis, Volume 1, Issue 1, 1981, Pages 11-27

Entropies as measures of software information
Abd-El-Hafiz, S.K.
Software Maintenance, 2001. Proceedings. IEEE International Conference on, 7-9 Nov. 2001, Pages 110-117

Method

Our analysis begins by dissecting an event log for a complex system.

Example

Conclusions and Future Work

Measures:
1. Direct interactions per week
2. Indirect interactions per week
3. Ratio of direct to indirect interactions
4. Percentage of system components affected
5. Toggle switch: when does a scenario enter the “Red” zone of heightened danger?
6. How does that toggle switch our analysis?

From my presentation:
1. We need to address a series of questions about what our scale or universe of analysis is.
- Have I considered using an HHM to focus the analysis on a smaller subsection of possible events?
- What is the universe of components? Start with a complete system and work your way backwards.
2. Greg raised an interesting idea for the final analysis: as we take resources away from preventing or addressing one type of interaction, how soon do we see that incident occur? That is, how does the reallocation of resources affect our frequency plots?
3. Get familiar with six sigma analysis from Prof. Haimes’ book (in the extreme event analysis chapter).
4. Potential sources: Paul Jiang’s dissertation (“Reliance” something), Montgomery (out of control, run rules, etc.), Design of Experiments (Taguchi).

Acknowledgements

References

Vitae