System Theoretic Approach To Cybersecurity
Dr. Qi Van Eikema Hommes, Lecturer and Research Affiliate
Hamid Salim
Stuart Madnick, Professor
IC3.mit.edu

Research Motivations
Cyber to Physical Risks with Major Consequences (figure; source: Hitachi)

Presentation Outline
• Research Motivations
• Approaches
– System-Theoretic Accident Model and Processes (STAMP)
– Causal Analysis based on STAMP (CAST)
– System Theoretic Process Analysis (STPA)
• Case Study
– CAST Applied to the TJX Case
• Future Research Directions

System-Theoretic Accident Model and Processes (STAMP)
Figure: a Controller, which holds a Model of the Controlled Process, issues Control Actions to the Controlled Process and receives Feedback from it.

A Generic Control Structure (figure)

The Approaches
• The System Theoretic Model: STAMP
• Looking forward: System Theoretic Process Analysis (STPA)
• Looking backward: Causal Analysis using System Theory (CAST)

STPA Process
Safety or Security Problem to Prevent → Hazard → Inadequate Control Actions → Causes → Design and Management Requirements and Controls

CAST Process
1. System and hazard definition
2. System-level safety/security requirements
3. Draw control structure
4. Proximate events
5. Analyze the physical system
6. Moving up the levels of the control structure
7. Coordination and communication
8. Dynamics and change over time
9. Generate recommendations

TJX (TJ Maxx & Marshalls) Case Study
• TJX is a major US-based off-price retailer.
– Revenues > $25 billion (FY2014)
• When announced in 2007, TJX was the victim of the largest cyber-attack in history (by number of cards).
• Cost to TJX > $170 million, per SEC filings.
• The cyber-attack was launched in 2005 from a store in Miami, FL, by exploiting a Wi-Fi vulnerability.
• Hackers ultimately reached corporate payment servers and stole current transaction data.
• The cyber-attack lasted for over 1.5 years.
Sources: Federal/State Court records (primary), TJX SEC Filings, Others (NYT, WSJ, Globe, FTC, Academic papers, Journal articles).

CAST Step 1: Identify System and Hazards
• System
– TJX payment card processing and management system
• Hazard (at system level)
– System allows unauthorized access to customer information

CAST Step 2: Define System Security Requirements
• Protect customer information from unauthorized access.
• Provide adequate training to staff for managing security technology infrastructure.
• Minimize losses from unauthorized access to the payment system.

CAST Step 3: Hierarchical Control Structure (figure)

Proximal Event Chain

Breaching Marshalls Store
1. The store's AP used open authentication instead of shared key authentication.
2. WEP, an algorithm with publicly known weaknesses, was compromised.
3. Sniffers were used to monitor data packets.
4. Hackers stole store employee account information and gained access to TJX corporate servers.

Hackers Establish VPN Connectivity
1. Hackers used the Marshalls AP to install a VPN connection.
2. The VPN ran between a TJX corporate server and hacker-controlled servers in Latvia.
3. Code was installed on the TJX corporate payment processing server.

Flow for Sales of Stolen Payment Card Information
• Via a bank in Latvia

CAST Step 5: Analyzing the Physical Process (TJX Retail Store)
• Safety Requirements and Constraints
– Prevent unauthorized access to customer information.
• Emergency and Safety Equipment
– Wi-Fi network Access Point (AP) authentication
– Wi-Fi encryption algorithm
• Failures and Inadequacy of the Above Equipment
– Retail store Wi-Fi AP misconfigured
– Inadequate encryption technology: tools for recovering WEP keys were freely available on the internet.
– Inadequate monitoring of data activities on the Wi-Fi network
• Physical Contextual Factors
– Early adopter of Wi-Fi
– Learning curve and training
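The Step 5 findings about the store's physical controls (an AP that allowed unauthenticated access and WEP encryption) lend themselves to a recurring configuration check. The sketch below is a minimal illustration under stated assumptions: the `AccessPointConfig` inventory record, its field values, and the store IDs are hypothetical and not taken from TJX systems. It simply flags any access point that uses open authentication or weak encryption.

```python
from dataclasses import dataclass

@dataclass
class AccessPointConfig:
    """Hypothetical inventory record for one store Wi-Fi access point."""
    store_id: str
    authentication: str  # illustrative values: "open" or "shared_key"
    encryption: str      # illustrative values: "NONE", "WEP", "WPA", "WPA2"

# Encryption settings this sketch treats as inadequate, following the Step 5 findings.
WEAK_ENCRYPTION = {"NONE", "WEP"}

def audit_access_point(ap: AccessPointConfig) -> list[str]:
    """Return the inadequate controls found on a single access point."""
    findings = []
    if ap.authentication.lower() == "open":
        findings.append("AP allows unauthenticated (open) access")
    if ap.encryption.upper() in WEAK_ENCRYPTION:
        findings.append(f"weak encryption: {ap.encryption}")
    return findings

def audit_stores(aps: list[AccessPointConfig]) -> dict[str, list[str]]:
    """Map each store that has at least one finding to its list of findings."""
    report = {}
    for ap in aps:
        findings = audit_access_point(ap)
        if findings:
            report[ap.store_id] = findings
    return report

# Example: one AP configured like the breached Miami store, one upgraded store.
inventory = [
    AccessPointConfig("miami-marshalls", "open", "WEP"),
    AccessPointConfig("store-0002", "shared_key", "WPA2"),
]
print(audit_stores(inventory))
```

A check like this only covers the technical controls; the CAST analysis that follows argues that the harder problems sit at higher levels of the control structure.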
Step 6: Analysis of Higher Levels of the Hierarchical Safety Control Structure
• Safety Requirements and Constraints
– Prevent unauthorized access to customer information.
• Emergency and Safety Equipment
– Payment card data is encrypted during transmission and storage
– Conform to Payment Card Industry Data Security Standard (PCI-DSS)
• Failures and Inadequacy
– Payment data was briefly stored and then transmitted unencrypted to the bank.
– Not compliant with PCI-DSS.
– Fifth Third Bancorp had limited influence on TJX.
• Physical Contextual Factors
– PCI-DSS is not legally required by the Federal Government or by the States (except Nevada).
– Fifth Third Bancorp has no regulatory role.

Step 6 (Cont.): State Legislature
• PCI-DSS is law in the State of Nevada, but not in Massachusetts, where TJX is headquartered.
• TJX creates jobs and generates revenue in Massachusetts, so the legislature may be reluctant to impose constraints.

Step 6 (Cont.): Federal Regulatory Agency
• Most cybersecurity standards are voluntary and are written broadly.
• At the time of the attack, no regulation existed for the retail industry as a whole.
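Each Step 6 slide asks the same four questions of one controller in the hierarchy: its safety requirements and constraints, its controls, how those controls failed or were inadequate, and the contextual factors behind them. One way to keep such an analysis consistent is to record it in a fixed structure and walk the hierarchy level by level. The sketch below assumes a hypothetical `ControllerAnalysis` record (not part of CAST itself) and fills it with a few entries paraphrased from the slides above.

```python
from dataclasses import dataclass, field

@dataclass
class ControllerAnalysis:
    """One CAST Step 6 entry: the four questions asked of each controller."""
    controller: str
    requirements: list[str] = field(default_factory=list)  # safety requirements and constraints
    controls: list[str] = field(default_factory=list)      # emergency and safety equipment
    failures: list[str] = field(default_factory=list)      # failures and inadequacy
    context: list[str] = field(default_factory=list)       # contextual factors

    def gaps(self) -> list[str]:
        """Return one line per failure, tagged with the controller name."""
        return [f"{self.controller}: {failure}" for failure in self.failures]

def collect_gaps(levels: list[ControllerAnalysis]) -> list[str]:
    """Walk the hierarchy from lowest to highest level and gather every gap."""
    return [gap for level in levels for gap in level.gaps()]

# Illustrative entries paraphrased from the Step 6 slides.
tjx = ControllerAnalysis(
    controller="TJX corporate",
    requirements=["Prevent unauthorized access to customer information"],
    controls=["Encrypt payment card data in transmission and storage", "Conform to PCI-DSS"],
    failures=["Payment data transmitted unencrypted to the bank", "Not compliant with PCI-DSS"],
    context=["PCI-DSS not legally required outside Nevada"],
)
state = ControllerAnalysis(
    controller="State Legislature (MA)",
    failures=["PCI-DSS not enacted into law"],
    context=["TJX creates jobs and revenue in Massachusetts"],
)

for gap in collect_gaps([tjx, state]):
    print(gap)
```

The ordered list of levels mirrors the "moving up the levels of the control structure" idea; the record itself is only a bookkeeping aid, and the judgment about what counts as a failure stays with the analyst.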
Step 7: Coordination and Communication (annotations on the control structure figure)
• Lack of coordination for PCI-DSS compliance
• Aware of the PCI-DSS compliance issue
• Cybersecurity spending was not the highest priority
• Missing support
• Uninformed
• No single person responsible for cybersecurity

CAST Step 8: Dynamics and Migration to a High-Risk State
• Initially, cybersecurity risk was low because vulnerabilities were unknown to everyone: experts, businesses, and hackers.
• Flaws in the managerial decision-making process:
– Information availability: recent experience strongly influences decisions (i.e., no break-ins so far).

CAST Step 8: Dynamics and Migration to a High-Risk State (Cont.)
“My understanding is that we can be PCI‐compliant without the planned FY07 upgrade to WPA technology for encryption because most of our stores do not have WPA capability without some changes. WPA is clearly best practice and may ultimately become a requirement for PCI compliance sometime in the future. I think we have an opportunity to defer some spending from FY07’s budget by removing the money for the WPA upgrade, but would want us all to agree that the risks are small or negligible.” – TJX CIO, Nov. 2005
• In this November 2005 message to his staff, the CIO requested agreement with his belief that cybersecurity risk was low.
• There were only two opposing views; the majority of his staff agreed.
• This confirmation trap led to postponing the upgrades.

Comparison of Results from CPC and FTC Investigations and STAMP/CAST Analysis
No. | Recommendation | CPC | FTC | STAMP/CAST
1 | Create an executive-level role for managing cybersecurity risks. | No | * | Yes
2 | Integrate PCI-DSS with TJX processes. | No | No | Yes
3 | Develop a safety culture. | No | No | Yes
4 | Understand the limitations of PCI-DSS and of standards in general. | No | No | Yes
5 | Review system architecture. | No | No | Yes
6 | Upgrade encryption technology. | Yes | No | Yes
7 | Implement vigorous monitoring of systems. | Yes | No | Yes
8 | Implement an information security program. | No | Yes | Yes
CPC = Canadian Privacy Commission; FTC = Federal Trade Commission
* = A recommendation that is close to the STAMP/CAST-based analysis but also has differences.

Research Contributions
1. Highlighted the need for systems thinking and a systems engineering approach to cybersecurity.
2. Tested STAMP/CAST as a new approach for managing cybersecurity risks.
3. Discovered new insights when applying STAMP/CAST to the TJX case.
4. The recommendations provide a basis for preventing similar events in the future.

Application to Cyber Physical System (Stuxnet Example) (figure)
• Unauthenticated commands are allowed from any source.
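In STAMP terms, the Stuxnet slide points to a missing control on the control-action path: the controlled process accepts commands without checking where they came from. One common way to close that gap is message authentication. The sketch below assumes a symmetric key shared only between the legitimate controller and the process; `CONTROLLER_KEY`, `ControlledProcess`, and the `setpoint=` command format are hypothetical names for illustration. It uses HMAC-SHA256 from Python's standard `hmac` module and does not describe how the systems attacked by Stuxnet actually worked.

```python
import hashlib
import hmac

# Hypothetical shared secret provisioned only to the legitimate controller.
CONTROLLER_KEY = b"example-key-not-for-production"

def sign_command(key: bytes, command: str) -> str:
    """Controller side: compute an HMAC-SHA256 tag for a control action."""
    return hmac.new(key, command.encode(), hashlib.sha256).hexdigest()

class ControlledProcess:
    """Toy actuator that acts only on authenticated control actions."""

    def __init__(self, key: bytes):
        self._key = key
        self.setpoint = 0

    def apply(self, command: str, tag: str) -> bool:
        """Apply a 'setpoint=<int>' command if its tag verifies; return success."""
        expected = sign_command(self._key, command)
        # compare_digest avoids leaking information through timing differences.
        if not hmac.compare_digest(expected, tag):
            return False  # reject: not issued by the authorized controller
        name, value = command.split("=")
        if name == "setpoint":
            self.setpoint = int(value)
        return True

process = ControlledProcess(CONTROLLER_KEY)
ok = process.apply("setpoint=100", sign_command(CONTROLLER_KEY, "setpoint=100"))
forged = process.apply("setpoint=999", sign_command(b"attacker-key", "setpoint=999"))
print(ok, forged, process.setpoint)
```

This closes only the "any source" gap. A real design would also need replay protection (for example, counters or nonces) and key management, and it would not address the tampered feedback and tampered algorithm paths shown on the following slides.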
Application to Cyber Physical System (Stuxnet Example) (Cont.)
• Tampered feedback sensor data
• Tampered algorithm

Future Research Directions
• Continue applying CAST to cybersecurity attack analysis and generate a comprehensive list of recommendations that include:
– Improvements to mitigate technology vulnerabilities
– Ways to address systemic issues related to management, decision making, culture, policy, and regulation
• Apply the System Theoretic Process Analysis (STPA) approach to identify system vulnerabilities prior to an attack.
– Identify leading indicators
– The US Air Force had a successful example and is implementing STPA as a cybersecurity measure.
– Compatible with the NIST standard on cybersecurity

Next Steps
• (IC)3 is starting a project to ensure the cybersecurity of complex power systems.
• Other project ideas?
http://ic3.mit.edu

Questions?
Qi Van Eikema Hommes
qhommes@mit.edu

Backups

Research Motivations
• Increased cyber intrusions and attacks
“Our daily life, economic vitality, and national security depend on a stable, safe, and resilient cyberspace. We rely on this vast array of networks to communicate and travel, power our homes, run our economy, and provide government services. Yet cyber intrusions and attacks have increased dramatically over the last decade, exposing sensitive personal and business information, disrupting critical operations, and imposing high costs on the economy.” – U.S. Department of Homeland Security
• Study cybersecurity as a complex sociotechnical system problem.
• We want to prevent cyber attacks, not react to them.

System Theoretic Accident Causality Model
• STAMP: System-Theoretic Accident Model and Processes
– Professor Nancy Leveson: Engineering a Safer World, MIT Press 2012.
• System Theory:
– Hierarchy and emergence
– Communication and control
• STAMP models:
– The effects of complex system interactions
– The role of human actions and decisions as part of the whole system

CAST Step 4: Proximate Event Chain
1. In 2005, TJX decided not to upgrade to a stronger encryption algorithm and continued using deprecated WEP encryption.
2. In 2005, hackers used war-driving to discover a misconfigured Access Point (AP) at a Marshalls store in Miami, FL.
3. Hackers joined the store network and started monitoring data traffic.
4. In 2005, they exploited inherent weaknesses in the encryption algorithm at the store and cracked the key to steal an employee account and password.
5. Using the stolen account information, hackers accessed corporate payment card processing servers in Framingham, MA.
6. In late 2005, hackers downloaded customer payment card data from TJX corporate transaction processing servers in Framingham, MA, using the Marshalls store connection in Florida.
7. In 2006, hackers discovered a vulnerability: TJX was processing and transmitting payment card transactions without encryption.

CAST Step 4: Proximate Event Chain (Cont.)
8. In 2006, hackers installed a script on TJX corporate servers to capture unencrypted payment card data.
9. In 2006, hackers used TJX corporate servers as a staging area, created files containing customer payment card data, and started downloading the files using the Marshalls store network.
10. In late 2006, hackers installed a dedicated VPN connection between a TJX server in Framingham, MA and a server in Latvia.
11. In 2006, hackers started moving files directly from the TJX server to the Latvian server.
12. In December 2006, a credit card company alerted TJX to a possible data breach of its systems, initiating an investigation.
13. In January 2007, TJX publicly announced that it was the victim of a cyber-attack.

CAST Step 5: Analyzing the Physical Process (TJX Retail Store) (Cont.)
• Safety Requirements and Constraints:
– Prevent unauthorized access to customer information.
• Emergency and Safety Equipment (Controls):
– Wi-Fi network Access Point (AP) authentication
– Wi-Fi encryption algorithm
– Use of account ID/password
• Failures and Inadequate Controls:
– The retail store Wi-Fi AP was misconfigured and allowed unauthenticated access.
– Inadequate monitoring of data activities on the retail store Wi-Fi network.
– Inadequate encryption technology: tools for recovering WEP keys were freely available on the internet.
– TJX collected customer information that was not required.
• Physical Contextual Factors:
– TJX was an early adopter of first-generation Wi-Fi technology, deploying it at its more than 1,200 retail stores in 2000.
– This required a significant learning curve, training, and a new knowledge base in a short span of time.

CAST Step 9: Recommendations
1. According to the PCI Security Standards Council, compliance is a business issue that requires management attention. TJX needs to integrate PCI-DSS requirements into the appropriate development and operations components of the control structure.
a. Doing so would not ensure full protection against a cyber-attack, but it would help manage the risk more effectively.
b. It would also help shield TJX from liability: TJX was fined $880,000* by VISA for non-compliance, plus another $41 million.
2. Understand the objectives of standards and align them with cybersecurity and business needs, recognizing that PCI-DSS is not fully adequate.
a. Data must be encrypted when sent over a public network, but not when transmitted within TJX over the intranet or behind a firewall.
b. PCI-DSS did not mandate the stronger WPA encryption until 2006, even though WPA was available in 2003.

CAST Step 9: Recommendations (Cont.)
3. Build a safety culture at TJX. Specific steps can include:
a. Identify safety-critical entities, which can include encryption technology, hardware components (APs, servers, etc.), data retention/disposal/archival policies, a list of Key Threat Indicators (KTIs)* to include in monitoring metrics, and prevailing cybersecurity trends.
b. Implement a plan to manage these entities, with periodic reviews to update the list of safety-critical entities.
c. Create a dedicated executive role with cybersecurity responsibilities, which will allow for a consistent view of TJX security technology across the organization.
* KTIs can include network traffic beyond an established threshold at TJX stores, the number of network connections at odd hours of the day, etc.
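The two example KTIs in the footnote (store traffic beyond an established threshold, and connections at odd hours) can be expressed as simple rules over store network samples. The sketch below is a minimal illustration only: the `StoreNetworkSample` record, the 50 MB threshold, and the 07:00 to 22:59 "business hours" window are assumptions, and a real deployment would derive thresholds from each store's own baseline.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class StoreNetworkSample:
    """Hypothetical per-store measurement over one monitoring interval."""
    store_id: str
    timestamp: datetime
    bytes_transferred: int
    new_connections: int

# Assumed thresholds for illustration; not taken from TJX or PCI-DSS.
TRAFFIC_THRESHOLD_BYTES = 50_000_000
BUSINESS_HOURS = range(7, 23)  # hours 7..22 local time count as normal

def key_threat_indicators(sample: StoreNetworkSample) -> list[str]:
    """Return the KTI alerts raised by a single sample."""
    alerts = []
    if sample.bytes_transferred > TRAFFIC_THRESHOLD_BYTES:
        alerts.append("traffic above established threshold")
    if sample.new_connections > 0 and sample.timestamp.hour not in BUSINESS_HOURS:
        alerts.append(f"{sample.new_connections} connection(s) at odd hours")
    return alerts

night = StoreNetworkSample("miami", datetime(2005, 7, 1, 3, 15), 80_000_000, 4)
day = StoreNetworkSample("miami", datetime(2005, 7, 1, 14, 0), 1_000, 10)
print(key_threat_indicators(night))
print(key_threat_indicators(day))
```

Rules like these are deliberately simple so that the list of KTIs can be reviewed and updated periodically, as recommendation 3b suggests, rather than buried inside a monitoring product.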