Engineering a Safer and More Secure World
Prof. Nancy Leveson
Aeronautics and Astronautics Dept.
MIT (Massachusetts Institute of Technology)
Problem Statement
• Current flight-critical systems remarkably safe due to
– Conservative adoption of new technologies
– Careful introduction of automation to augment human capabilities
– Reliance on experience and learning from the past
– Extensive decoupling of system components
• NextGen and new avionics systems violate these
assumptions.
– Increased coupling and inter-connectivity among airborne, ground,
and satellite systems
– Control shifting from ground to aircraft and shared responsibilities
– Use of new technologies with little prior experience in this
environment
– Reliance on software increasing and allowing greater system
complexity
– Human assuming more supervisory roles over automation,
requiring more cognitively complex human decision making
Hypothesis
• Complexity is reaching a new level (tipping point)
– Old approaches becoming less effective
– New causes of accidents appearing
• Need a paradigm change
– Change focus from component reliability (reductionism) to systems
thinking (holistic)
• Does it work? Evaluations and experiences so far show it
works much better than what we are doing today.
Accident with No Component Failures
• Mars Polar Lander
– Have to slow down spacecraft to land safely
– Use Martian atmosphere, parachute, descent engines
(controlled by software)
– Software determines touchdown from sensitive sensors on the landing
legs and cuts off the descent engines once it decides the craft has landed
– But the sensors generate “noise” (false signals) when the parachute
opens; this case was not in the software requirements
– The software was not supposed to be operating at that time, but the
software engineers decided to start it early to even out the load on the
processor
– The software thought the spacecraft had landed and shut down the
descent engines (sketched below)
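A minimal sketch of the flawed logic, assuming a simple latched touchdown
flag. This is my own illustration in Python, not the actual flight
software; the names and the 40 m threshold are assumptions.

# Hypothetical sketch of the Mars Polar Lander touchdown-logic flaw.
class DescentController:
    def __init__(self):
        self.touchdown_latched = False  # leg-sensor indication, latched once set

    def on_leg_sensor_signal(self):
        # The spurious signals generated when the legs snap into place at
        # parachute opening arrive here too, and are latched as a real
        # touchdown, because the requirements never mentioned that case.
        self.touchdown_latched = True

    def engines_should_run(self, altitude_m):
        # Once the radar altimeter shows the surface is near, the software
        # consults the latched flag and, seeing "landed", cuts the engines.
        if altitude_m < 40.0 and self.touchdown_latched:
            return False  # premature shutdown while still descending
        return True
        # One fix: clear the latch when terminal descent begins, so only
        # signals generated after engine start count as touchdown.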
Types of Accidents
• Component Failure Accidents
– Single or multiple component failures
– Usually assume random failure
• Component Interaction Accidents
– Arise in interactions among components
– Related to interactive and dynamic complexity
– Behavior can no longer be
• Planned
• Understood
• Anticipated
• Guarded against
– Exacerbated by introduction of computers and software
Software-Related Accidents
• Are usually caused by flawed requirements
– Incomplete or wrong assumptions about operation of
controlled system or required operation of computer
– Unhandled controlled-system states and environmental
conditions
• Merely trying to get the software “correct” or to make it
reliable (satisfy its requirements) will not make it safer
under these conditions.
Confusing Safety and Reliability
[Venn diagram: two overlapping sets, “scenarios involving failures” and
“unsafe scenarios”. Region A: unreliable but not unsafe. Region B: unsafe
but not unreliable. Region C (the overlap): unreliable and unsafe.]
Preventing Component or Functional
Failures is NOT Enough
Traditional Ways to Cope with Complexity
1. Analytic Reduction
2. Statistics
Analytic Reduction
• Divide system into distinct parts for analysis
– Physical aspects → separate physical components or functions
– Behavior → events over time
• Examine the parts separately
• Assumes such separation does not distort the phenomenon
– Each component or subsystem operates independently
– Analysis results are not distorted when components are considered
separately
– Components act the same when examined singly as when playing their
part in the whole
– Events are not subject to feedback loops and non-linear interactions
Chain-of-Events Accident Causality Model
• Explains accidents in terms of multiple events,
sequenced as a forward chain over time.
– Simple, direct relationship between events in chain
• Events almost always involve component failure, human
error, or energy-related event
• Forms the basis for most safety engineering and
reliability engineering analysis:
e.g., FTA, PRA, FMEA/FMECA, Event Trees, etc.
and design:
e.g., redundancy, overdesign, safety margins, ….
[Diagram: human factors concentrates on what is “screened out”; engineering
concentrates on what is “screened in”. Not enough attention is paid to the
integrated system as a whole.]
Analytic Reduction does not Handle
• Component interaction accidents
• Systemic factors (affecting all components and barriers)
• Software
• Human behavior (in a non-superficial way)
• System design errors
• Indirect or non-linear interactions and complexity
• Migration of systems toward greater risk over time
Summary
• The world of engineering is changing.
• If safety and security engineering do not change with it, they will
become more and more irrelevant.
• Trying to shoehorn new technology and new levels of complexity into old
methods does not work.
Systems Theory
• Developed for systems that are
– Too complex for complete analysis
• Separation into (interacting) subsystems distorts the results
• The most important properties are emergent
– Too organized for statistics
• Too much underlying structure that distorts the statistics
• Developed for biology (von Bertalanffy) and engineering (Norbert Wiener)
• Basis of system engineering and System Safety
– ICBM systems of 1950s/1960s
– MIL-STD-882
Systems Theory (2)
• Focuses on systems taken as a whole, not on parts taken separately
• Emergent properties
– Some properties can only be treated adequately in their entirety,
taking into account all social and technical aspects:
“The whole is greater than the sum of the parts”
– These properties arise from relationships among the parts of the
system: how they interact and fit together
[Diagram: emergent properties arise from complex interactions; process
components interact in direct and indirect ways. Safety and security are
emergent properties.]
[Diagram: a controller controls the emergent properties (e.g., enforcing
safety constraints), shaping individual component behavior and component
interactions via control actions to, and feedback from, the process.]
[Diagram: Air Traffic Control example. The controller enforces safety
constraints via control actions and feedback; safety is the controlled
emergent property, while the interacting process components produce
throughput.]
Controls/Controllers Enforce Safety Constraints
• Power must never be on when access door open
• Two aircraft must not violate minimum separation
• Aircraft must maintain sufficient lift to remain airborne
• Bomb must not detonate without positive action by an authorized person
Such constraints can be written as executable predicates; see the sketch
below.
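A minimal sketch, my own illustration rather than anything from the
slides; the 5 nm default is an assumed en-route separation value.

# Safety constraints expressed as executable predicates that a runtime
# monitor (or a hazard analysis) could check.
def door_interlock_ok(power_on, door_open):
    # "Power must never be on when the access door is open."
    return not (power_on and door_open)

def separation_ok(distance_nm, min_separation_nm=5.0):
    # "Two aircraft must not violate minimum separation."
    return distance_nm >= min_separation_nm

assert door_interlock_ok(power_on=False, door_open=True)
assert not separation_ok(distance_nm=3.2)  # constraint violated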
Role of Process Models in Control
• Controllers use a process model to determine control actions
[Diagram: a controller contains a control algorithm and a process model;
it issues control actions to the controlled process and receives feedback.]
• Accidents often occur when the process model is incorrect
– How could this happen?
• Four types of unsafe control actions (enumerated in the sketch below):
– Control commands required for safety are not given
– Unsafe ones are given
– Potentially safe commands are given too early or too late
– Control stops too soon or is applied too long
(Leveson, 2003); (Leveson, 2011)
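A minimal sketch (my own illustration, not from the slides) encoding the
four types and showing how a stale process model produces the first one, a
command required for safety that is never given.

from enum import Enum

class UCAType(Enum):
    # The four unsafe-control-action types from the slide above.
    NOT_PROVIDED = "control command required for safety is not given"
    UNSAFE_PROVIDED = "an unsafe control command is given"
    WRONG_TIMING = "a potentially safe command is given too early or too late"
    WRONG_DURATION = "control stops too soon or is applied too long"

# Hypothetical plant: the controller's process model says the relief valve
# is already open, but the real state is closed.
process_model = {"relief_valve": "open"}    # controller's (stale) belief
actual_state = {"relief_valve": "closed"}   # the controlled process

def control_step():
    # The control algorithm decides using the model, not the real state.
    if process_model["relief_valve"] == "closed":
        return "open_relief_valve"
    return None

# The required command is withheld: an instance of UCAType.NOT_PROVIDED.
assert control_step() is None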
STAMP (System-Theoretic Accident
Model and Processes)
• Defines safety as a control problem (vs. failure problem)
• Works on very complex systems
• Includes software, humans, new technology
• Based on systems theory and systems engineering
– Design safety (and security) into the system from the
beginning
– Expands the traditional model of the cause of losses
Safety as a Dynamic Control Problem
• Events result from lack of enforcement of safety constraints in
system design and operations
• Goal is to control the behavior of the components and systems
as a whole to ensure safety constraints are enforced in the
operating system
• A change in emphasis: from “prevent failures” to “enforce safety/security
constraints on system behavior”
Changes to Analysis Goals
• Hazard/security analysis:
– Ways that safety/security constraints might not be
enforced
(vs. chains of failure events/threats leading to accident)
• Accident Analysis (investigation)
– Why the safety control structure was not adequate to prevent the loss
(vs. what failures led to the loss and who was responsible)
Systems Thinking
Processes:
• System Engineering (e.g., specification, safety-guided design, design
principles)
• Risk Management
• Management Principles / Organizational Design
• Operations
• Regulation
Tools:
• Accident/Event Analysis: CAST
• Hazard Analysis: STPA
• Specification Tools: SpecTRM
• Organizational/Cultural Risk Analysis
• Identifying Leading Indicators
• Security Analysis: STPA-Sec
Underlying all of these: STAMP, the theoretical causality model.
STPA (System-Theoretic Process Analysis)
• A top-down, system engineering technique
• Identifies safety constraints (system and component safety
requirements)
• Identifies scenarios leading to violation of safety constraints;
use results to design or redesign system to be safer
• Can be used on technical design and organizational design
• Supports a safety-driven design process where
– Hazard analysis influences and shapes early design
decisions
– Hazard analysis iterated and refined as design evolves
Aircraft Braking System
System Hazard H4: An aircraft on the ground comes too close to
moving or stationary objects or inadvertently leaves the taxiway.
Deceleration-related hazards:
• H4-1: Inadequate aircraft deceleration upon landing, rejected
takeoff, or taxiing
• H4-2: Deceleration after the V1 point during takeoff
• H4-3: Aircraft motion when aircraft is parked
• H4-4: Unintentional aircraft directional control (differential braking)
• H4-5: Aircraft maneuvers out of safe regions (taxiways, runways,
terminal gates, ramps, etc.)
• H4-6: Main gear rotation is not stopped when (continues after) the
gear is retracted.
Example System Requirements Generated
• SC1: Forward motion must be retarded within TBD seconds of a braking
command upon landing, rejected takeoff, or taxiing
• SC2: The aircraft must not decelerate after V1
• SC3: Uncommanded movement must not occur when the
aircraft is parked
• SC4: Differential braking must not lead to loss of or
unintended aircraft directional control
• SC5: Aircraft must not unintentionally maneuver out of safe
regions (taxiways, runways, terminal gates and ramps, etc.)
• SC6: Main gear rotation must stop when the gear is retracted
Hazard-to-constraint traceability is sketched below.
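A minimal sketch (my own illustration) of the traceability STPA maintains
from each hazard to the safety constraint it generates, with a check that
no identified hazard is left without a constraint.

# IDs mirror the slides above; constraint text abbreviated.
hazard_to_constraint = {
    "H4-1": "SC1", "H4-2": "SC2", "H4-3": "SC3",
    "H4-4": "SC4", "H4-5": "SC5", "H4-6": "SC6",
}

def uncovered_hazards(hazards, mapping):
    # Every identified hazard must generate at least one safety constraint.
    return [h for h in hazards if h not in mapping]

assert uncovered_hazards(
    [f"H4-{i}" for i in range(1, 7)], hazard_to_constraint) == []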
Safety Control Structure
Generate More Detailed Requirements
• Identify causal scenarios for how requirements could be
violated (using safety control model)
• Determine how to eliminate or mitigate the causes
– Additional crew requirements and procedures
– Additional feedback
– Changes to software or hardware
– Etc.
Automating STPA: John Thomas
• Requirements can be derived automatically (with some user guidance)
using a mathematical foundation
• Allows automated completeness/consistency checking
[Diagram: hazards → hazardous control actions → a discrete mathematical
representation (predicate calculus / state-machine structure) → a formal
(model-based) requirements specification. A toy illustration follows.]
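A toy illustration of the idea; the train-door case is a common teaching
example, not taken from these slides. Enumerate the contexts (process-model
variable values) in which a control action could be provided and flag, via
a simple predicate, the combinations that are hazardous.

from itertools import product

door_positions = ["aligned_with_platform", "not_aligned"]
train_motions = ["stopped", "moving"]

def open_doors_hazardous(position, motion):
    # Predicate-calculus-style rule: providing "open doors" is hazardous
    # while the train is moving or not aligned with a platform.
    return motion == "moving" or position == "not_aligned"

# Exhaustive enumeration gives the completeness/consistency check:
# every context is classified, none is forgotten.
for position, motion in product(door_positions, train_motions):
    if open_doors_hazardous(position, motion):
        print(f"UCA: 'open doors' provided when {motion}, {position}")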
Is it Practical?
• STPA has been or is being used in a large variety of
industries
– Spacecraft
– Aircraft
– Air Traffic Control
– UAVs (RPAs)
– Defense
– Automobiles (GM, Ford, Nissan)
– Medical Devices and Hospital Safety
– Chemical plants
– Oil and Gas
– Nuclear and Electrical Power
– CO2 Capture, Transport, and Storage
– Etc.
Does it Work?
• Most of these systems are very complex (e.g., the
new U.S. missile defense system)
• In all cases where a comparison was made (to FTA,
HAZOP, FMEA, ETA, etc.):
– STPA found the same hazard causes as the old
methods
– Plus it found more causes than traditional methods
– In some evaluations, found accidents that had
occurred that other methods missed (e.g., EPRI)
– Cost was orders of magnitude less than the traditional
hazard analysis methods
Applies to Security Too (AF Col. Bill Young)
• Current focus is primarily on tactics
– Cyber security is often framed as a battle between adversaries
and defenders (tactics)
– Requires correctly identifying attackers’ motives, capabilities,
and targets
• Can reframe problem in terms of strategy
– Identify and control system vulnerabilities (vs. reacting to
potential threats)
– Top-down strategy vs. bottom-up tactics approach
– Tactics tackled later
– Goal is to ensure that critical functions and services
provided by networks and services are maintained (loss
prevention)
STPA-Sec Allows us to Address Security
“Left of Design”
[Diagram: the cost of fixing security flaws rises across the system
engineering phases (concept → requirements → design → build → operate),
moving from abstract to physical systems. Secure systems thinking yields
system security requirements at the concept stage; secure systems
engineering carries them through design; “bolt-on” cyber security arrives
only after the system is built, when an attack forces a response and fixes
cost the most.]
Build security into system like safety
Evaluation of STPA-Sec
• Several experiments by U.S. Air Force with real missions
and cyber-security experts
– Results much better than with standard approaches
– U.S. Cyber Command now issuing policy to adopt STPA-Sec
throughout the DoD
– Training began this summer
• Dissertation to be completed by Fall 2014
Current Aviation Safety Research
• Air Traffic Control (NextGen): Cody Fleming, John Thomas
– Interval Management (IMS)
– ITP
• UAS (unmanned aircraft systems) in national airspace:
Major Kip Johnson
– Identifying certification requirements
• Safety in Air Force Flight Test: Major Dan Montes
• Safety and Security in Army/Navy systems: Lieut. Blake
Albrecht
• Security in aircraft networks: Jonas Helfer
Other Safety Research
• Airline operations
• Workplace (occupational) safety (Boeing assembly
plants)
• Non-aviation:
– Hospital Patient Safety
– Hazard analysis and feature interaction in automobiles
– Leading indicators of increasing risk
– Manned spacecraft development (JAXA)
– Accident causal analysis
– Adding sophisticated human factors to hazard analysis
Human Factors in Hazard Analysis
[Repeats the earlier slide “Role of Process Models in Control”: controllers
use a process model to determine control actions; accidents often occur when
the process model is incorrect, yielding the four types of unsafe control
actions listed there. (Leveson, 2003); (Leveson, 2011)]
Model for Human Controllers
Cody Fleming: Analyzing Safety in ConOps
From the ConOps, identify:
– Missing, inconsistent, or conflicting information required for safety
– Vulnerabilities, risks, and tradeoffs
– Potential design or architectural solutions to hazards
Demonstrating on TBO (Trajectory Based Operations)
[Control-structure diagram for TBO: a ground-side route/trajectory
management function (ANSP/ATC) and an airborne piloting function (flight
crew, using the FMS or manual control, with a CDTI display) exchange the 4D
trajectory {4DT} (intent), altitude reports, and alert parameters over a
data link; conformance monitors on both the ground and the air side compare
the aircraft state {x, y, h, t}, derived from ADS-B and GNSS, against the
agreed trajectory.]
Comparing Potential Architectures
Control Model for Trajectory Negotiation
Modified Control Model for Trajectory Negotiation
Alternative Control Model for Trajectory Negotiation
It’s still hungry … and I’ve been stuffing worms into it all day.
Nancy Leveson, Engineering a Safer World: Systems Thinking Applied to
Safety, MIT Press, January 2012 (available for download).