Part 2
THE MANAGEMENT OF SAFETY
DRAFT
Chapter 4
UNDERSTANDING SAFETY
Outline
GENERAL
CONCEPT OF RISK
ACCIDENTS vs. INCIDENTS
ACCIDENT CAUSATION
 Traditional view of causation
 Contemporary view of causation
 Findings and causes
 Incidents: Precursors of accidents
— 1:600 Rule
CONTEXT FOR ACCIDENTS AND INCIDENTS
 Equipment design
 Supporting infrastructure
 Human factors
— SHEL model
 Cultural factors
— National
— Professional
— Organizational
 Corporate safety culture
— Positive safety cultures
— Indications of a positive safety culture
– Informed culture
– Learning culture
– Reporting culture
– Just culture
— Error tolerance
— Blame and punishment
HUMAN ERROR
 Error Types
— Planning errors (mistakes)
— Execution errors (slips and lapses)
— Errors vs. Violations
 Control of human error
— Error reduction
— Error capturing
— Error tolerance
ACCIDENT PREVENTION CYCLE
COST CONSIDERATIONS
 Cost of accidents
 Cost of incidents
 Costs of accident prevention
Chapter 4
UNDERSTANDING SAFETY
GENERAL
As discussed in Chapter 1, safety is a condition in which the risk of harm or damage is limited to an
acceptable level. The safety hazards creating risk may become evident after a recognized breach of safety
such as an accident or incident, or they may be pro-actively identified through formal safety programmes
– before an actual safety event occurs. Having identified a safety hazard, the risks must be assessed. With
a clear understanding of the nature of the risk, a determination is made as to the “acceptability” of the
risk. Those found to be unacceptable must be acted upon.
Safety management is centred on such a systematic approach to hazard identification and risk
management, in the interests of minimizing the loss of human life, property damage, and financial,
environmental and societal losses.
THE CONCEPT OF RISK
Since safety is defined in terms of risk, any consideration of safety must necessarily involve the concept
of risk.
There is no such thing as absolute safety. Before any assessment of whether or not a system is safe can be
made, it is first necessary to determine what is an acceptable level of risk.
Risks are often expressed as probabilities; however, the concept of risk involves more than this. To
illustrate this with a hypothetical example, let us assume that the probability of the supporting cable of a
100-passenger cable car failing and allowing the car to fall was assessed as being the same as the
probability of a 12-passenger elevator failing and allowing the car to fall four floors. While the
probabilities of the events occurring may be the same, the potential consequences of the cable car
accident are much more severe. Risk is therefore two-dimensional. Evaluation of the acceptability of a
given risk associated with a particular hazard must always take account of both the likelihood of
occurrence of the hazard, and the severity of its potential consequences.
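As a sketch of this two-dimensional view, the snippet below combines likelihood and severity into a single risk index. The five-point scales and the multiplication rule are illustrative assumptions for demonstration only, not values prescribed by this manual:

```python
# Illustrative only: two-dimensional risk as likelihood x severity.
# The category names and 1-5 scales are assumptions, not prescribed values.
LIKELIHOOD = {"extremely improbable": 1, "improbable": 2, "remote": 3,
              "occasional": 4, "frequent": 5}
SEVERITY = {"negligible": 1, "minor": 2, "major": 3,
            "hazardous": 4, "catastrophic": 5}

def risk_index(likelihood: str, severity: str) -> int:
    """Combine both dimensions of risk into a single comparable index."""
    return LIKELIHOOD[likelihood] * SEVERITY[severity]

# Same likelihood, different severity -> different risk, as in the
# cable-car vs. elevator example above:
elevator = risk_index("remote", "major")          # 3 * 3 = 9
cable_car = risk_index("remote", "catastrophic")  # 3 * 5 = 15
```

The point of the sketch is only that two occurrences with equal probability can carry very different risk once severity is taken into account.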
Perceptions of risk can be divided into three broad categories:
a) Risks that are so high that they are unacceptable;
b) Risks that are so low that they are acceptable; and
c) Risks in between these two categories, where consideration needs to be given to the various
trade-offs between risks and benefits.
If the risk does not meet the pre-determined acceptability criteria, an attempt must always be made to
reduce it to a level that is acceptable, using appropriate mitigation procedures. If the risk cannot be
reduced to or below the acceptable level, it may be regarded as tolerable if:
a) the risk is less than the pre-determined unacceptable limit;
b) the risk has been reduced to a level which is as low as reasonably practicable; and
c) the benefits of the proposed system or changes are sufficient to justify accepting the risk.
Note. All three of the above criteria should be satisfied before a risk is classed as
tolerable.
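The three criteria can be expressed as a simple predicate: a risk is tolerable only when all three hold. This is an illustrative sketch only; the function name and its boolean inputs are hypothetical, and in practice each criterion is established by a full risk assessment rather than a flag:

```python
# Hedged sketch of the tolerability test. Inputs are hypothetical
# placeholders standing in for the outcome of a proper risk assessment.
def is_tolerable(risk: float,
                 unacceptable_limit: float,
                 reduced_alarp: bool,
                 benefits_justify_risk: bool) -> bool:
    """A risk is classed as tolerable only if ALL three criteria hold:
    a) it is less than the pre-determined unacceptable limit;
    b) it has been reduced as low as reasonably practicable (ALARP); and
    c) the benefits of the proposed system justify accepting it."""
    return (risk < unacceptable_limit
            and reduced_alarp
            and benefits_justify_risk)
```

Note that failing any single criterion makes the risk non-tolerable, mirroring the note above.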
Even where the risk is classed as acceptable, if any measures which could result in further reduction of
the risk are identified, and these measures require little effort or resources to implement, they should be
implemented.
The acronym ALARP is often used to describe a risk which has been reduced to a level that is as low as
reasonably practicable. In determining what is “reasonably practicable” in this context, consideration
should be given to both the technical feasibility of further reducing the risk, and the cost. All such cases
should be evaluated individually.
Showing that a system is ALARP means that any further risk reduction is either impracticable or grossly
outweighed by the costs. It should, however, be remembered that when an individual or society ‘accepts’
a risk, this does not mean that the risk is eliminated. Some level of risk remains; however, the individual
or society has accepted that the residual risk is sufficiently low that it is outweighed by the benefits.
These concepts are illustrated diagrammatically in the Tolerability of Risk (TOR) triangle shown in
Figure 4-1. (In this diagram, the degree of risk is represented by the width of the triangle.)
[Figure 4-1. Tolerability of Risk (TOR) Triangle: the unacceptable region at the top narrows through the tolerable (ALARP) region to the acceptable region and negligible risk at the bottom.]
Further guidance regarding risk management is contained in Chapter 6.
ACCIDENTS vs. INCIDENTS
ICAO Annex 13 provides definitions of accidents and incidents that may be summarized as follows:
a) An accident is an occurrence during the operation of an aircraft that entails:
1) A fatality or serious injury;
2) Substantial damage to the aircraft involving structural failure or requiring major repair of the
aircraft; or
3) The aircraft is missing.
b) An incident is an occurrence, other than an accident, associated with the operation of an aircraft
that affects or could affect the safety of operation. A serious incident is an incident involving
circumstances indicating that an accident nearly occurred.
The ICAO definitions use the word “occurrence” to indicate an accident or incident. From the
perspective of safety management, there is a danger in concentrating on the difference between accidents
and incidents through definitions that may be arbitrary and limiting. Many incidents occur every day
which may, or may not, be reported to the investigation authority, but which come very close to being
accidents – often exposing significant risks. Because there is no injury, or little or no damage, they might
not be investigated. This is unfortunate because the investigation of an incident may yield better results
for accident prevention than the investigation of an accident. The difference between an accident and an
incident may just be an element of chance, rather than effective risk management. Indeed, an incident
may be thought of as an undesired event that, under slightly different circumstances, could have resulted
in harm to people or damage to property and thus would have been classified as an accident.
ACCIDENT CAUSATION
The strongest evidence of a serious breach of a system’s safety is an accident. Since safety management
aims to reduce the probability and consequences of accidents, an understanding of accident and incident
causation is essential to understanding safety management. Because accidents and incidents are so closely
related, no attempt is made to differentiate accident causation from incident causation.
Traditional view of causation
Following a major accident, the questioning process begins. For example:
a) How and why did competent personnel make the errors necessary to precipitate the accident? and
b) Could something like this happen again?
Traditionally, investigators have examined a chain of events or circumstances which ultimately led to
someone doing something inappropriate thereby triggering the accident. This inappropriate behaviour
may have been an error in judgment (such as a deviation from standard operating procedures), an error
due to inattention, or a deliberate violation of the rules.
Following the traditional approach, the investigative focus was more often than not on finding someone to
blame (and punish) for the accident. At best, safety management efforts were concentrated on finding
ways of reducing the risk that such unsafe acts would be committed in the first place. However, the errors
or violations that trigger accidents seem to occur randomly. With no particular pattern to pursue, safety
management efforts to reduce or eliminate random events may be ineffective.
Analysis of accident data all too often reveals that the situation prior to the accident was “ripe for an
accident”. Safety-minded persons may even have been saying that it was just a matter of time before
these circumstances led to an accident. And when the accident occurs, all too often healthy, qualified,
experienced, motivated and well-equipped personnel have committed the errors that triggered it. They
(and their colleagues) may have committed these errors or unsafe practices many times before without
adverse consequences. Further, some of the unsafe conditions in which they were operating may have
been present for years, again without causing an accident. In other words, there is an element of chance
present.
Sometimes these unsafe conditions were the consequence of decisions by management; they recognized
the risks, but often priorities required a trade-off. Indeed, front-line personnel work in a context that is
defined by organizational and management factors often beyond their control. The front-line employees
are but part of a larger system.
To be successful, safety management systems require an alternative understanding of accident causation,
one that depends on examining the total context (i.e. the system) in which these people work.
Contemporary view of causation
According to contemporary thinking, accidents require the coming together of a number of enabling
factors, each one necessary, but in itself not sufficient to breach system defences. Major equipment
failures, or operational personnel errors, are seldom the sole cause of breaches in safety defences. Often
these breakdowns are the consequence of human failures in decision-making. The breakdowns may
involve active failures at the operational level, or they may involve latent conditions conducive to
facilitating a breach of the system’s inherent safety defences. Most accidents include both active and
latent failures.
Figure 4-2 portrays an accident causation model1 that assists in understanding the interplay of
organizational and management factors (i.e. system factors) in accident prevention. Various “defences”
are built into the aviation system to protect against inappropriate performance or poor decisions at all
levels of the system: the front-line workplace, the supervisory levels and senior management. This model
shows that while organizational factors, including management decisions, can create latent failure
conditions that could lead to an accident, they also contribute to the system defences.
Errors and violations having an immediate adverse effect can be viewed as unsafe acts; these are
generally associated with front-line personnel (pilots, controllers, mechanics, etc.). These unsafe acts may
1 Adapted from James Reason, “Collective Mistakes in Aviation: The Last Great Frontier”, Flight Deck, Summer 1992, Issue 4.
penetrate the various defences put in place to protect the aviation system by company management, the
regulatory authorities, ICAO, etc., resulting in an accident. These unsafe acts may be the result of normal
errors, or they may result from deliberate violations of prescribed procedures and practices. The model
recognizes that there are many error-producing or violation-producing conditions in the work
environment that may affect individual or team behaviour.
[Figure 4-2. Accident Causation Model (adapted from Dr. James Reason): organization and management decisions create workplace error- and violation-producing conditions; crew/team errors and violations then test the system’s defences, and the outcome ranges from reportable incidents to an accident.]
These unsafe acts are committed in an operational context which includes latent unsafe conditions (latent
failures). A latent failure is the result of an action or decision made well before an accident. Its
consequences may remain dormant for a long time. Individually, these latent failures are usually not
harmful since they are not perceived as being failures in the first place.2
Latent unsafe conditions may only become evident once the system’s defences have been breached. They
may have been present in the system well before an accident and are generally created by decision-makers, regulators and other people far removed in time and space from the accident. Front-line
operational personnel can inherit defects in the system, such as those created by poor equipment or task
design; conflicting goals (e.g. on-time service vs. safety); defective organizations (e.g. poor internal
communications); or bad management decisions (e.g. deferral of a maintenance item). Effective safety
management efforts aim to identify and mitigate these latent unsafe conditions on a system-wide basis,
rather than by localized efforts to minimize unsafe acts by individuals. Such unsafe acts may only be
symptoms of safety problems, not causes.
Most latent unsafe conditions start with the decision-makers, even in the best-run organizations. These
decision-makers are also subject to normal human biases and limitations, as well as to very real
constraints of time, budget, politics, etc. Since some of these unsafe decisions cannot be prevented, steps
must be taken to detect them and to reduce their adverse consequences.
Fallible decisions by line-management may take the form of inadequate procedures, poor scheduling or
neglect of recognizable hazards. They may lead to inadequate knowledge and skills or inappropriate
operating procedures. How well line-management and the organization perform their functions sets the
2 Wood, Richard H., Aviation Safety Programs: A Management Handbook, 3rd edition, Englewood, CO: Jeppesen, 2003.
scene for error- or violation-producing conditions. For example, how effective is management with
respect to setting attainable work goals, organizing tasks and resources, managing day-to-day affairs,
communicating internally and externally, etc.? The fallible decisions made by company management and
regulatory authorities are too often the consequence of inadequate resources. However, avoiding the costs
of strengthening the safety of the system can facilitate accidents that are so expensive as to bankrupt the
operator.
Findings and causes
One of the difficulties in determining accident causes involves the level of confidence that can be placed
in the investigative findings. In criminal proceedings, the burden of proof is heavy, seeking virtual
certainty (e.g. “beyond reasonable doubt”). However, in accident investigations, it is often impossible to
determine causes with absolute certainty.
Following an investigation, some stakeholders, especially the general public, media and legal fraternity
are primarily interested in “the cause” of the accident. More enlightened stakeholders will seek an
understanding of all the causes. However, excessive emphasis on determining the causes may be at the
expense of accident prevention. If the investigation went deep enough, beyond the determination of the
causes, it may reveal systemic safety deficiencies that must be corrected in the interests of accident
prevention. A thorough investigation will reveal many findings, perhaps including latent unsafe
conditions that had no role whatsoever in the accident causation. From a safety management point of
view, significant safety deficiencies found in an investigation that may be unrelated to the accident must
be eliminated, or reduced.
A thorough investigation will reveal a number of findings. Some of these findings could be considered
causal. Others may have contributed to the accident in some way (sometimes referred to as contributory
factors). On the basis that Annex 13 defines the objective of an investigation as accident prevention, a
growing number of investigations are determining findings without attempting to identify which of the
findings are causal.
Incidents: Precursors of accidents
Regardless of the accident causation model used, typically there would have been precursors evident
before the accident. All too often, these precursors only become evident with hindsight. Latent unsafe
conditions may have existed at the time of the occurrence. Identifying and validating these unsafe
conditions requires an objective, in-depth risk analysis. Although it is important to fully investigate
accidents with high numbers of fatalities, this may not be the most fruitful means for identifying safety
deficiencies. Care must be taken to ensure that the “blood priority” (often prevalent in the media after
significant loss of life) does not detract from a rational risk analysis of unsafe conditions in aviation.
While using accident investigations to identify hazards is important, it is reactive.
1:600 Rule3. Research into industrial safety in 1969 indicated that for every 600 reported
occurrences with no injury or damage, there were some:
— 30 incidents involving property damage,
— 10 accidents involving serious injuries, and
— 1 major or fatal injury.

3 Frank E. Bird, Jr., 1969.
[Figure 4-3. 1:600 rule, depicted as a triangle: 1 fatal accident at the apex, above 10 accidents, 30 incidents and 600 occurrences at the base.]
The 1-10-30-600 ratio shown in Figure 4-3 is indicative of a wasted opportunity if investigative efforts
are focused only on those rare occurrences where there is serious injury, or significant damage. The latent
factors contributing to such accidents may be present in hundreds of incidents, and could be identified –
before serious injury or damage ensues. Effective safety management requires that all staff and
management identify and analyse hazards before they result in accidents.
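The 1-10-30-600 ratio can be used as a rough proportional estimate of the opportunities being missed. The sketch below is illustrative only: it simply scales the 1969 ratios, and real occurrence data will not follow them exactly:

```python
# Illustrative sketch of the 1:600 rule as a proportional estimate.
# The ratios come from the 1969 research cited in the text; treating
# them as exact multipliers is an assumption for demonstration.
RATIO = {
    "no injury or damage": 600,
    "property damage": 30,
    "serious injury": 10,
    "major or fatal injury": 1,
}

def expected_counts(no_damage_reports: int) -> dict:
    """Scale the pyramid from a count of reported no-damage occurrences."""
    scale = no_damage_reports / RATIO["no injury or damage"]
    return {level: n * scale for level, n in RATIO.items()}

# 1200 reported no-damage occurrences would, on these ratios,
# accompany roughly 60 damage incidents, 20 serious-injury accidents
# and 2 major or fatal injuries.
counts = expected_counts(1200)
```

The safety-management point is the inverse reading: each fatal accident sits on hundreds of minor occurrences in which the same latent conditions could have been found first.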
In aviation incidents, injury and damage (and therefore, liability) are generally less significant than in
accidents. Accordingly, there is less publicity associated with these occurrences. In principle, more
information regarding such occurrences should be available (e.g. live witnesses, undamaged flight
recorders, etc.). Without the threat of substantial damage suits, there also tends to be less of an adversarial
atmosphere during the investigation. Thus, there should be a better opportunity to identify why the
incidents occurred and, equally, how the defences in place prevented them from becoming accidents. In
an ideal world, the underlying safety deficiencies can all be identified, and preventive measures to
ameliorate these unsafe conditions initiated, before an accident occurs.
CONTEXT FOR ACCIDENTS AND INCIDENTS
Accidents and incidents occur within a defined set of circumstances and conditions. These include the
aircraft and other equipment, the weather, the airport and flight services, etc. They also include the
regulatory, industry and corporate operating climate. Above all, they include the permutations and
combinations of human behaviour. At any given time, some of these factors may converge in such a way
as to create conditions that are ripe for an accident. Understanding the context in which accidents occur is
fundamental to safety management. Some of the principal factors shaping the context for accidents and
incidents include equipment design, supporting infrastructure, human and cultural factors, corporate
safety culture and cost factors. Each is discussed in this chapter.
Equipment design
Equipment (and job) design is fundamental to both safe operations and maintenance. Simplistically, the
designer is concerned with such questions as:
a) Does the equipment do what it is supposed to do?
b) Does the equipment interface well with the operator? Is it “user-friendly”?
c) Does the equipment fit in the allocated space? etc.
From the equipment operator’s perspective, the equipment must “work as advertised”. The ergonomic
design must minimize the risk (and consequences) of errors. Are the switches accessible? Is the
controlling action intuitive? Are the dials and displays adequate under all operating conditions, etc.? Is the
equipment resistant to mistakes, e.g. “Are you sure you want to delete this file?” Each such factor has
accident potential.
The designer also needs to consider the equipment maintainer’s perspective; there must be sufficient
space available to permit access for required maintenance under typical working conditions and with
normal human strength and reach limitations. The design must also incorporate adequate feedback to
warn of an incorrect assembly.
With advances in automation, design considerations become even more apparent. Whether it is the pilot
in the cockpit, air traffic controllers at their consoles, or a maintenance engineer using automated
diagnostic equipment, the scope for new types of human errors has expanded significantly. Equipment
operators may no longer be certain at all times of what is happening, hence the question: “What is the
computer doing now?” Their situational awareness may be affected accordingly. Although increased
automation has reduced the potential for many types of accidents, safety managers now face new
challenges induced by that automation, such as lack of situational awareness, boredom, etc.
Supporting infrastructure
From an operator or service provider’s perspective, the availability of adequate supporting infrastructure
is essential to the safe operation of aircraft. This includes the adequacy of the State’s performance with
respect to such things as:
a) Personnel licensing;
b) Certification of aircraft, operators, service providers and aerodromes;
c) Ensuring the provision of required services;
d) Investigation of accidents and incidents; and
e) Providing operational safety oversight.
From a pilot’s perspective, supporting infrastructure includes such things as:
a) Airworthy aircraft suitable for the type of operation;
b) Adequate and reliable CNS services;
c) Adequate and reliable aerodrome, ground handling, and flight planning services; and
d) Effective support from the parent organization with respect to initial and recurrent training,
scheduling, flight dispatch or flight following system, etc.
An air traffic controller is similarly concerned with such things as:
a) Availability of operable (CNS) equipment suitable for the operational task;
b) Effective procedures for the safe and expeditious handling of aircraft; and
c) Effective support from the parent organization with respect to initial and recurrent training,
rostering and general working conditions.
Human factors4, 5
In a high technology industry like aviation, the focus of problem solving is often on technology.
However, the accident record repeatedly demonstrates that at least three out of four accidents involve
performance errors made by apparently healthy and appropriately qualified individuals. In the rush to
embrace new technologies, the fallible mortals who must interface with and use this equipment are often
overlooked.
The sources of some of the problems causing or contributing to these accidents may be traced to poor
equipment or procedure design, or to inadequate training or operating instructions. But whatever the
origin, understanding normal human performance capabilities, limitations and behaviour in the
operational context is central to understanding safety management. An intuitive approach to Human
Factors is no longer appropriate.
The human element is the most flexible and adaptable part of the aviation system, but it is also the most
vulnerable to influences that can adversely affect its performance. With the majority of accidents resulting
from less than optimum human performance, there has been a tendency to merely attribute them to human
4 Adapted from Chapter 2 of the Human Factors Guidelines for Safety Audits Manual (Doc 9806).
5 Readers are referred to the Human Factors Training Manual (Doc 9683) for a more comprehensive coverage of the theoretical and practical aspects of Human Factors.
error. However, the term “human error” is of little help in safety management. Although it may indicate
where in the system the breakdown occurred, it provides no guidance as to why it occurred.
An error attributed to humans may have been design-induced, or stimulated by inadequate equipment or
training, badly designed procedures, or a poor layout of checklists or manuals. Further, the term “human
error” allows concealment of the underlying factors that must be brought to the fore if accidents are to be
prevented. In contemporary safety thinking, human error is the starting point, rather than the stopping
point in accident investigation and prevention. Safety management initiatives seek ways of minimizing or
preventing human errors that might jeopardize safety. This requires an understanding of the operating
context in which humans err, (i.e. an understanding of the factors and conditions affecting human
performance in the workplace).
SHEL Model
The workplace typically involves a complex set of interrelated factors and conditions, which may
affect human performance. The SHEL model6 (sometimes referred to as the SHELL model) can
be used to help visualize the interrelationships among the various components of the aviation
system. The SHEL model is a development of the traditional “man-machine-environment”
system (the name being derived from the initial letters of its components). SHEL places
emphasis on the human being and the human’s interfaces with the other components of the
aviation system. The following nomenclature is applied:
a) Liveware (L) (humans in the workplace),
b) Hardware (H) (machine and equipment),
c) Software (S) (procedures, training, support, etc.), and
d) Environment (E) (the operating circumstances in which the rest of the L-H-S system
must function).
Figure 4-4 depicts the SHEL model. This building block diagram is intended to provide a basic
understanding of the relationship of the human to other factors in the workplace.
6 The SHEL concept was first developed by Professor Elwyn Edwards in 1972, with a modified diagram to illustrate the model developed by Frank Hawkins in 1975.
[Figure 4-4. SHEL Model: a block diagram of the four components, S (software), H (hardware), E (environment) and L (liveware), arranged around the central human.]
Liveware. In the centre of the model are those persons at the front line of operations. Although
people are remarkably adaptable, they are subject to considerable variations in performance.
Humans are not standardized to the same degree as hardware; so the edges of this block are not
simple and straight. People do not interface perfectly with the various components of the world in
which they work. To avoid tensions that may compromise human performance, the effects of
irregularities at the interfaces between the various SHEL blocks and the central Liveware block,
must be understood. The other components of the system must be carefully matched to humans if
stresses in the system with accident potential are to be avoided.
Several different factors put the rough edges on the Liveware block; some of the more important
factors affecting individual performance are:
Physical Factors include the individual’s physical capabilities to perform the required tasks,
(e.g. strength, height, reach, vision, and hearing).
Physiological Factors include those factors which affect the human’s internal physical
processes, which can compromise the crew’s physical and cognitive performance, e.g.
oxygen availability, general health and fitness, disease or illness, tobacco, drug or alcohol
use, personal stress, fatigue, or pregnancy.
Psychological Factors include those factors affecting the psychological preparedness of the
individual to meet all the circumstances that might occur during a flight, e.g. adequacy of
training, knowledge and experience, visual illusions and workload. The individual’s
psychological fitness for duty includes motivation and judgement, attitude towards risky
behaviour, confidence, stress, etc.
Psycho-social Factors include all those external factors in the individual’s social system that
bring pressure to bear on them, both in their work and their non-work environments, e.g.
argument with a supervisor, labour-management disputes, a death in the family, personal
financial problems or other domestic tension.
The SHEL model is particularly useful in visualizing the interfaces between the various
components of the aviation system. These include:
Liveware-Hardware (L-H). The interface between the human and the machine is the one
most commonly considered when speaking of Human Factors. It determines how the human
interfaces with the physical work environment, e.g. design of seats to fit the sitting
characteristics of the human body, displays to match the sensory and information processing
characteristics of the user, controls with proper movement, coding and location. However,
there is a natural human tendency to adapt to L-H mismatches. This tendency may mask
serious deficiencies, which may only become evident after an accident.
Liveware-Software (L-S). The L-S interface is the relationship between the individual and the
supporting systems found in the workplace, e.g. the regulations, manuals, checklists,
publications, standard operating procedures and computer software. It includes such “user
friendliness” issues as currency, accuracy, format and presentation, vocabulary, clarity,
symbology, etc. Increasingly, cockpit automation has altered the nature of crew duties.
Workload may have been increased to such an extent during some phases of flight that crew
members’ attitudes towards each other may be affected (i.e. the L-L interface).
Liveware-Liveware (L-L). The L-L interface is the relationship between the individual and
other persons in the workplace. Flight crews, air traffic controllers, maintenance technicians
and other operational personnel function as groups and group influences play a role in
determining human behaviour and performance. This interface is concerned with leadership,
crew cooperation, teamwork and personality interactions. In aviation, the advent of Crew
Resource Management (CRM) has resulted in considerable focus on this interface. CRM
training promotes teamwork and focuses on the management of normal human errors. The L-L interface goes well beyond the crew relationship in the cockpit. Staff/management
relationships are also within the scope of this interface, as are corporate culture, corporate
climate and company operating pressures that can all significantly affect human performance.
Liveware-Environment (L-E). This interface involves the relationship between the individual
and the internal and external environments. The internal workplace environment includes
such physical considerations as temperature, ambient light, noise, vibration, air quality, etc.
The external environment (for pilots) includes such things as visibility, turbulence, terrain,
etc. Increasingly, the work environment for flight crews includes disturbances to normal
biological rhythms, e.g. sleep patterns. Further, the aviation system operates within a context
of broad political and economic constraints, which in turn affect the overall corporate
environment. Included here are such factors as the adequacy of physical facilities and
supporting infrastructure, the local financial situation, regulatory effectiveness, etc. Just as a
crew’s immediate work environment may create pressures to take short cuts, inadequate
infrastructure support may also compromise the quality of crew decision-making.
For the most part, the rough edges of these interfaces can be managed. For example:
a) The designer can ensure the performance reliability of the equipment under specified
operating conditions;
b) During the certification process, the regulatory authority can define the conditions under
which that equipment may be used;
c) The organization’s management can specify standard operating procedures and provide
initial and recurrent training for the safe use of the equipment; and
d) Individual equipment operators can ensure their familiarity and confidence in using the
equipment safely under all required operating conditions, etc.
Cultural factors7
Culture influences the values, beliefs and behaviours that we share with the other members of our various
social groups. Culture serves to bind us together as members of groups and to provide clues as to how to
behave in both normal and unusual situations. Some see culture as the “collective programming of the
mind”. It is the complex social dynamic that sets the rules of the game, or the framework for all our
interpersonal interactions. Culture is the sum total of the way people conduct their affairs in a particular
social milieu, and it provides the context in which things happen. For safety management, understanding
this context is important, since culture is a significant determinant of human performance and its
limitations.
The Western world’s approach to management is often based on an emotionally detached rationality,
which is considered to be “scientifically” based. It assumes that human behaviour in the workplace, like
the laws of physics and engineering, is universal in application. This assumption reflects a Western
cultural bias.
Aviation safety must transcend national boundaries, including all the cultures therein. On a global scale,
the aviation industry has achieved a remarkable level of standardization across aircraft types, countries
and peoples. Nevertheless, it is not difficult to detect differences in how people respond in similar
situations. As people in the industry interact (the Liveware-Liveware (L-L) interface), their transactions
are affected by the differences in their cultural backgrounds. Different cultures have different ways of
dealing with common problems.
Organizations are not immune to cultural considerations. Organizational behaviour is subject to these
influences at every level. The following three levels of culture have relevance to safety management
initiatives:
a) National culture differentiates the national characteristics and values system of particular
nations. People of different nationalities differ, for example, in their response to authority, how
they deal with uncertainty and ambiguity, and how they express their individuality. They are not
all attuned to the collective needs of the group (team or organization) in the same way. In
collectivist cultures, there is acceptance of unequal status and deference to leaders. Such factors
may affect the willingness of individuals to question decisions or actions — an important
consideration in Crew Resource Management (CRM) for example. Crew assignments that mix
national cultures may also affect team performance by creating misunderstandings.
b) Professional culture differentiates the behaviour and characteristics of particular professional
groups (e.g. the typical behaviour of pilots vis à vis that of air traffic controllers, or maintenance
engineers). Through personnel selection, education and training, on-the-job experience, etc.,
professionals (e.g. doctors, lawyers, pilots, controllers) tend to adopt the value system and
develop behaviour patterns consistent with their peers; they learn to “walk and talk” alike. They
generally share a pride in their profession and are motivated to excel in it. On the other side of the
coin, they frequently have a sense of personal invulnerability, e.g. they feel that their performance
is not affected by personal problems, or they do not make errors in situations of high stress.
7 This section is adapted from Human Factors Guidelines for Safety Audits Manual (Doc 9806).
c) Organizational culture differentiates the behaviour and values of particular organizations (e.g.
the behaviour of members of one company vs. that of another company, or government vs. private
sector behaviour). Organizations provide a shell for national and professional cultures. In an
airline for example, pilots may come from different professional backgrounds (e.g. military vs.
civilian experience, bush or commuter operations vs. development within a large carrier). They
may also come from different organizational cultures due to corporate mergers or lay-offs.
Generally, personnel in the aviation industry enjoy a sense of belonging. They are influenced in
their day-to-day behaviour by the values of their organization. Does the organization recognize
merit? Promote individual initiative? Encourage risk taking? Tolerate breaches of SOPs? Promote
open two-way communications, etc.? Thus, the organization is a major determinant of employee
behaviour.
The greatest scope for creating and nourishing a culture of safety is at the organizational level.
This is commonly referred to as corporate safety culture and is discussed further below.
The three cultural sets described above are important to safe flight operations. They determine how
juniors will relate to their seniors, how information is shared, how personnel will react under stress, how
particular technologies will be embraced and used, how authority will be acted upon, how organizations
react to human errors (e.g. punish offenders, or learn from experience). Culture will be a factor in how
automation is applied in flight operations; how procedures (SOPs) are developed and implemented; how
documentation is prepared, presented, and received; how training is developed and delivered; how crew
assignments are made; relationships between pilots, operations and ATC; relationships with unions, etc.
In other words, culture impacts on virtually every type of interpersonal transaction. In addition, cultural
considerations creep into the design of equipment and tools. Technology may appear to be culture-neutral,
but it reflects the biases of the manufacturer (e.g. consider the English language bias implicit in much of
the world’s computer software). Yet, there is no right and no wrong culture; they are what they are and
they each possess a blend of strengths and weaknesses.
The challenge for safety managers is to understand how culture affects both individuals and aviation
organizations and how that relationship can put safety at risk, or serve to enhance it.
Corporate safety culture8
As seen above, many factors create the context for human behaviour in the workplace. Organizational or
corporate culture sets the boundaries for accepted human behaviour in the workplace by establishing the
behavioural norms and limits. Thus, organizational or corporate culture provides a cornerstone for
managerial and employee decision-making; “This is how we do things here!”
Safety culture is a natural by-product of corporate culture. The corporate attitude towards safety influences
employees’ collective approach to safety. Safety culture consists of shared beliefs, practices and attitudes.
The tone for safety culture is set and nurtured by the words and actions of senior management. Corporate
safety culture then is the atmosphere created by management which shapes workers’ attitudes towards
safety and accident prevention.
Safety culture is affected by such factors as:
a) Management’s action and priorities;
8 For a more comprehensive discussion of safety culture, see Human Factors Guidelines for Safety Audits Manual (Doc 9806).
b) Policies and procedures;
c) Supervisory practices;
d) Safety planning and goals;
e) Actions in response to unsafe behaviours;
f) Employee training and motivation; and
g) Employee involvement or “buy in”.
The ultimate responsibility for safety rests with the directors and management of the organization –
whether it is an airline, a service provider (e.g. airports and ATS) or an Approved Maintenance
Organization (AMO). The safety ethos of an organization is established from the outset by the extent to
which senior management accepts responsibility for safe operations and for the management of risk.9
Positive safety culture10
Although compliance with safety regulations is fundamental to safety, contemporary thinking is that
much more is required. Operators that simply comply with the minimum standards set by the regulations
are not well situated to identify emerging safety problems.
An effective way to promote a safe operation is to ensure that an operator has a positive safety culture.
Simply put, all staff must take responsibility for safety and consider the safety implications of everything
they do. This way of thinking must be so deep-rooted that it truly becomes a ‘culture’. All decisions,
whether made by the Board of Directors, by a driver on the ramp, or by an engineer, need to consider
their implications for safety.
A positive safety culture must be generated from the ‘top down’ and relies on a high degree of trust and
respect between workers and management. Workers must believe that they will be supported in any
decisions made in the interests of safety. They must also understand that intentional breaches of safety
that jeopardize the operation will not be tolerated.
In order to ensure a positive safety culture, management must convince employees that while schedule
delivery and costs are important, safety is paramount. Some organizations go so far as to enunciate a
formal corporate policy on their commitment to a positive safety culture.
Indications of a positive safety culture
A positive safety culture demonstrates such attributes as:
a) Senior management place strong emphasis on safety as part of the strategy of controlling risks
(i.e. minimizing losses);
9 Adapted from CASA Aviation Safety Management: An Operator’s Guide to Building a Safety Program.
10 Adapted from Airbus Safety Strategy (OSPH Draft: Sec 1 - Introduction, Sep 1999).
b) Decision-makers and operational personnel hold a realistic view of the short- and long-term
hazards involved in the organization’s activities;
c) Those in top positions:
1) Foster a climate in which there is a positive attitude towards criticisms, comments and
feedback from lower levels of the organization on safety matters;
2) Do not use their influence to force their views on subordinates; and
3) Implement measures to contain the consequences of identified safety deficiencies;
d) Senior management promote a non-punitive working environment; they tolerate legitimate errors
and systematically attempt to derive safety lessons from them;
e) There is an awareness of the importance of communicating relevant safety information at all
levels of the organization (both within and with outside entities);
f) There is promotion of realistic and workable rules relating to hazards, to safety and to potential
sources of damage;
g) Personnel are well trained and fully understand the consequences of unsafe acts; and
h) There is a low incidence of risk-taking behaviour, and a safety ethic which discourages such
behaviour.
Positive safety cultures typically are:
a) Informed cultures. Management fosters a culture where people understand the hazards and risks
inherent in their area of operations. Personnel are provided with the necessary knowledge, skills
and job experience to work safely, and they are encouraged to identify the threats to their safety
and seek the changes necessary to overcome them.
b) Learning cultures. Learning is seen as more than a requirement for initial skills training; rather it
is valued as a lifetime process. People are encouraged to develop and apply their own skills and
knowledge to enhance organizational safety. Staff are updated on safety issues by management
and safety reports are fed back to staff so that everyone can learn the pertinent safety lessons.
c) Reporting cultures. Managers and operational personnel freely share critical safety information
without the threat of punitive action. This is frequently referred to as creating a corporate
reporting culture. Personnel are able to report hazards or safety concerns as they become aware of
them, without fear of sanction or embarrassment.
d) Just cultures. While a non-punitive environment is fundamental for a good reporting culture, the
workforce must know and agree on what is acceptable and what is unacceptable behaviour.
Negligence or deliberate violations must not be tolerated by either management or by the
workers. A culture that recognizes that, in certain circumstances, there may be a need for punitive
action is considered a just culture. Personnel tend to be self-disciplined in a just culture.
Table 4-1 below summarizes three corporate responses to safety issues ranging from a poor safety culture,
through the bureaucratic approach which only meets minimum acceptable requirements, to the ideal
positive safety culture.
Table 4-1. Characteristics of different safety cultures

Characteristics                   Poor                     Bureaucratic                      Positive
Hazard information is:            Suppressed               Ignored                           Actively sought
Safety messengers are:            Discouraged or punished  Tolerated                         Trained and encouraged
Responsibility for safety is:     Avoided                  Fragmented                        Shared
Dissemination of safety
information is:                   Discouraged              Allowed but discouraged           Rewarded
Failures lead to:                 Cover ups                Local fixes                       Inquiries and systemic reform
New ideas are:                    Crushed                  New problems (not opportunities)  Welcomed
Error tolerance
An important dimension of a positive safety culture is the organization’s attitude towards errors and the
perceptions that its responses to errors create among staff. Error tolerance is the term used to
describe the ability of a system to accept an error without serious consequence. The concept of ‘error
tolerance’ was first applied to the ergonomic design of equipment which incorporated physical defences
against inappropriate human acts, for example, air/ground logic to prevent inadvertent gear retraction on
the ground. In addition, procedural actions, such as checklists, crosschecks and readbacks, provide error
tolerance by identifying unsafe conditions before a mishap occurs.
Increasingly, the concept of error tolerance is being extended beyond equipment and job design into
corporate safety culture.
Creating a positive corporate safety culture is so dependent on effective two-way communications
between management and front-line personnel, that organizations are increasingly recognizing the value
of voluntary incident reporting systems that provide immunity to the reporter. However, the effectiveness
of such reporting systems depends largely on the ‘error tolerance’ of the company.
Blame and punishment
Once an investigation has identified the cause of an occurrence, it is usually evident who “caused” the
event. Traditionally, blame (and punishment) could then be assigned. While the legal environments vary
widely between States, many States still focus their investigations on determining blame and apportioning
liability. For them, punishment remains a principal tool.
Philosophically, punishment is appealing from several points of view, such as:
a) Seeking revenge for a breach of trust;
b) Protecting society from repeat offenders;
c) Altering individual behaviour; or
d) Setting an example for others.
Punishment may have a role to play in dealing with violations where crews intentionally contravene the
“rules”. Arguably, such sanctions may deter the perpetrator of the violation (or others in similar
circumstances) from jeopardizing safe flight operations. In principle, such punishment should be applied
regardless of the outcome of the violation, i.e. it should not be reserved for those whose misdemeanour
leads to actual losses for the company.
If an accident was the result of an error in judgement or technique, it is almost impossible to punish
effectively for that error. Instead, changes could be made to selection or training processes, or the system
could be made more tolerant of such errors. If punishment is chosen in such cases, two outcomes are
almost certain. Firstly, no further reports of such errors will be received. Secondly, since nothing has been
done to change the underlying situation, the same accident can be expected to recur.
Perhaps society needs to use punishment in order to mete out justice. However, global experience
suggests that punishment has little, if any, systemic value for accident prevention. Except in cases of
wilful negligence or deliberate violation of the norms, punishment serves little purpose from a safety
perspective.
In much of the international aviation community, a more enlightened view of the role of punishment is
emerging. In part, this parallels a growing understanding of the causes of human errors (as opposed to
violations). Errors are now being viewed as the results of underlying situations or circumstances, rather
than as causes in themselves. As a result, managers are beginning to seek out the unsafe conditions that facilitate
such errors. They are beginning to find that the systematic identification of organizational weaknesses and
safety deficiencies pays a much higher dividend for safety management than punishing individuals. (That
is not to say that these enlightened organizations are not required to take action against individuals who
fail to improve after counseling and/or extra training.)
While many airlines are taking this more positive approach to the active management of safety, others
have been slow to adopt and implement effective ‘non-punitive policies’. Still others have been slow to
extend their non-punitive policies beyond flight operations on a corporate-wide basis (to include
maintenance and ground operations).11
HUMAN ERROR
Human error is cited as being a causal or contributing factor in the majority of aviation occurrences. All
too often, competent personnel commit such errors, although clearly they did not plan to have an accident.
Errors are not some type of aberrant behaviour; they are a natural by-product of virtually all human
11 From IATA Non-Punitive Policy Survey, March 2002.
endeavour. Error must be accepted as a normal component of any system where humans and technology
interact. “To err is human.”
The factors discussed above create the context in which humans commit errors. Given the rough
interfaces of the aviation system (as depicted in the SHEL model), the scope for human errors in aviation
is enormous. Understanding how normal people commit errors is fundamental to safety management.
Only then can effective measures be implemented to minimize the effects of human errors on safety.
Even if not altogether avoidable, human errors are manageable through the application of improved
technology, relevant training and appropriate regulations and procedures. Most measures aimed at error
management involve front-line personnel. However, the performance of pilots, controllers, technicians,
etc. can be strongly influenced by organizational, regulatory, cultural and environmental factors affecting
the workplace. For example, organizational processes such as inadequate communication facilities,
ambiguous procedures, unsatisfactory scheduling, insufficient resources and unrealistic budgeting
constitute the breeding grounds for many predictable human errors; all of these are processes that the
organization can control. Figure 4-5 summarizes some of the factors contributing to human errors – and to
accidents.
[Figure 4-5. Contributing factors to human error – a diagram with human errors at the centre, fed by
contributing factors (culture, training, personal factors, procedures, equipment design, organizational
factors and other factors) and leading to incidents and accidents.]
Error types
Errors may occur at the planning stage, or during the execution of the plan. Planning errors lead to
mistakes; either the person follows an inappropriate procedure for dealing with a routine problem or
builds a plan for an inappropriate course of action to cope with a new situation. Even when the planned
action is appropriate, errors may occur in the execution of the plan. The human factors literature on such
errors in execution generally draws a distinction between slips and lapses. A slip is an action which is not
carried out as planned, and will therefore always be observable. A lapse is a failure of memory, and may
not necessarily be evident to anyone other than the person who experienced the lapse.
a) Planning errors (Mistakes)
In problem solving, we intuitively look for a set of known rules (standard operating procedures, rules
of thumb, etc.) that have been used before and appear appropriate to the problem at hand. Mistakes
can occur in two ways: the application of a rule that is not appropriate to the situation, or the correct
application of a rule that is itself flawed.
Misapplication of good rules. This usually happens when an operator is faced with a situation that
exhibits many features common to the circumstances for which the rule was intended, but with
some significant differences. If the significance of the differences is not recognized, an
inappropriate rule may be applied.
Application of bad rules. This involves the use of a procedure that past experience has shown to
work, but contains unrecognized flaws. If such a solution works in the circumstances under which
it was first tried, it may become part of the individual’s regular approach to solving that type of
problem.
When a person does not have a ready-made solution based on previous experience and/or training, he
or she must draw on personal knowledge and experience. Developing a solution to a problem using this
method will inevitably take longer than applying a rule-based solution, as it requires reasoning based
on knowledge of basic principles. Mistakes can occur because of lack of knowledge, or because of
faulty reasoning. The application of knowledge-based reasoning to a problem will be particularly
difficult in circumstances where the individual is busy, as his or her attention is likely to be diverted
from the reasoning process to deal with other issues. The probability of a mistake occurring becomes
greater in such circumstances.
b) Execution errors (slips and lapses)
The actions of experienced and competent personnel tend to be routine and highly practiced; they
are carried out in a largely automatic fashion, except for occasional checks on progress. Slips and
lapses can take several forms:
Attentional slips occur as the result of a failure to monitor the progress of a routine action at some
critical point. These are particularly likely when the planned course of action is similar to a
routinely used procedure, but not identical. If attention is allowed to wander or a distraction
occurs at the critical point where the action differs from the usual procedure, the result can be that
the operator will follow the usual procedure rather than the one intended in this instance.
Memory lapses occur when we either forget what we had planned to do, or omit an item in a
planned sequence of actions.
Perceptual errors are errors in recognition. They occur when we believe we saw or heard
something which is different from the information actually presented.
c) Errors vs. Violations
Errors (which are a normal human activity) are quite distinct from violations. Both can lead to a
failure of the system. Both can result in a hazardous situation. The difference lies in the intent.
A violation is a deliberate act, while an error is unintentional. Take, for example, a situation in
which a controller allows an aircraft to descend through the level of a cruising aircraft when the
DME distance between them is 18 NM, and this occurs in circumstances where the correct
separation minimum is 20 NM. If the controller made a mistake in calculating the difference in
the DME distances advised by the pilots, this would be an error. If the controller calculated the
distance correctly, and allowed the descending aircraft to continue through the level of the
cruising aircraft knowing that the required separation minimum did not exist, this would be a
violation.
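The separation scenario above can be reduced to a simple decision sketch. This is an illustration only, not an operational tool; the 20 NM minimum and 18 NM spacing are taken from the example in the text, and the function name is hypothetical:

```python
# Illustrative sketch of the DME separation example: the same unsafe
# clearance is classified as an "error" or a "violation" depending only
# on the controller's intent, i.e. what the controller believed at the time.

SEPARATION_MINIMUM_NM = 20.0  # minimum from the example in the text

def classify_occurrence(actual_spacing_nm: float,
                        spacing_controller_believed_nm: float) -> str:
    """Classify a loss-of-separation occurrence by intent.

    If the controller miscalculated and believed the minimum existed, the
    act was unintentional: an error. If the controller calculated correctly
    and cleared the descent anyway, the act was deliberate: a violation.
    """
    if actual_spacing_nm >= SEPARATION_MINIMUM_NM:
        return "no occurrence"          # required separation existed
    if spacing_controller_believed_nm >= SEPARATION_MINIMUM_NM:
        return "error"                  # miscalculation, no intent
    return "violation"                  # knew the minimum did not exist

print(classify_occurrence(18.0, 21.0))  # miscalculated -> error
print(classify_occurrence(18.0, 18.0))  # knew the distance -> violation
```

The point of the sketch is that the observable outcome (18 NM of spacing) is identical in both branches; only intent separates the two classifications.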
Violations should not be tolerated. In a number of accidents, the existence of a corporate culture which
tolerated, or in some cases even encouraged, the taking of short cuts rather than following published
procedures in full was identified as a major contributory factor.
Control of human error
Fortunately, few errors lead to adverse consequences, let alone accidents. Typically, errors are identified
and corrected with no undesirable outcomes, for example, selecting an incorrect frequency or setting the
altitude bug to the wrong altitude. On the understanding that errors are normal in human behaviour, the
total elimination of human error would be an unrealistic goal. The challenge then is not merely to prevent
errors, but to learn to safely manage the inevitable errors.
Three strategies for managing human errors are briefly discussed below. Such strategies are relevant in
flight operations, air traffic control or aircraft maintenance.12
a) Error reduction strategies intervene directly at the source of the error by reducing or eliminating
the contributing factors to the error. They aim at eliminating any adverse conditions that increase
the risk of error. Examples of error reduction strategies include improving the access to an aircraft
component for maintenance, improving the lighting in which the task is to be performed, reducing
environmental distractions and providing better training.
b) Error capturing assumes the error has already been made. The intent is to “capture” the error
before any adverse consequences of the error are felt. Error capturing is different from error
reduction in that it does not directly serve to reduce or eliminate the error. Examples of error-capturing
strategies include crosschecking to verify correct task completion, functional test flights, etc.
c) Error tolerance refers to the ability of a system to accept an error without serious consequence.
Examples of measures to increase error tolerance are the incorporation of multiple hydraulic or
12 From Human Factors Training Manual (Doc 9683).
electrical systems on an aircraft to provide redundancy, or a structural inspection programme that
provides multiple opportunities to detect a fatigue crack before it reaches critical length.
Some airlines have implemented error management strategies that have significantly reduced human
errors in such areas as rushed or non-stabilized approaches and incorrect use of checklists. Reducing the
frequency and consequences of human error provides enormous scope for safety management.
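The value of redundancy as an error-tolerance measure can be illustrated with simple arithmetic; the per-flight failure probability below is an invented figure for illustration, not an actual reliability statistic:

```python
# Illustrative arithmetic only: redundancy provides failure tolerance
# because the function is lost only if BOTH independent channels fail.
# Assumes the two systems fail independently; the probability is invented.

p_single = 1e-3          # assumed probability that one system fails on a flight
p_both = p_single ** 2   # probability that both independent systems fail

print(p_both)            # 1e-06: a thousand-fold improvement over one system
```

The same logic underlies procedural tolerance measures: an independent crosscheck fails to capture an error only when both the original action and the crosscheck fail together.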
ACCIDENT PREVENTION CYCLE

Accident prevention begins with an appreciation of the operational context in which accidents occur.
Given the number and potential relationships of all the factors that may affect safety, an effective
management system is required for accident prevention. An example of the type of systematic process
required is shown in Figure 4-6, Accident Prevention Cycle. A brief description of the cycle follows.
[Figure 4-6. Accident Prevention Cycle – a continuous loop comprising the stages Identify Hazard,
Assess Risks, Control Options, Risk Communication, Take Action and Monitor Progress.]
Hazard identification is the critical first step in managing safety. Hard evidence of hazards is required and
may be obtained in a number of ways, from a variety of sources, for example:
a) Hazard and incident reporting systems;
b) Investigation and follow-up of reported hazards and incidents;
c) Trend analysis;
d) Feedback from training;
e) Flight data analysis;
f) Safety surveys and operational oversight safety audits;
g) Monitoring normal line operations;
h) State investigation of accidents and serious incidents; and
i)
Information exchange systems.
Each hazard identified must be evaluated and prioritized. This evaluation requires the compilation and
analysis of all available data. The data is then assessed to determine the extent of the hazard: is it a
“one-of-a-kind”, or is it systemic? A database may be required to facilitate the storage and retrieval of the
data. Appropriate tools are then needed to analyse the data.
Having validated a safety deficiency, decisions must then be made as to the most appropriate action to be
taken to reduce or eliminate the hazard. The solution must take into account the local conditions, as “one
size” does not fit all situations. Care must be taken that the solution does not introduce new hazards. This
is the process of risk management.
Once appropriate safety action has been implemented, performance must be monitored to ensure that the
desired outcome has been achieved, for example:
a) The hazard has been eliminated (or at least been reduced in severity);
b) The action taken permits coping satisfactorily with the hazard; and
c) No new hazards have been introduced into the system.
If the outcomes are unsatisfactory, the whole process must be repeated.
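The cycle described above can be sketched as a simple control loop. This is an illustrative skeleton only: every function name and the severity-times-likelihood scoring are hypothetical placeholders for the organizational activities in the text, not part of any published method:

```python
# Hypothetical skeleton of the Accident Prevention Cycle (Figure 4-6).
# The logic is deliberately trivial so the loop is runnable end to end.

def identify_hazards(reports):
    # Hazard identification: reporting systems, audits, flight data analysis...
    return [r for r in reports if r.get("hazard")]

def assess_risk(hazard):
    # Evaluate and prioritize each hazard (toy severity x likelihood score).
    return hazard.get("severity", 0) * hazard.get("likelihood", 0)

def select_action(risk_score):
    # Risk management: choose a control option suited to the assessed risk.
    if risk_score >= 6:
        return "eliminate"
    return "mitigate" if risk_score >= 2 else "accept"

def monitor(hazard, action):
    # Monitoring: was the hazard addressed, or is the residual risk tolerable?
    return action != "accept" or assess_risk(hazard) < 2

reports = [
    {"hazard": True, "severity": 3, "likelihood": 3},   # serious, systemic
    {"hazard": False},                                  # no hazard reported
    {"hazard": True, "severity": 1, "likelihood": 1},   # minor, acceptable
]

for h in identify_hazards(reports):
    score = assess_risk(h)
    action = select_action(score)
    # If monitoring shows an unsatisfactory outcome, the whole cycle repeats.
    print(score, action, monitor(h, action))
```

The skeleton mirrors the text: identification feeds assessment, assessment drives the choice of control option, and monitoring closes the loop by triggering a repeat when the outcome is unsatisfactory.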
Accident prevention is a dynamic process that requires change. People are inherently resistant to change.
Management may be especially resistant to change if it costs money. Supposition and conjecture are
unlikely to convince decision-makers to spend money on measures with ill-defined prospects for
significant benefit. Thus, providing a compelling argument for change is a fundamental challenge for
those managing safety.
COST CONSIDERATIONS
Operating a profitable, yet safe airline or service provider requires a constant balancing act between the
need to fulfill production goals (such as on-time departures) vs. safety goals (which may require taking
extra time to ensure that a door is properly secured). The aviation workplace is filled with potentially
unsafe conditions which will not all be eliminated; yet, operations must continue.
Some airlines adopt a goal of “zero accidents” and state that “safety is their number one priority”. The
reality is that airlines (and other commercial aviation organizations) are in the business of making money.
Profit or loss is the immediate indicator of the company’s success in meeting its production goals.
However, safety is a prerequisite for a sustainable aviation business, as a company tempted to cut corners
will eventually realize. For most companies, safety can best be measured by the absence of accidental
losses. Companies may realize they have a safety problem following a major accident or loss, in part
because it will impact on the profit/loss statement. However, a company may operate for years with many
potentially unsafe conditions without adverse consequence. In the absence of effective safety
management to identify and correct these unsafe conditions, the company may assume that it is meeting
its safety objectives as evidenced by the “absence of losses”. In reality, it has been lucky.
[Figure 4-7. Safety vs. costs – a graph of costs against risk reduction, showing the cost of losses falling
and the cost of protection rising as risk reduction increases, with the total cost curve reaching a minimum
at an intermediate point.]
Safety and profit are not mutually exclusive. Indeed, quality organizations realize that expenditures on the
correction of unsafe conditions are an investment toward long-term profitability. Losses cost money. As
money is spent on risk reduction measures, costly losses are reduced – as shown in Figure 4-7. However,
by spending more and more money on risk reduction, the gains made through reduced losses may not be
in proportion to the expenditure. Companies must balance the costs of losses and expenditures on risk
reduction measures. In other words, some level of loss is acceptable from a straight profit and loss point
of view. However, few organizations can survive the economic consequences of a major accident. Hence,
there is a strong economic case for an effective safety management system. Such a system may require
energy and persistence, but not always a large budget.
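The trade-off sketched in Figure 4-7 can be illustrated with a toy numerical model. The figures and the loss-decay assumption below are invented purely for illustration:

```python
def expected_losses(spend, base_losses=10_000_000, halving=500_000):
    """Assumed model: accidental losses halve for every `halving` spent on risk reduction."""
    return base_losses * 0.5 ** (spend / halving)

def total_cost(spend):
    # Total cost = money spent on protection + expected accidental losses (Figure 4-7).
    return spend + expected_losses(spend)

# Search a grid of spending levels for the minimum of the total cost curve.
levels = range(0, 5_000_001, 100_000)
best = min(levels, key=total_cost)
print(f"cost-minimizing spend: {best:,}")
print(f"total cost there: {total_cost(best):,.0f} vs {total_cost(0):,.0f} with no protection")
```

Under this assumed model, spending beyond the cost-minimizing point still reduces losses, but each further unit of spend buys less loss reduction than it costs, which is the diminishing-returns point made above. A real analysis would also weigh the survivability argument: the catastrophic tail of a major accident justifies protection well beyond the simple cost minimum.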
Cost of accidents
There are three types of costs associated with an accident or serious incident: Direct, Indirect and
Industry/social costs.
Direct costs:
These are the obvious costs, which are fairly easily determined. They mostly relate to physical
damage and include the costs of compensating for injuries and of rectifying or replacing aircraft,
equipment and property. The high direct costs of an accident can be reduced by insurance coverage. (Some large
organizations effectively self-insure by putting funds aside to cover their risks.)
Indirect costs:
While insurance may cover specified accident costs, there are many uninsured costs. An
understanding of these uninsured costs (or indirect costs) is fundamental to understanding the
economics of safety and hence, the viability of measures for accident prevention.
Indirect costs include all those things that are not directly covered by insurance and usually total
much more than the direct costs resulting from an accident. Such costs are sometimes not obvious and
are often delayed. Some examples of uninsured costs that may accrue from an accident include:13
a) Loss of business and damage to the reputation of the organization. Many organizations will
not allow their personnel to fly with an operator with a questionable safety record.
b) Loss of use of equipment, which equates to lost revenue. Replacement equipment may have to be
purchased or leased. Companies operating a one-of-a-kind aircraft may find that their spares
inventory and the people specially trained for such an aircraft become surplus.
c) Loss of staff productivity. If people are injured in an accident, and are unable to work, many
States require that they continue to be paid. Also, they will need to be replaced at least for the
short term, incurring the costs of wages, overtime (and possibly training), as well as imposing
an increased workload on the experienced workers.
d) Investigation and clean-up are usually uninsured costs. Operators may incur costs from the
investigation including the costs of their staff involvement in the investigation, the costs of
tests and analysis, wreckage recovery, restoring the accident site, etc.
e) Insurance deductibles. The policyholder’s obligation to cover the first portion of the cost of
any accident must be met. A claim will also put a company into a higher risk category for
insurance purposes, and therefore may result in increased premiums. (Conversely, the
implementation of a comprehensive safety management system could help a company to
negotiate a lower premium.)
13. Wood, Richard H., Aviation Safety Programs: A Management Handbook, 3rd edition, Englewood, CO: Jeppesen, 2003.
f) Legal action and damage claims. Legal costs can accrue rapidly. While it is possible to take
out insurance for public liability and damages, it is virtually impossible to cover the cost of
time lost handling legal action and damage claims.
g) Fines and citations imposed by governmental authorities, possibly including the shutting
down of unsafe operations.
Industry and social costs:
In addition to the dollar costs identified above, an aviation disaster may damage the reputation and
market of the aviation industry as a whole, well beyond the airline involved in the accident. (Events
after 11 September 2001 are instructive.) Travellers switching to alternative means of transport
(such as road travel) may be exposed to additional risks – a social cost.
Costs of incidents
Serious aviation incidents, which result in minor damage or injuries, can also incur many of these indirect
or uninsured costs. Typical cost factors arising from such incidents can include:
a) Flight delays and cancellations;
b) Alternate passenger transportation, accommodation, complaints, etc.;
c) Crew change and positioning;
d) Loss of revenue and reputation, etc.;
e) Aircraft recovery, repair and test flight; and
f) Incident investigation.
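As a rough sketch of how such indirect costs add up, the following tally uses the cost factors listed above with purely invented figures:

```python
# Hypothetical tally of the uninsured (indirect) costs of a serious incident.
# All figures are invented for illustration only.
incident_costs = {
    "flight delays and cancellations": 120_000,
    "alternate passenger transport and accommodation": 45_000,
    "crew change and positioning": 15_000,
    "loss of revenue and reputation": 200_000,
    "aircraft recovery, repair and test flight": 350_000,
    "incident investigation": 60_000,
}
total = sum(incident_costs.values())
for item, cost in sorted(incident_costs.items(), key=lambda kv: -kv[1]):
    print(f"{item:<48} {cost:>10,}")
print(f"{'total uninsured cost':<48} {total:>10,}")
```

Even with modest assumed values for each factor, the aggregate quickly reaches a scale that a profit-and-loss statement cannot ignore, which is why these indirect costs usually exceed the insured direct costs.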
Costs of accident prevention
The costs of accident prevention are probably even more difficult to quantify than the full costs of
accidents — in part, because of the difficulty in assessing the value of accidents that have been prevented.
Nevertheless, some airlines have attempted to quantify the costs and benefits of introducing safety
management systems (SMS). They have found the cost savings to be substantial. Performing a cost
benefit analysis is complicated due to the small number of accidents. However, it is an exercise that
should be undertaken, as senior management are not inclined to spend money if there is no quantifiable
benefit. One way of addressing this issue is to separate the costs of the safety management system from
the cost of correcting safety deficiencies, by charging the safety management costs to the safety
department and the safety deficiency costs to the line management most responsible. This exercise
involves senior management in considering costs and benefits.
If you think safety is expensive,
try an accident.
————————
Chapter 5
BASICS OF SAFETY MANAGEMENT
Outline
THE PHILOSOPHY OF SAFETY MANAGEMENT
 General
 Core business function
 Systems approach
 System safety
FACTORS AFFECTING SYSTEM SAFETY
 System failures
 Active and latent failures
 Equipment faults
 Human Errors
 System design (Defences-in-depth)
SAFETY MANAGEMENT CONCEPTS
 Hazard identification
 Cornerstones of safety management
 Strategies for safety management
 Key safety management activities
 Safety management process
 Safety oversight
 Safety Culture
 Safety performance indicators and targets
 Safety Management Systems (SMS)
 Threat Error Management (TEM)
Appendices:
1. Three Cornerstones of Safety Management
2. Précis – Threat and Error Management
Chapter 5
BASICS OF SAFETY MANAGEMENT
THE PHILOSOPHY OF SAFETY MANAGEMENT
General
In view of the total costs of a major accident, management needs to conserve the organization’s assets and
minimize its losses. This requires management of all the factors which can impact on safety – so as to
reduce the risks of accidental loss to an acceptable level. Safety management then is the systematic
management of the risks associated with flight operations, related ground operations and aircraft
engineering or maintenance activities to achieve high levels of safety performance.
For the purposes of this manual, safety was defined in Chapter 1 as “the state in which the risk of harm to
persons or property damage is reduced to, and maintained at or below, an acceptable level through a
continuing process of hazard identification and risk management.”
Core business function
In successful aviation organizations, safety management is a core business function — just as financial
management is. The application of contemporary risk management methods facilitates the identification
of an organization’s weaknesses and guides management towards the cost-effective resolution of unacceptable risks.
Effective information management is required to support safety analyses and for the sharing of safety
lessons and best practices across the industry. Finally, some system of performance measurement is
required to confirm the effectiveness of the organization’s safety management system and to test the
validity of steps taken to reduce risks.
Effective safety management requires a realistic balance between safety and production goals. Thus, a
coordinated approach, in which all the organization's goals and resources are analysed, helps ensure that
decisions concerning safety are realistic and complementary to the operational needs of the organization.
The finite limits of financing and operational performance must be accepted in any industry. Defining
acceptable and unacceptable risks is therefore important for cost-effective safety management. If properly
implemented, safety management measures not only increase safety, but also improve the operational
effectiveness of an organization.
Experience in other industries and lessons learned from the investigation of aircraft accidents have
emphasized the importance of managing safety in a systematic, proactive, and explicit manner.
Systematic means that safety management activities will be in accordance with a pre-determined
plan, and will be applied in a consistent manner throughout the organization.
Proactive means the adoption of an approach which emphasizes prevention, through the
identification of hazards and the introduction of risk mitigation measures before the risk-bearing
event occurs and adversely affects safety performance.
Explicit means that all safety management activities should be documented, visible and
performed independently from other management activities.
Addressing safety in a systematic, proactive and explicit manner ensures, on a long-term basis, that safety
becomes an integral part of the day-to-day business of the organization, and that the safety-related
activities of the organization are directed to the areas where the benefits will be greatest.
Systems approach
To ensure that all possible sources of hazards which could affect safety are identified, safety management
requires a systems approach. The “system” includes all the people, procedures and technology needed to
operate or support the aviation system.
Modern approaches to safety management have been shaped by the concepts introduced in Chapter 4
(Understanding Safety), and in particular, the role of organizational issues as contributory factors in
accidents and incidents. Safety cannot be achieved simply by introducing rules or directives concerning
the procedures to be followed by operational staff.
The scope of safety management encompasses almost the whole range of activities of the organization
concerned. It is for this reason that safety management must start from the senior level of management,
and that potential effects on safety must be examined at all levels of the organization.
System safety
System safety may be described as the systematic application of specific technical and managerial skills
to the identification and control of hazards under pre-determined conditions with an acceptable minimum
of accidental loss. The primary objective of system safety is accident prevention and it is achieved by
focusing on the identification and control of hazards associated with a system or product. By proactively
identifying, assessing, and eliminating or controlling safety-related hazards, acceptable levels of safety
are achievable.
System safety was developed as an engineering discipline for aerospace and missile defence systems in
the 1950s. Its practitioners were safety engineers, not operational specialists. As a result, their focus
tended to be on designing and building fail-safe systems. On the other hand, civil aviation tended to focus
on flight operations, and safety managers often came from the ranks of pilots. In pursuing improved
safety, it became necessary to view aviation safety as more than just the aeroplane and its pilots. Aviation
is a total system that includes everything needed to keep an aeroplane in the air safely. The ‘system’
includes the airport, air traffic control, maintenance, flight attendants, ground operational support,
dispatch and others. Thus, the total aviation system is more than just the aeroplane and its pilots, and
sound safety management is required for all parts of it.
FACTORS AFFECTING SYSTEM SAFETY
The factors affecting safety within the defined system can be seen from two points of view; first, a
discussion of those factors which may result in situations in which safety is compromised, and second, an
examination of how an understanding of these factors can be applied to the design of systems to reduce
the likelihood of occurrences which may compromise safety.
The search for factors that could compromise safety must include all levels of the organization
responsible for both operations and the provision of supporting services. As outlined in Chapter 4
(Understanding Safety), safety starts at the highest level of the organization. Any situation that results in
safety standards being compromised can be considered a failure of the system to meet the primary goals
of the organization.
System failures
A system failure is any occurrence that results in the inability to provide the level of service that the
system is intended to provide. Failure of one element of the system does not necessarily result in a system
failure. If an alternate that provides the required level of service is available, and there is no interruption
of service in the changeover, the system as a whole has not failed.
As seen in Chapter 4, a failure rarely has a single cause. There will always be at least one (and often more
than one) initiating event. Initiating events occur close in time to the failure, and are obviously linked to
it. There are usually also a number of contributory events. Sometimes the link between contributory
events and the failure may be obvious (at least after the event), however there is another group of
contributory events where the link is not always obvious. These may include decisions regarding system
design or organizational policy, taken months, or even years, before the failure.
The causal factors for a system failure may be related to equipment, or the result of human error, or a
combination of both. In this manual, the term failure will be reserved for system failures as defined
above. The term fault will be used to describe malfunctions in equipment or facilities. The term error will
be used only in the context of human error.
A system failure is any occurrence which results in the inability to
provide the level of service that the system is intended to provide.
A fault is any occurrence that results in an equipment malfunction.
An error is a failure in human performance.
Active and latent failures
When a failure actually occurs in a system, its results can be observed. Such a failure is classified as an
active failure. The term latent failure is used to describe the existence in a system of hazards or unsafe
conditions which, given a particular sequence of triggering events, could lead to a failure of the system.
The direct causes of active failures are generally the result of equipment faults or errors committed by
operational personnel. Latent failures, however, always have a human element. They may be the result of
undetected design flaws. They may be related to unrecognized consequences of officially approved
procedures. There have also been a number of cases where latent failures have been the direct result of
decisions taken by the management of the organization. For example, latent failures exist when the
culture of the organization encourages taking short cuts rather than always following approved
procedures. The direct cause of a failure associated with taking short cuts would be the failure of a person
at the operational level to follow correct procedures. However, if there is general acceptance of this sort
of behaviour amongst operational personnel, and management are either unaware of this or take no action,
there is a latent failure in the system at the management level.
Equipment faults
The determination of the likelihood of system failures due to equipment faults is the domain of reliability
engineering. The probability of system failure is determined by analyzing the failure rates of individual
components of the equipment. The causes of the component failures may include electrical, mechanical
and software faults.
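The core arithmetic of such a reliability analysis is simple, even though real analyses rest on far more detailed failure-rate data. A minimal sketch, assuming independent component failures:

```python
def series_reliability(reliabilities):
    """System works only if every component works (no redundancy): reliabilities multiply."""
    r = 1.0
    for ri in reliabilities:
        r *= ri
    return r

def parallel_reliability(reliabilities):
    """Redundant arrangement fails only if every component fails."""
    p_all_fail = 1.0
    for ri in reliabilities:
        p_all_fail *= 1.0 - ri
    return 1.0 - p_all_fail

# Three components in series, each 99% reliable over the analysis period:
print(series_reliability([0.99, 0.99, 0.99]))   # ≈ 0.9703
# Duplicating a single 99%-reliable unit in parallel:
print(parallel_reliability([0.99, 0.99]))       # ≈ 0.9999
```

The two functions capture the basic design lever: chaining components erodes reliability, while redundancy recovers it, provided the redundant elements really are independent.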
A safety analysis is required to consider both the likelihood of failures during normal operations and the
effects of continued unavailability of any one element on other aspects of the system. The analysis should
include the implications of any loss of functionality or redundancy as the result of equipment being taken
out of service for maintenance. It is therefore important that the scope of the analysis and the definition of
the boundaries of the system for purposes of the analysis be sufficiently broad that all necessary
supporting services and activities are included. As a minimum, a safety analysis should consider the
elements of the SHEL model outlined in Chapter 4; specifically it should include the following sources of
faults:
 Hardware faults,
 Software malfunctions,
 Environmental conditions,
 Dependencies on external services, or
 Operational and maintenance procedures.
It is possible that a single fault may cause the loss of more than one service or system function. An
example would be a situation where both primary and secondary VHF communications equipment shared
common primary and stand-by power supplies. In the event of a failure of both primary and stand-by
power, both primary and secondary VHF would not be available. Such multiple failures resulting from a
single fault are called common mode failures. The linkages that can cause a common mode failure are
not always obvious. The possibility of common mode failures needs careful investigation in the early
stages of system design.
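The VHF example can be put in numbers. The probabilities below are invented, and the point is only qualitative: even a modest common-mode dependency can dominate the failure probability of an otherwise highly redundant arrangement.

```python
P_RADIO_FAIL = 0.01   # probability a single radio fails (invented figure)
P_POWER_FAIL = 0.005  # probability the shared power chain fails (invented figure)

# Naive view: comms are lost only if both independent radios fail.
p_both_radios = P_RADIO_FAIL ** 2

# With the shared dependency: losing the common power supply alone silences
# both radios, so it must be combined as an independent way of losing comms.
p_loss_of_comms = 1 - (1 - p_both_radios) * (1 - P_POWER_FAIL)

print(f"both radios fail independently:        {p_both_radios:.6f}")
print(f"loss of comms including shared power:  {p_loss_of_comms:.6f}")
```

With these assumed figures, the shared power chain accounts for roughly fifty times more of the total loss-of-communications probability than the redundant radios themselves, which is why common-mode linkages deserve scrutiny early in design.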
The techniques for estimating the probability of overall system failure as a result of equipment faults and
estimating parameters such as availability and continuity of service are well established and are described
in standard texts on reliability and safety engineering. These issues will not be addressed further in this
manual.
Human errors
An error occurs when the outcome of a task being performed by a human is not the intended outcome.
The way in which a human operator approaches a task depends on the nature of the task, and how familiar
the operator is with it. Human performance may be skill-based, rule-based or knowledge-based. Errors
may be the consequence of lapses in memory, slips in doing what was intended, or the result of mistakes
which are conscious errors in judgement. A distinction should also be made between honest or normal
errors committed in the fulfilment of assigned duties and deliberate violations of prescribed procedures or
accepted safe practices. In order to understand human error and how it occurs, it is therefore necessary to
understand the mechanisms involved in human performance. These are discussed in Chapter 4
(Understanding Safety).
System design
Chapter 4 also described the context for accidents. Given the complex interplay of so many human,
materiel, and environmental factors in normal operations, the complete elimination of risk is an
unachievable goal. Even in organizations with the best of training programmes and a positive safety
culture, human operators will occasionally make errors. The best-designed and maintained equipment will
occasionally fail. System designers must therefore take account of the inevitability of errors and failures.
It is important that the system be designed and implemented in such a way that, to the maximum extent
possible, errors and equipment failures will not result in an accident; that is, the system is ‘error tolerant’.
The hardware and software components of a system are generally designed to meet specified levels of
availability, continuity and integrity. The techniques for estimating system performance in terms of these
parameters are well established. When necessary, redundancy can be built into the system, to provide
alternatives in the event of failure of one or more elements of the system.
The performance of the human element cannot be specified as precisely as this; however, it is essential
that the possibility of human error be considered as part of the overall design of the system. This requires
an analysis to identify potential weaknesses in the procedural aspects of the system, taking into account
the normal shortcomings in human performance. The analysis should also take into account the fact that
accidents rarely have a single cause. As noted earlier, they usually occur as part of a sequence of events in
a complex situational context; so the analysis needs to consider combinations of events and
circumstances, to identify sequences that could possibly result in safety being compromised.
Developing a safe and error tolerant system requires that the system contain multiple defences, barriers
and safeguards to ensure that, as far as possible, no single failure or error will result in an accident, and
that when a failure or error occurs, it will be recognized and remedial action taken before a sequence of
events leading to an accident can develop. The need for a series of defences rather than just a single
defensive layer arises from the possibility that the defences themselves may not always work perfectly.
This design philosophy is called defences-in-depth.
Failures in the defensive layers of an operational system can be thought of as creating gaps in the
defences. As the operational situation or equipment serviceability states change, gaps may occur as a
result of:
 Undiscovered and longstanding shortcomings in the defences,
 The temporary unavailability of some elements of the system as the result of maintenance action,
 Active failures of equipment, or
 Human errors or violations.
For an accident to occur in a well designed system, these gaps must develop in all the defensive layers of
the system at the critical time when that defence should have been capable of detecting the earlier error or
failure. How an accident event must penetrate all defensive layers is shown diagrammatically in
Figure 5-1.
Figure 5-1. Defences-in-depth
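The defences-in-depth idea can be sketched numerically. The layer effectiveness values below are invented, and real defensive layers are rarely independent (a gap in one layer often correlates with gaps in another), so a calculation like this understates the true risk:

```python
# Each layer's probability of stopping a given error (invented values):
layers = {
    "equipment design": 0.90,
    "SOPs and checklists": 0.95,
    "automated alerts (e.g. STCA, ACAS)": 0.99,
}

# An accident requires the error to slip through every layer at once.
p_penetrate_all = 1.0
for name, p_catch in layers.items():
    p_penetrate_all *= 1.0 - p_catch

print(f"probability an error penetrates all layers: {p_penetrate_all:.6f}")
```

No single layer here is especially strong, yet the stack in combination is far stronger than any one of them, which is the argument for multiple imperfect defences over a single supposedly perfect one.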
Table 5-1 below shows some examples of defences, barriers and safeguards at the technological,
operational and organizational levels.
Table 5–1. Examples of Defences, Barriers and Safeguards
1. Equipment
 Redundancy:
o Full redundancy providing the same level of functionality when operating on the alternate
system.
o Partial redundancy resulting in some reduction in functionality, e.g. a local copy of essential
data from a centralized network database.
 Independent checking of design calculations and assumptions.
 System designed to ensure a gradual degradation of capability (not total loss of capability) in the
event of failure of individual elements.
 Policy and procedures regarding maintenance which minimize loss of functionality in the active
system, or loss of redundancy.
 Automated work aids, e.g. STCA, MSAW, Runway Entry Monitors, ACAS. These should always
be subsidiary to safe and effective operating procedures, which form the primary defence against
the events which these automated aids are designed to detect.
2. Operating Procedures
 Standard Operating Procedures (SOPs).
 Read-back of critical items in clearances and instructions.
 Use of checklists.
3. Organizational Factors
 Clear safety policy, implemented with adequate funding for safety management activities.
 Oversight to ensure correct procedures are followed, with no tolerance for violations or short-cuts.
 Adequate control over the activities of contractors.
The discussion of accident causation and the operational context in which accidents occur demonstrates the
need for these defences-in-depth to include equipment design and operating procedures which take
account of the natural limits of humans and material, including human and cultural factors as well as
environmental factors.
These principles may be applied to the task of reducing risk in both proactive and reactive ways. By
careful analysis of a system, it is possible to identify sequences of events where a combination of
equipment faults and human errors could lead to an accident or an incident. Identification of the active
and latent failures revealed by this type of analysis will enable remedial action to be taken to strengthen the
system’s defences.
When planning significant change to a system, a formal assessment of the potential safety implications is
warranted. Chapter 13 outlines the process for conducting safety assessments.
SAFETY MANAGEMENT CONCEPTS
Hazard identification
Hazard identification is the process of identifying those situations or conditions that have the potential to
cause harm to persons or damage to property. The scope for hazards is wide, including aspects of design
and manufacture, operations and maintenance. Hazards may involve system design faults, active or latent
failures, isolated or systemic equipment faults or human errors, etc.
Isolated hazards and risks may not be a significant problem. However, when hazards or risks exist
concurrently at many levels, there is an increased probability of an accident or incident. Therefore, a
structured approach to the identification of hazards is required to ensure that, as far as possible, all
potential hazards are identified. Indeed, hazard identification is the sine qua non of safety management. It
requires objective, accurate information which is free from emotion.
Hazards may be identified through several diverse safety management activities; some are reactive to
known events, others are proactive. For example:
a) Incident reporting systems;
b) Occurrence investigations;
c) Trend monitoring and safety analyses;
d) Proactive safety processes such as Flight Data Analysis (FDA) and Line Operations Safety Audits
(LOSA); and
e) Safety audits and formal safety assessments.
The defences-in-depth described above protect against hazards. Part 4 — Applied Safety Management
provides additional guidance on the creation and operation of several processes for effective hazard
identification.
The key features of an effective hazard identification system are:
a) Proactive and reactive identification of unsafe conditions;
b) Collection of current and applicable hazard information;
c) A procedure for receiving and actioning reports of hazards;
d) Measures to protect the identity of persons identifying the hazards;
e) A reliable method of accurately recording, storing, and retrieving hazard data;
f) The capability to analyze hazard reports, both individually as well as in aggregate;
g) A procedure for distributing lessons learned to affected staff and contractors; and
h) The capability of being audited.
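A system with several of these features can be sketched in a few lines. All names here are hypothetical, and a production system would need persistent, auditable storage rather than an in-memory list:

```python
from dataclasses import dataclass

@dataclass
class HazardReport:
    category: str      # e.g. "runway incursion", "maintenance"
    description: str
    reporter: str = ""

reports: list[HazardReport] = []

def submit(report: HazardReport) -> None:
    report.reporter = ""   # de-identify on receipt to protect the reporter's identity
    reports.append(report)

def count_by_category() -> dict[str, int]:
    # Aggregate analysis: how many reports per hazard category.
    counts: dict[str, int] = {}
    for r in reports:
        counts[r.category] = counts.get(r.category, 0) + 1
    return counts

submit(HazardReport("runway incursion", "Vehicle crossed without clearance", "J. Doe"))
submit(HazardReport("runway incursion", "Read-back error at the holding point", "A. N. Other"))
print(count_by_category())
```

Even this toy version shows the point of aggregate analysis: individual reports may look unremarkable, while the category counts reveal a recurring hazard worth investigating.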
Cornerstones of safety management
In its simplest terms, safety management involves hazard identification and the closing of any
potential gaps in the defences of the system (see Figure 5-1 above). Effective safety management is
multidisciplinary, requiring the systematic application of a variety of techniques and activities across the
spectrum of aviation activities, in a continuous process. It builds upon three defining cornerstones:
A comprehensive corporate approach sets the tone for the management of safety. The corporate
approach builds upon the safety culture of the organization and embraces the organization’s safety
policies, objectives and goals, and perhaps most importantly, senior management’s commitment to
safety;
Effective organizational tools are needed to deliver the necessary activities and processes to advance
safety. This cornerstone includes how the organization arranges its affairs to fulfil its safety policies,
objectives and goals, how it establishes standards and allocates resources, etc. The principal focus is on hazards and
their potential effects on safety-critical activities; and
A system for safety oversight to confirm the organization’s continuing fulfilment of its corporate
safety policy, objectives, goals and standards. Feedback mechanisms facilitate continuous
improvement. These include such activities as monitoring and analysis of operational data, safety
audits and performance analysis, the identification and adoption of best industry practices, etc.
A more detailed examination of each of these cornerstones is provided at Appendix 1 to this Chapter
(Three Cornerstones of Safety Management).
Strategies for safety management
The strategy that an organization adopts for its safety management system will reflect its corporate safety
culture and may range from purely reactive, responding only to accidents, through to strategies that are
highly proactive in their search for safety problems. The traditional or reactive process is dominated by
retrospective repairs (i.e. fixing the stable door after the horse has bolted). Under the more contemporary
or proactive approach, prospective reform plays the leading part (i.e. building a stable from which no horse
could bolt, or would even want to). Depending on the strategy adopted, different methods and tools need to be
employed.
a) Reactive strategy: Investigate accidents and reportable incidents. This strategy is useful for
situations involving failures in technology or unusual events. The utility of the reactive approach
for accident prevention purposes depends on the extent to which the investigation goes beyond
determining the causes, to include an examination of all the contributory factors. The reactive
approach tends to be marked by:
1) Management’s safety focus being on compliance with minimum requirements;
2) Safety measurement being based on reportable accidents and incidents with such
limitations in value as:
— Any analysis is limited to examining actual failures;
— Insufficient data is available to accurately determine trends, especially those
attributable to human errors; and
— Little insight is available into the ‘root causes’ and latent unsafe conditions,
which facilitate human errors; and
3) Constant catching up is required to match human inventiveness for new types of errors.
b) Proactive safety strategy: Prevent accidents by aggressively seeking information from a variety of
sources which may be indicative of emerging safety problems. Organizations pursuing a proactive
strategy for accident prevention believe that the risk of accidents can be minimized by identifying
vulnerabilities before they lead to failures, and by taking the necessary actions to reduce those risks. To do
this, they actively seek systemic unsafe conditions through such tools as:
1) Hazard and incident reporting systems which promote the identification of latent unsafe
conditions;
2) Safety surveys to elicit feedback from front-line personnel about areas of dissatisfaction
and unsatisfactory conditions which may have accident potential;
3) Flight data recorder analysis for identifying operational exceedances and confirming
normal operating procedures;
4) Operational inspections or audits of all aspects of operations, to identify vulnerable areas
before accidents, incidents, or minor safety events confirm a problem exists; and
5) A policy for consideration and embodiment of manufacturers’ service bulletins; etc.
Key safety management activities
Those organizations which manage safety most successfully practise several common activities. They
tend to do specific things; for example:
• Organization. They are organized to establish a culture of safety and reduce their accidental losses. Larger organizations may implement a formal safety management system as outlined in Part 3 of this manual.
• Safety assessments. They systematically analyze proposed changes to equipment or procedures to identify and mitigate weaknesses before the change is implemented.
• Occurrence reporting. They have established formal procedures for reporting safety occurrences and other unsafe conditions.
• Hazard identification schemes. They employ both reactive and proactive schemes for identifying safety hazards throughout their organization, such as voluntary incident reporting, safety surveys, operational safety audits and safety assessments. Part 4 outlines several safety processes that are effective for the identification of safety hazards; for example, Flight Data Analysis (FDA) and Line Operations Safety Audits (LOSA).
• Investigation and analysis. They follow up on reported occurrences and unsafe conditions, initiating competent safety investigations and safety analyses where necessary.
• Performance monitoring. They actively seek the feedback necessary to close the loop of the accident prevention cycle (outlined in Chapter 4 – Understanding Safety), using such techniques as trend monitoring and internal safety audits.
• Safety promotion. They actively disseminate the results of safety investigations and analyses, sharing safety lessons learned both internally within the organization and outside it if warranted.
• Safety oversight. They systematically monitor and assess safety performance.
All these activities are described elsewhere in this manual.
Safety management process
Conceptually, the safety management process parallels the cycle of accident prevention described in
Figure 4-6. Both involve a continuous loop process as represented in Figure 5-2 below.
[Figure 5-2 depicts the safety management process as a continuous loop: Collect Data → Analyze Data → Prioritize Unsafe Conditions → Develop Strategies → Approve Strategies → Assign Responsibilities → Implement Strategies → Re-evaluate Situation → Collect Additional Data, with latent unsafe conditions feeding into the data collection steps.]

Figure 5-2. Safety management process
Safety management is evidence-based, in that it requires the analysis of actual data to identify hazards.
Using risk assessment techniques, priorities are set for reducing the potential consequences of the hazards.
Strategies to reduce or eliminate the hazards are then developed and implemented with clearly established
accountabilities. The situation is reassessed on a continuing basis and additional measures are
implemented as required.
Following are brief descriptions of the steps of the safety management process outlined in Figure 5-2.
a) Data collection. The first step in the safety management process is the acquisition of relevant
safety data – the evidence necessary to determine system safety performance or to identify latent
unsafe conditions (safety hazards). The data may derive from any corner of the system: the
equipment used, the people involved in the operation, their work procedures, their
human/equipment/procedures interactions, etc.
b) Analyse the data. By analysing all the pertinent information, safety hazards can be identified. The
conditions under which the hazards pose real risks, their potential consequences and the
likelihood of occurrence can be determined; in other words: What can happen? How? When?
This analysis can be both qualitative and quantitative. The inability to quantify a hazard, and/or
the lack of historical data on it, does not exclude the hazard from the analysis.
c) Prioritize the unsafe conditions. The seriousness of hazards is determined by considering them
relative to each other, and against an agreed set of acceptability criteria. Those posing the greatest
risks are considered for safety action. This may require a cost-benefit analysis.
d) Develop strategies. Beginning with the highest priority risks, several options for managing the
risks may be considered, for example:
1) Spread the risk across as large a base of risk-takers as practicable. (This is the basis
of insurance.)
2) Transfer the risk (or liability) to some other organization or agency (such as through
lease vs. buy decisions).
3) Eliminate the risk entirely (possibly by ceasing that operation or practice).
4) Accept the risk, and continue operations unchanged.
5) Mitigate the risk by implementing measures to reduce the risk or at least facilitate
coping with the risk.
In selecting a risk management strategy, care is required to avoid introducing new (and
perhaps worse) risks. (See Chapter 6 for further guidance on risk management.)
e) Approve strategies. Having analyzed the risks and decided on an appropriate course of action,
management’s approval is required to proceed. The challenge here is the formulation of a
convincing argument for (perhaps expensive) change.
f) Assign responsibilities and implement strategies. Following the decision to proceed, the
“nuts and bolts” of implementation must be worked out. This includes a determination of
resource allocation, assignment of responsibilities, scheduling, revisions to operating
procedures, etc.
g) Re-evaluate situation. Implementation is seldom as successful as initially envisaged.
Feedback is required to close the loop. What new problems may have been introduced? How
well is the agreed strategy for risk reduction meeting performance expectations? What
modifications to the system or process may be required?
h) Collect additional data. Depending on the re-evaluation step, new information may be
required and the full cycle reiterated to refine the safety action.
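The prioritization step c) above is often implemented as a simple risk ranking of likelihood against severity. The following sketch illustrates the idea under assumed 1 to 5 scales; the hazards and scores shown are invented for illustration only:

```python
# Illustrative sketch of step c), prioritizing unsafe conditions.
# Hazards, likelihood (1-5) and severity (1-5) scores are hypothetical.

hazards = [
    {"hazard": "faded runway markings", "likelihood": 4, "severity": 3},
    {"hazard": "confusing SOP wording", "likelihood": 3, "severity": 2},
    {"hazard": "unreliable radio altimeter", "likelihood": 2, "severity": 5},
]

# Simple risk index: likelihood x severity (a common risk-matrix scheme).
for h in hazards:
    h["risk"] = h["likelihood"] * h["severity"]

# The highest-risk hazards are considered first for safety action.
ranked = sorted(hazards, key=lambda h: h["risk"], reverse=True)
for h in ranked:
    print(f'{h["risk"]:2d}  {h["hazard"]}')
```

Real risk assessment schemes (see Chapter 6) use agreed acceptability criteria rather than a bare product, but the ranking principle is the same.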
Safety management requires analytical skills that may not be routinely practiced by management. The
more complex the analysis, the greater the need for appropriate analytical tools. The closed-loop
process of safety management also requires feedback to ensure that management can test the validity of
its decisions and assess the effectiveness of their implementation. (Chapter 9 provides guidance on
safety analysis.)
Safety oversight
Safety oversight or safety performance monitoring activities are an essential component of the
organization’s overall safety management strategy. Safety oversight provides the means by which an
organization can verify how well it is fulfilling its safety objectives. An effective safety oversight
programme increases the probability of detecting any weaknesses in the system defences before an active
failure leads to an accident or serious incident. To do this, data must be collected and analysed.
Some of the requirements for a safety performance monitoring system will already be in place in many
States. For example, States would normally have published regulations relating to mandatory reporting of
accidents and incidents, in accordance with Annex 13 — Aircraft Accident and Incident Investigation,
and would have procedures in place for processing and investigating these reports.
Identifying weaknesses in the system defences requires more than just collecting retrospective data and
producing summary statistics. The underlying causes of reported occurrences are not necessarily
immediately apparent, therefore investigation of safety occurrence reports, and any other information
concerning possible hazards, should go hand in hand with safety performance monitoring.
The implementation of an effective safety oversight programme requires that organizations:
a) Determine relevant safety performance indicators (see below);
b) Establish a safety occurrence reporting system;
c) Establish a system for the investigation of safety occurrences;
d) Develop procedures for the integration of safety data from all available sources; and
e) Develop procedures for the analysis of the data and the production of periodic safety
performance reports.
Chapter 10 – Safety Performance Monitoring provides guidance on the safety oversight function.
Safety culture
Some organizations believe that they have adequately addressed “safety management” by appointing a
Director of Safety or a Flight Safety Manager. Often, this person is expected to “manage safety” and
“prevent accidents” without a clear set of objectives and priorities, with limited guidance about how to
do the work and a lack of resources to adequately undertake the task. Effective safety management is not
a single function carried out by a designated organizational element. It needs to be a “way of thinking”,
shared by all elements of the organization.
Effective safety management requires more than setting up an organizational structure and promulgating
rules or directives specifying the procedures to be followed. It requires a genuine commitment to safety
on the part of senior management, and an organizational culture such that staff at all levels are safety
conscious in their approach to their tasks.
There is a close correlation between the philosophy of safety management and the concept of a safety
culture. The philosophy defines a way of thinking about safety. The safety culture is a result of this way
of thinking being translated into actions, so that the organizational culture becomes safety oriented. In
essence, a positive safety culture makes safety “everybody’s business”. The safety culture will mature as
exposure to and experience with safety management increases. (The characteristics of an organization
with a positive safety culture were outlined in Chapter 4.)
Only by having an internal system with adequate safeguards, and a corporate culture which ensures that
these safeguards will not be circumvented, can acceptable levels of safety performance be assured. The
identification of hazards, implementation of safeguards and development of a safety-oriented
organizational culture can only be done from within an organization. An external body, such as the Safety
Regulator, can audit the system to see whether it complies with ICAO and national requirements, but it
cannot actively manage the day-to-day operation of the safety management system.
Safety performance indicators and targets
As described above, the safety management process is closed loop (like the accident prevention cycle
described earlier in the manual). The process requires feedback to provide a baseline for assessing the
system’s performance so that necessary adjustments can be made to effect the desired levels of safety.
This requires a clear understanding of how results are to be evaluated; what will be the quantitative or
qualitative indicators that will be employed to determine that the system is working? Having decided the
factors by which success can be measured, safety management implies the rational setting of specific
safety goals and objectives. For the purposes of this discussion, this Manual uses the following
terminology.
Safety performance indicator. A measure (or metric) used to express the level of safety
performance required or achieved in a system.
Safety performance target. The required level of safety performance for a system. A safety
performance target comprises one or more safety performance indicators, together with desired
outcomes expressed in terms of those indicators.
It is necessary to draw a distinction between the criteria used to assess operational safety performance
through monitoring, and the criteria used for the assessment of planned new systems or procedures. The
process for the latter is known as safety assessments which are discussed in Chapter 13.
Safety performance indicators
In order to set safety performance targets, it is necessary to first decide on appropriate safety
performance indicators. Safety performance indicators are generally expressed in terms of the
frequency of occurrence of some event causing harm. Typical measures which could be used include:
• aircraft accidents per flight hour;
• aircraft accidents per movement;
• aircraft accidents per year;
• serious incidents per flight hour; and
• fatalities due to aircraft accidents per year.
There is no single safety performance indicator which is appropriate in all circumstances. The
indicator chosen to express a safety performance target must be matched to the application in which it
will be used, so that it will be possible to make a meaningful evaluation of safety in the same terms as
those used in defining the safety performance target.
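The arithmetic behind such rate-based indicators can be illustrated as follows; the accident counts and exposure figures below are invented for illustration and do not represent real data:

```python
# Illustrative computation of rate-based safety performance indicators.
# The accident count and exposure figures are invented, not real data.

accidents = 4
flight_hours = 2_000_000
movements = 1_500_000

# Rates are usually normalized to a convenient exposure base,
# e.g. per 100 000 flight hours or per 100 000 movements.
rate_per_100k_hours = accidents / flight_hours * 100_000
rate_per_100k_movements = accidents / movements * 100_000

print(f"{rate_per_100k_hours:.2f} accidents per 100 000 flight hours")
print(f"{rate_per_100k_movements:.2f} accidents per 100 000 movements")
```

The choice of exposure base (hours, movements, years) changes the numerical value, which is why the indicator must match the application in which it will be used.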
The safety performance indicator(s) chosen to express global, regional and national targets will not
generally be appropriate for application to individual organizations. Since accidents are relatively rare
events, they do not provide a good indication of safety performance – especially at the local level.
Even at the global level, accident rates vary considerably from year to year. An increase or decrease
in accidents from one year to the next does not necessarily indicate a change in the underlying level
of safety.
The frequency of occurrence of certain types of incidents may provide a better indicator of the “safety
health” of the system. While these will not have resulted in an accident, they may reveal the existence
of weaknesses or flaws in the system which, given a slightly different set of circumstances, may have
contributed to an accident.
Another safety indicator might reflect the identification and reporting of hazards which may not yet
have resulted in either an accident or an incident. Such hazards may be identified through the safety
assessment of new or planned systems or operating procedures, through safety oversight programmes
or through occurrence reporting programmes (i.e. hazard or incident reporting systems).
Safety performance targets
Having decided on appropriate safety indicators, it is then necessary to decide on what represents an
acceptable outcome.
ICAO has set global safety performance targets in the objectives of the Global Aviation Safety Plan
(GASP). These are to:
a) reduce the number of accidents and fatalities, irrespective of the volume of air traffic; and
b) achieve a significant decrease in worldwide accident rates, placing the emphasis on regions
where these remain high.
The desired safety outcome may be expressed either in absolute or relative terms. ICAO’s global
targets are an example of relative targets. A relative target could also incorporate a desired percentage
reduction in accidents or particular types of safety occurrences in a defined time period.
In setting safety performance targets based on fatal aircraft accidents for use at the national or
regional level, use could be made of acceptability criteria for societal risks which have been
developed for other industries.
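A relative target expressed as a percentage reduction can be checked with simple arithmetic, as the following sketch illustrates; the baseline and observed counts are assumed values, not real data:

```python
# Illustrative check of a relative safety performance target:
# "reduce occurrences of a given type by 20% within the period".
# Baseline and observed counts are hypothetical.

baseline = 50          # occurrences in the reference period (assumed)
target_reduction = 0.20
observed = 38          # occurrences in the current period (assumed)

# Maximum number of occurrences consistent with meeting the target.
target_ceiling = baseline * (1 - target_reduction)
achieved = observed <= target_ceiling

reduction = (baseline - observed) / baseline
print(f"Reduction achieved: {reduction:.0%} (target {target_reduction:.0%})")
print("Target met" if achieved else "Target not met")
```

Because occurrence counts at the local level are small, such comparisons should be interpreted over sufficiently long periods to avoid reacting to normal year-to-year variation.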
Safety Management Systems (SMS)
As stated at the outset, management of safety within an organization may embrace a broad range of
activities aimed at fulfilling the organization’s safety objectives. In managing safety in such large
systems, these diverse safety activities must be coordinated and integrated into a coherent safety
programme. Safety management systems (SMS) are created to integrate the organization’s policies,
procedures, responsibilities, organizational structures and safety activities in order to meet its safety
objectives.
Safety Management Systems have particular application in organizations devoted to flight operations,
aircraft engineering and maintenance, air traffic services and aerodrome operations. They help to support
several of the broad functions (outlined above) across the entire organization. Their principal value is in
the identification of both actual and potential systemic safety hazards. They are also useful in the
determination of adequate remedial action to maintain safety at an acceptable level of risk. In addition,
they facilitate continuous safety performance monitoring across the system.
In creating an SMS, the “system” to be managed must be carefully defined. An SMS provides a logical
means for establishing the boundaries and defining the objectives of the system. All the system’s various
components can be systematically reviewed, including the interactions among people, procedures, tools,
materials, equipment, facilities, software and the environment. All the necessary data to support the safety
analyses can be compiled, including the impact of applicable regulations, documented goals, objectives,
and performance specifications.
An effective SMS is becoming as vital to the survival of an aviation business as is an effective financial
management system. An SMS creates the corporate culture in which employees at all levels practice
safety in their assigned responsibilities and it makes safety “everybody’s business”, albeit with explicit
objectives and lines of accountability. Consistent with the systems approach to safety, an SMS must
include accountability for those suppliers, sub-contractors and business partners with the potential to
affect the company’s safety performance. The safety culture will mature as exposure to and experience
with safety management increases.
Generally the requirements, procedures and practices of an SMS are considered under the following
headings:
a) The organization’s safety policy;
b) The core safety management activities which are:
• safety monitoring;
• safety assessment;
• safety auditing;
• safety promotion; and
c) Its support elements, including:
• the organizational structure;
• the role of the safety manager;
• safety responsibilities and accountabilities; and
• the training and competency of personnel.
The relationship between these various components of a safety management system is illustrated in Figure 5-3 below.
[Figure 5-3 illustrates the concept of a safety management system: the philosophy of safety management and the safety culture underpin the safety policy; the core activities of safety monitoring, safety assessment, safety auditing and safety promotion, supported by the organizational requirements, deliver the maintenance or improvement of safety performance.]

Figure 5-3. The Concept of a Safety Management System
Each of these core safety management activities and their supporting elements is addressed in this
Manual. In particular, Part 3 provides a broader discussion of the concepts underlying safety management
systems and guidance on implementing and operating an effective safety management system.
Threat and Error Management (TEM)
Threat and error management (TEM) is an evolving concept closely tied to effective safety management. TEM
builds upon a positive safety culture to provide a bridge between aviation operations and human
performance. The most important factor influencing human performance in dynamic work environments
is the real-world operational context (i.e., organizational, regulatory and environmental factors, etc.)
within which people discharge their operational duties. Since operational challenges and complexities
directly affect safety, TEM aims to provide a principled approach to the examination of this operational
context of human performance.
TEM provides a conceptual framework for understanding, from an operational perspective, the interrelationships between safety and human performance in dynamic operational contexts. TEM facilitates
both the understanding of human and system performance and the quantification of the complexities of the
operational context. Potential areas of application for TEM include:
a) Investigating single accidents or incidents;
b) Understanding systemic patterns within a large set of events (such as operational audits);
c) Defining required competencies for actual operating contexts for licensing purposes;
d) Defining operational training requirements; etc.
The TEM framework considers three basic components:
a) Threats, which are circumstances beyond the control of operational personnel that complicate the
fulfilment of their tasks (such as weather, traffic or equipment serviceability);
b) Errors, which are actions or inactions by operational personnel that lead to deviations from expected
performance (such as slips, lapses and mistakes); and
c) Undesired states, which are operational conditions where an unintended situation results in a reduction
in the margins of safety (such as climbing above an assigned altitude).
While these unsafe conditions may not always lead to an accident or incident, they require careful
management to remain within accepted margins of safety. A TEM model has evolved for the management
of such unsafe conditions. Although originally developed for flight deck operations, the TEM model
offers a conceptual basis for use across most functions in aviation with potential for impacting on
operational safety. The TEM model is described further in the précis at Appendix 2 to this Chapter.
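When coding events from audits or occurrence reports, the three TEM components can be treated as a simple classification scheme. The following sketch illustrates the idea; the event keywords are a hypothetical, much-reduced subset of the taxonomy in the tables of Appendix 2:

```python
# Minimal sketch of coding observed events into the three TEM
# components. The event list and keyword mapping are hypothetical
# and far coarser than the taxonomy in Appendix 2 to this chapter.

TEM_COMPONENTS = {
    "threat": {"thunderstorm", "traffic congestion", "short runway"},
    "error": {"missed callout", "wrong mode selected", "misheard clearance"},
    "undesired_state": {"unstable approach", "altitude deviation"},
}

def classify(event):
    """Return the TEM component an observed event belongs to, if known."""
    for component, examples in TEM_COMPONENTS.items():
        if event in examples:
            return component
    return "unclassified"

observed = ["thunderstorm", "missed callout", "unstable approach"]
print([(e, classify(e)) for e in observed])
```

Coding events consistently in this way is what allows systemic patterns to be identified across a large set of events, as in operational audits such as LOSA.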
Figure 5-4 provides a simplified graphic summary of the Threat and Error Management model.14
Figure 5-4. Simplified Version of Threat Error Management Model
————————
14
The full version of the TEM model can be found in the Human Factors Training Manual (Doc 9683) and in the ICAO Line
Operations Safety Audit (LOSA) Manual (Doc 9803).
Appendix 1 to Chapter 5
THREE CORNERSTONES OF SAFETY MANAGEMENT 15
Effective safety management systems comprise three defining cornerstones. The characteristics for each
are outlined below:
a) A comprehensive corporate approach to safety which provides for such things as:
— Ultimate accountability for corporate safety is assigned to the Board of Directors and
Chief Executive Officer (CEO) with evidence of corporate commitment to safety from
the highest organizational levels;
— A clearly enunciated safety philosophy, with supporting corporate policies, including a
non-punitive policy for disciplinary matters;
— Corporate safety goals, with a management plan for meeting these goals;
— Well defined roles and responsibilities with specific accountabilities for safety published
and available to all personnel involved in safety;
— A requirement for an independent safety manager;
— Demonstrable evidence of a positive safety culture throughout the organization;
— Commitment to a safety oversight process which is independent of line management;
— A system of documentation of those business policies, principles, procedures and
practices with safety implications;
— Regular review of safety improvement plans; and
— Formal safety review processes.
b) Effective organizational tools for delivering on safety standards through such activities as:
— Risk-based resource allocation;
— Effective selection, recruitment, development and training of personnel;
— Implementation of Standard Operating Procedures (SOPs) developed in cooperation with
affected personnel;
— Corporate definition of specific competencies (and safety training requirements) for all
personnel with duties relating to safety performance;
15
CAP 712 Op. Cit.
— Defined standards for, and auditing of, asset purchases and contracted services;
— Controls for the early detection of - and action on - any deterioration in the performance
of safety-significant equipment, systems or services;
— Controls for monitoring and recording the overall safety standards of the organization;
— The application of appropriate hazard identification, risk assessment and effective
management of resources to control identified risks;
— Provision for the management of major changes in such areas as the introduction of new
equipment, procedures or types of operation, turnover of key personnel, mass layoffs or
rapid expansion, mergers and acquisitions;
— Arrangements enabling staff to communicate significant safety concerns to the
appropriate level of management for resolution and feedback on actions taken;
— Emergency response planning and simulated exercises to test the plan’s effectiveness;
and
— Assessment of commercial policies with regard to their impact on safety.
c) A formal system for safety oversight with such desirable elements as:
— A system for analysing flight recorder data for the purpose of monitoring flight
operations and for detecting unreported safety events;
— An organization-wide system for the capture of reports on safety events or unsafe
conditions;
— A planned and comprehensive safety audit review system which has the flexibility to
focus on specific safety concerns as they arise;
— A system for the conduct of internal safety investigations, the implementation of remedial
actions and the dissemination of such information to all affected personnel;
— Systems for the effective use of safety data for performance analysis and for monitoring
organizational change as part of the risk management process;
— Systematic review and assimilation of best safety practices from other operations;
— Periodic review of the continued effectiveness of the safety management system by an
independent body;
— Line managers’ monitoring of work in progress in all safety critical activities to confirm
compliance with all regulatory requirements, company standards and procedures, with
particular attention to local practices;
— A comprehensive system for documenting all applicable aviation safety regulations,
corporate policies, safety goals, standards, SOPs, safety reports of all kinds, etc. and for
making such documentation readily available for all affected personnel; and
— Arrangements for ongoing safety promotion based on measured internal safety
performance.
————————
Appendix 2 to Chapter 5
Précis on the
THREAT ERROR MANAGEMENT MODEL
Threat error management is a pro-active method of identifying and dealing with developing unsafe
conditions in a dynamic operational situation, thereby preserving accepted margins of safety. As threats to
safety develop, as normal errors are committed by operational personnel and as the operational situation
degenerates into an undesired state, countermeasures are taken to contain the situation. However, failure
to deal with the situation appropriately may result in an incident or even an accident.
TEM components
There are three basic components of the TEM model: threats, errors and undesired states. An acceptable
level of safety can only be maintained by managing each of these components.
Threats may be defined as “events or errors that occur beyond the influence of the operational
personnel (flight crews, air traffic controllers, etc.) which increase operational complexity, and which
must be managed to maintain the margins of safety”.
Threats may involve such operational complexities as degraded equipment capabilities, meteorology,
topography, airspace congestion, etc. These threats must be managed because they all have the
potential to negatively affect operations by reducing margins of safety. The degree of threat warning
available may dictate the most appropriate action to manage the situation. For example:
• Flight crews and air traffic controllers can prepare their response to expected thunderstorm activity;
• Skills and knowledge acquired through training and operational experience must be applied in coping with unexpected threats such as sudden equipment malfunctions; and
• Safety analysis may reveal less obvious threats (such as equipment design issues, optical illusions or shortened turn-around schedules) which have the potential to compromise operations.
Table 5-2 below presents examples of threats, grouped under two basic categories. The dynamic
environment in which operations take place creates both expected and unexpected environmental
threats which must be managed in real time. Organizational threats are usually latent in nature. Once
identified they can be controlled (i.e. reduced or eliminated) at source by the organization. However,
operational personnel remain the last line of defence if the organizational defences are inadequate.
Table 5-2. Examples of threats (List not inclusive)

Environmental Threats
— Weather: thunderstorms, turbulence, icing, wind shear, cross/tailwind, very low/high temperatures.
— ATC: traffic congestion, TCAS RA, ATC command, ATC error, ATC language difficulty, ATC non-standard phraseology, ATC runway change, ATIS communication, units of measurement (QFE/meters).
— Airport: contaminated/short runway, contaminated taxiway, lack of/confusing/faded signage/markings, birds, aids U/S, complex surface navigation procedures, airport constructions.
— Terrain: high ground, slope, lack of references, “black hole”.
— Other: similar call-signs.

Organizational Threats
— Operational pressure: delays, late arrivals, equipment changes.
— Aircraft: aircraft malfunction, automation event/anomaly, MEL/CDL.
— Cabin: flight attendant error, cabin event distraction, interruption, cabin door security.
— Maintenance: maintenance event/error.
— Ground: ground handling event, de-icing, ground crew error.
— Dispatch: dispatch paperwork event/error.
— Documentation: manual error, chart error.
— Other: crew scheduling event.
Errors are “actions or inactions by operational personnel that lead to deviations from their intended
or their organization’s expected performance”.
Unmanaged and/or mismanaged errors frequently lead to undesired states, thereby reducing the
margins of safety and increasing the probability of adverse events. Examples of errors include the
failure to maintain stabilized approach parameters, selection of an inappropriate equipment mode,
failure to make a required callout, or misunderstanding an ATC clearance. (Chapter 4 (Understanding
Safety) includes a discussion of human errors.)
Avoiding a potentially unsafe condition is dependent upon the detection and response to the error by
front-line personnel. TEM focuses on this detection and response process. Errors that are properly
managed in a timely manner become operationally inconsequential. On the other hand, mismanaged
errors induce additional errors or undesired states.
Table 5-3 presents examples of errors, grouped under three basic categories: aircraft handling errors,
procedural errors and communications errors. Errors are classified based upon the personnel primarily
involved in handling the equipment, executing the procedure or making the communication at the
moment the error is committed.
Errors may be unintentional or involve intentional non-compliance. Proficiency considerations (i.e.,
skill or knowledge deficiencies, training system deficiencies) may underlie many errors.
Table 5-3. Examples of flight crew errors (List not inclusive)

Aircraft handling errors
— Manual handling/flight controls: vertical/lateral and/or speed deviations, incorrect flaps/speedbrakes, thrust reverser or power settings.
— Automation: incorrect altitude, speed, heading or autothrottle settings, incorrect mode executed, or incorrect entries.
— Systems/radio/instruments: incorrect packs, incorrect anti-icing, incorrect altimeter, incorrect fuel switch settings, incorrect speed bug, incorrect radio frequency.
— Ground navigation: attempting to turn down wrong taxiway/runway, taxiing too fast, failure to hold short, missed taxiway/runway.

Procedural errors
— SOPs: failure to cross-verify automation inputs.
— Checklists: wrong challenge and response; items missed; checklist performed late or at the wrong time.
— Callouts: omitted/incorrect callouts.
— Briefings: omitted briefings; items missed.
— Documentation: wrong weight and balance, fuel information, ATIS or clearance information recorded; misinterpreted items on paperwork; incorrect logbook entries; incorrect application of MEL procedures.

Communication errors
— Crew to external: missed calls, misinterpretations of instructions, incorrect read-back, wrong clearance, taxiway, gate or runway communicated.
— Pilot to pilot: inter-crew miscommunication or misinterpretation.
Undesired states are "operational conditions where an unintended situation results in a reduction in
margins of safety". Undesired states that result from ineffective threat and/or error management may
lead to compromising situations and reduce margins of safety in operations. Because an undesired state
is often the last stage before an incident or accident, it is essential to manage undesired states
effectively, restoring desired margins of safety. Examples of undesired states include an aircraft
climbing or descending to a level other than that intended, or an aircraft lining up for the incorrect
runway during approach and landing.
Table 5-4 presents examples of undesired aircraft states, grouped under three basic categories: aircraft
handling, ground navigation and aircraft configuration. Similar categories could be applied to other
areas of operations, such as air traffic control.
Table 5-4. Examples of undesired aircraft states (list not inclusive)

Aircraft handling:
- Aircraft control (attitude).
- Vertical, lateral or speed deviations.
- Unnecessary weather penetration.
- Unauthorized airspace penetration.
- Operation outside aircraft limitations.
- Unstable approach.
- Continued landing after unstable approach.
- Long, floated, firm or off-centreline landing.

Ground navigation:
- Proceeding towards wrong taxiway/runway.
- Wrong taxiway, ramp, gate or hold spot.

Incorrect aircraft configurations:
- Incorrect systems configuration.
- Incorrect flight controls configuration.
- Incorrect automation configuration.
- Incorrect engine configuration.
- Incorrect weight and balance configuration.
Undesired states are different from outcomes. Undesired states are transitional states between a
normal operational state (such as a stabilized approach) and an outcome or end-state such as a
reportable occurrence. For example: a stabilized approach (normal operational state) turns into an
unstabilized approach (undesired aircraft state) that results in a runway excursion (outcome). As long
as the undesired state can be managed so as to recover normal operations, margins of safety can be
preserved. Once the undesired state progresses to an outcome, restoration of normal margins of safety
is not possible.
Countermeasures
As part of the normal discharge of their duties, personnel involved in line operations must employ
countermeasures to keep threats, errors and undesired states from reducing margins of safety. Some
countermeasures are based strictly on human performance; others require the application of skills and
knowledge in conjunction with the use of equipment. Enhanced TEM draws upon both systemic and
individual/team countermeasures.

Applying countermeasures to preserve margins of safety during operations requires significant amounts of
time and energy. For example, empirical evidence gathered through observations during training and
check flights suggests that as much as 70 per cent of flight crew activities may be related to measures for
protecting against threats and errors. Examples include the use of checklists, briefings, call-outs
and SOPs, as well as personal strategies and tactics.
Although all countermeasures are initiated by operational personnel (such as flight crews and air traffic
controllers), their response may build upon resources provided by the aviation system. These resources
are already in place in the system before operations begin. The following are some examples of systemic
countermeasures:

a) Standard operating procedures (SOPs);
b) Checklists;
c) Briefings;
d) Airborne Collision Avoidance System (ACAS);
e) Ground Proximity Warning System (GPWS); and
f) Training.
Other countermeasures rely more on the skills, knowledge and attitudes developed through human
performance training and operational experience. These involve personal strategies and tactics as well as
team countermeasures, such as Crew or Team Resource Management (CRM or TRM). There are
basically three categories of individual and team countermeasures:
1) Planning countermeasures: essential for managing anticipated and unexpected threats;
2) Execution countermeasures: essential for error detection and error response;
3) Review countermeasures: essential for managing the changing conditions of a flight.
Table 5–5 below presents examples of individual / team countermeasures for each category.16
Table 5-5. Examples of individual and team countermeasures

Planning countermeasures:
- SOP BRIEFING: The required briefing was interactive and operationally thorough (concise, not rushed, and met SOP requirements; bottom lines were established).
- PLANS STATED: Operational plans and decisions were communicated and acknowledged (shared understanding about plans: "Everybody on the same page").
- WORKLOAD ASSIGNMENT: Roles and responsibilities were defined for normal and non-normal situations (workload assignments were communicated and acknowledged).
- CONTINGENCY MANAGEMENT: Crew members developed effective strategies to manage threats to safety (threats and their consequences were anticipated; all available resources were used to manage threats).

16 Further guidance on countermeasures can be found in the sample assessment guides for terminal training objectives (PANS TRG, ------) as well as in the ICAO Line Operations Safety Audit (LOSA) Manual (Doc 9803).
Execution countermeasures:
- MONITOR/CROSS-CHECK: Crew members actively monitored and cross-checked systems and other crew members (aircraft position, settings, and crew actions were verified).
- WORKLOAD MANAGEMENT: Operational tasks were prioritized and properly managed to handle primary flight duties (avoided task fixation; did not allow work overload).
- AUTOMATION MANAGEMENT: Automation was properly managed to balance situational and/or workload requirements (automation set-up was briefed to other members; effective recovery techniques from automation anomalies).

Review countermeasures:
- EVALUATION/MODIFICATION OF PLANS: Existing plans were reviewed and modified when necessary (crew decisions and actions were openly analyzed to make sure the existing plan was the best plan).
- INQUIRY: Crew members asked questions to investigate and/or clarify current plans of action (crew members were not afraid to express a lack of knowledge: a "nothing taken for granted" attitude).
- ASSERTIVENESS: Crew members stated critical information and/or solutions with appropriate persistence (crew members spoke up without hesitation).
Chapter 6
RISK MANAGEMENT
Outline

General
Hazard Identification
Risk Assessment
– Problem definition
– Probability of adverse consequences
– Severity of consequences of occurrence
– Risk acceptability
Risk Mitigation
– Defence analysis
– Risk mitigation strategies
– Brainstorming
– Evaluating risk mitigation options
Risk Communication
Risk Management Considerations for State Administrations
– Occasions warranting risk management by State
– Communications by State administrations
– Benefits of risk management by State administrations
Chapter 6
RISK MANAGEMENT
Risk management serves to focus safety efforts on those hazards
posing the greatest risks.
General
The aviation industry faces a diversity of risks every day, many capable of compromising the viability of
an operator and some even posing a threat to the industry. Indeed, risk is a by-product of doing business.
Not all risks can be eliminated, nor are all conceivable risk mitigation measures economically feasible.
The risks and costs inherent in aviation necessitate a rational process for decision-making. Daily,
decisions are made in real time, weighing the probability and severity of any adverse consequences
implied by a risk against the expected gain of taking it. This process is known as risk
management. For the purposes of this manual, risk management can be defined as:
Risk Management: the identification, analysis and elimination (and/or mitigation to an acceptable
level) of those hazards, as well as the subsequent risks that threaten the viability of an organization.
In other words, risk management facilitates the balancing act between assessed risks and viable risk
mitigation. Risk management is an integral component of safety management. It involves a logical
process of objective analysis, particularly in the evaluation of the risks. However, in striving for the
highest level of safety, it must be accepted that absolute safety is unachievable.
An overview of the process for Risk Management is summarized in the flow chart at Figure 6-1 below.
HAZARD IDENTIFICATION: Identify the hazards to equipment, property, personnel or the organization.

RISK ASSESSMENT (Severity/Criticality): Evaluate the seriousness of the consequences of the hazard occurring.

RISK ASSESSMENT (Probability of Occurrence): What are the chances of it happening?

RISK ASSESSMENT (Acceptability): Is the consequent risk acceptable and within the organization's safety performance criteria?
– If YES, accept the risk.
– If NO, take action to reduce the risk to an acceptable level (RISK MITIGATION).

Figure 6-1. Risk management process
As Figure 6-1 indicates, risk management comprises three essential elements: hazard identification, risk
assessment and risk mitigation. The concepts of risk management have equal application in decision-making
by management in flight operations, air traffic control, maintenance, airport management,
manufacturing (e.g. the life of components) and State administrations.
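The essential flow of Figure 6-1 can be sketched in code. This is a minimal illustration, assuming a simple hazard register and a made-up acceptability threshold; the function names and criteria are not taken from this manual, and real criteria would come from the organization's safety performance standards:

```python
# Illustrative sketch of the Figure 6-1 loop: identify hazards, assess each
# risk, then either accept it or act to reduce it. The hazard register
# format and the acceptability test are assumptions for demonstration.

def identify_hazards(system):
    # In practice: safety events, surveys, audits, flight data analysis, etc.
    return system["reported_hazards"]

def is_acceptable(severity, probability):
    # Placeholder for the organization's safety performance criteria.
    return severity * probability <= 4

def manage_risks(system):
    actions = {}
    for hazard, (severity, probability) in identify_hazards(system).items():
        if is_acceptable(severity, probability):
            actions[hazard] = "accept"
        else:
            actions[hazard] = "mitigate to an acceptable level"
    return actions

example = {"reported_hazards": {
    "runway incursion at taxiway B": (5, 2),  # hypothetical entries
    "minor ramp FOD": (2, 2),
}}
print(manage_risks(example))
```

The loop structure mirrors the flow chart: assessment feeds an accept/mitigate decision, and mitigated hazards would be re-assessed on the next pass.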
HAZARD IDENTIFICATION
The concept of hazard identification was introduced in Chapter 5 (Basics of Safety Management). Given
that a hazard may involve any situation or condition that has the potential to cause adverse consequences,
the scope for hazards in aviation is wide. For example:
a) Design factors, including equipment and task design;
b) Procedures and operating practices, including their documentation and checklists, and their
validation under actual operating conditions;
c) Communications, including the most suitable medium, terminology and such barriers to effective
communications as language;
d) Personnel factors, such as company policies for recruitment, training, crew scheduling and
remuneration;
e) Organizational factors, such as the compatibility of production and safety goals, the allocation of
resources, operating pressures, effectiveness of internal communications and the corporate safety
culture;
f) Work environment factors, such as ambient noise and vibration, temperature, lighting and the
availability of protective equipment and clothing;
g) Regulatory oversight factors, including the applicability and enforceability of regulations; the
certification of equipment, personnel and procedures; and the adequacy of surveillance audits;
and
h) Defences, including such factors as the provision of adequate detection and warning systems, the
error tolerance of equipment and the extent to which the equipment is hardened against failures.
As seen earlier, hazards may be recognized through actual safety events (accidents or incidents), or they
may be identified through proactive processes aimed at identifying hazards before they precipitate an
occurrence. In practice, both reactive measures and proactive processes provide an effective means of
identifying hazards for accident prevention.
Reported safety events are clear evidence of problems in the system and therefore provide an opportunity
to learn valuable safety lessons. Safety events should be investigated to identify the hazards
putting the system at risk. This involves investigating all the factors, including the organizational and
human factors that played a role in the event. Guidance for investigating safety events for accident
prevention is included in Chapter 8 (Safety Investigations).
Organizations with safety management systems will also use proactive processes for the identification of
hazards. Methods of identifying hazards proactively include:
a) Safety assessments;
b) Trend monitoring and analysis of safety data;
c) Analysis of incident reports, maintenance service difficulty reports, etc;
d) Analysis of operational data from flight recorders, line operational safety audits, etc.;
e) Safety surveys of employees; and
f) Safety inspections and audits.
Several proactive methods of hazard identification are discussed in Part 4 – Applied Safety Management.

In a mature safety management system, hazard identification should arise from several sources as an
ongoing process. However, there are times in an organization's life when special attention to hazard
identification is warranted on a system-wide basis. Safety assessments (which are discussed in
Chapter 13) provide a structured and systemic process for hazard identification when:
a) There is an unexplained increase in safety-related events or safety infractions;
b) Major operational changes are planned, including changes to key personnel, routes, aircraft types
and other major equipment or systems, etc.;
c) The organization is undergoing significant change, such as rapid growth or contraction; or
d) Corporate mergers, acquisitions or downsizing are planned.
Having identified a hazard suspected of posing a risk to the organization, the characteristics of the unsafe
condition need to be defined. This requires considering factors such as:
a) The extent to which the hazard exists in the organization (or aviation system): for example, is it
limited to a particular group of personnel, type of equipment, operation or location?
b) The adequacy of existing hazard control measures (or defences); and
c) The extent to which the identified hazard is already being addressed.
Such questions help define the characteristics of the problem and reduce the likelihood of initiating
premature or inappropriate corrective action.
RISK ASSESSMENT
Having confirmed the presence of a safety hazard, some form of analysis is required to assess its potential
for harm or damage. Typically, this assessment of the hazard involves three considerations:
a) The probability of the hazard precipitating an unsafe event (i.e. the probability of adverse
consequences should the underlying unsafe conditions be allowed to persist);
b) The severity of the potential adverse consequences, or the outcome of such an unsafe event; and
c) The rate of exposure to the hazards. The probability of adverse consequences increases through
increased exposure to the unsafe conditions (thus, exposure may be viewed as another dimension
of probability).
Risk is the assessed potential for adverse consequences resulting from a hazard. It is the probability that
during a defined period of activity, the hazard will result in an accident with definable consequences. It is
the likelihood that the hazard’s potential to cause harm will be realized.
Risk assessment involves consideration of both the probability and the severity of any adverse
consequences; in other words, the loss potential is determined. In carrying out risk assessments, it is
important to distinguish between hazards (the potential to cause harm) and risk (the likelihood of that
harm being realized in a specified period of time). A risk assessment matrix (such as that provided in
Table 6-1 below) is a useful tool for prioritizing the hazards most warranting attention.
There are many ways to approach the analytical aspects of risk assessment — some more formal than
others. For some risks, the number of variables and the availability of both suitable data and mathematical
models may lead to credible results with quantitative methods (requiring mathematical analysis of
specific data). However, few hazards in aviation lend themselves to credible analysis solely through
numerical methods. Typically, these analyses are supplemented qualitatively through critical and logical
analysis of the known facts and their relationships.
Considerable literature is available on the types of analysis used in risk assessment. Chapter 9 (Safety
Analysis and Studies) includes some common analytical methods, including statistical analytical
techniques and analytical tools for assessing potential hazards. For the risk assessments discussed in this
manual, sophisticated methods are not required; a basic understanding of a few methods will suffice.
Whatever methods are used, there are various ways by which risks may be expressed. For example:
a) Number of deaths, loss of revenue, or loss of market share (i.e. absolute numbers);
b) Loss rates (e.g. number of fatalities per 1,000,000 seat miles flown);
c) Probability of serious accidents (e.g. 1 every 50 years);
d) Severity of outcomes (for example, injury severity); and
e) Expected dollar value of losses vs. annual operating revenue (e.g. $1 million loss per $200
million revenue).
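A brief worked example of item b) above, computing a loss rate per million seat-miles; the traffic and fatality figures below are invented purely for illustration:

```python
# Expressing risk as a loss rate (item b): fatalities per million
# seat-miles flown. All figures are hypothetical.

fatalities = 3
seat_miles_flown = 1_500_000_000  # 1.5 billion seat-miles in the period

rate_per_million = fatalities / (seat_miles_flown / 1_000_000)
print(f"{rate_per_million:.3f} fatalities per million seat-miles")
```

Normalizing losses by exposure in this way is what allows risks to be compared across fleets or periods of very different sizes.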
Problem definition
In any analytical process, the problem must first be defined. Even after a hazard has been perceived,
defining its characteristics as a problem for resolution is not always easy. People from
different backgrounds and experience will likely view the same evidence from different perspectives.
Views on what poses a significant risk will reflect those different backgrounds, exacerbated by
normal human biases. Thus, engineers will tend to see problems in terms of engineering deficiencies,
medical doctors in terms of medical deficiencies, psychologists in terms of behavioural problems, and so
on. The anecdote in the following box exemplifies the multifaceted nature of defining a problem.
Charlie’s Accident
Charlie has an emotional argument with his wife and proceeds to the local bar where he
consumes several drinks. He departs the bar in his car at high speed. Minutes later, he loses
control on the highway and is fatally injured. We know WHAT happened; we must now determine
WHY it happened.
The investigation team comprises six specialists, each of whom has a completely different
perspective on the root safety deficiency.
The sociologist identifies a breakdown in interpersonal communications within the marriage. An
enforcement officer from the Liquor Control Board notes the illegal sale of alcoholic beverages by
the bar on a "two for one" basis. The pathologist determines that Charlie's blood alcohol was in
excess of the legal limit. The highway engineer finds inadequate road banking and protective
barriers for the posted speed. An automotive engineer determines that Charlie's car had a loose
front end and bald tires. The policeman determines that the automobile was traveling at
excessive speed for the prevailing conditions.
Each of these different perspectives may result in a different definition of the underlying hazard.
Any or all of the factors cited in this example may be valid, underlining the nature of multi-causality.
How the safety issue is defined, however, will affect the course of action taken to reduce or eliminate
the hazards. In assessing the risks, all potentially valid perspectives must be evaluated and only the
most suitable pursued.
Probability of adverse consequences
Regardless of the analytical methods used, the probability of causing harm or damage must be assessed.
This probability will depend on answers to such questions as:
a) Is there a history of occurrences like this, or is this an isolated occurrence?
b) How many other aircraft, or components of this type, might have similar defects?
c) How many operating or maintenance personnel are following, or are subject to, the procedures in
question?
d) What percentage of the time is the suspect equipment, or the questionable procedure, in use? and
e) To what extent are there organizational, management or regulatory implications that might reflect
larger threats to public safety?
Based on such considerations, the likelihood of an event occurring can be assessed. For example:
a) Unlikely to occur. Failures that are “unlikely to occur” include isolated occurrences, and risks
where the exposure rate is very low or the fleet size is small. The complexity of the
circumstances necessary to create an accident situation may be such that it is unlikely such a
chain of events will arise again. For example, it is unlikely that independent systems would
fail concurrently. However, even if the possibility is only remote, the consequences of such
concurrent failures may warrant follow-up.
There is a natural tendency to attribute unlikely events to “coincidence”. Caution is advised.
While coincidence may be statistically feasible, coincidence must not be used as an excuse
for the absence of due analysis.
b) May occur. Failures that “may occur” derive from hazards with a reasonable probability that
similar patterns of human performance can be expected under similar working conditions, or
that the same material defects exist elsewhere in the system.
c) Probably will occur. Such occurrences reflect a pattern (or potential pattern) of material
failures that have not yet been rectified. Given the design or maintenance of the equipment,
its strength under known operating conditions, etc., continued operations will likely lead to
failure. Similarly, given the empirical evidence on some aspect of human performance, it can
be expected with some certainty that normal individuals, operating under similar working
conditions, would likely commit the same errors or be subject to the same undesirable
performance outcome.
Severity of consequences of occurrence
Having determined the probability of occurrence, the nature of the adverse consequences if the event does
occur must be assessed. The potential consequences govern the degree of urgency attached to the safety
action required. If there is significant risk of catastrophic consequences, or if the risk of serious injury,
property or environmental damage is high, urgent follow-up action is warranted. In assessing the severity
of the consequences of occurrence, the following types of questions could apply:
a) How many lives are at risk? Employees, fare-paying passengers, bystanders or the general
public;
b) What is the likely extent of property or financial damage? Direct property loss to the operator,
damage to aviation infrastructure, third-party collateral damage, financial and economic
impact for the State;
c) What is the likelihood of environmental impact? Spill of fuel or other hazardous products, and
physical disruption of natural habitat; and
d) What are the likely political implications and/or media interest?
Risk acceptability
Based on the risk assessment, the risks can be prioritized relative to other, unresolved safety hazards. This
is critical in making rational decisions to allocate limited resources against those hazards posing the
greatest risks to the organization.
Prioritizing risks requires a rational basis for ranking one risk vis à vis other risks. Criteria or standards
are required to define what is an acceptable risk and what is an unacceptable risk. By weighing the
likelihood of an undesirable outcome against the potential severity of that outcome, the risk can be
categorized within a risk assessment matrix. Many versions of risk assessment matrices are available in
the literature. While the terminology or definitions used for the different categories may vary, such
tables generally reflect the ideas summarized in Table 6-1 below:
Table 6-1. Risk assessment matrix17

Severity of consequences (aviation definitions):
- Catastrophic (value 5): Aircraft destroyed. Multiple deaths.
- Hazardous (value 4): A large reduction in safety margins, physical distress or a workload such that the crew cannot be relied upon to perform their tasks accurately or completely. Serious injury or death of a proportion of the occupants. Major aircraft damage.
- Major (value 3): A significant reduction in safety margins, a reduction in the ability of the flight crew to cope with adverse operating conditions as a result of an increase in workload or of conditions impairing their efficiency. Aircraft serious incident. Injury to occupants.
- Minor (value 2): Nuisance. Operating limitations. Use of emergency procedures. Aircraft minor incident.
- Negligible (value 1): Little consequence.

Likelihood of occurrence (qualitative definitions):
- Frequent (value 5): Likely to occur many times.
- Occasional (value 4): Likely to occur sometimes.
- Remote (value 3): Unlikely, but possible to occur.
- Improbable (value 2): Very unlikely to occur.
- Extremely improbable (value 1): Almost inconceivable that the event will occur.

17 Based on: An Introduction to Flight Safety Risk Assessment (paper by Richard Profit, UK CAA).

In this version of a risk assessment matrix:
a) Severity of risk is ranked as Catastrophic, Hazardous, Major, Minor or Negligible with a
descriptor for each indicating the potential severity of consequences. As mentioned earlier, other
definitions can be used, reflecting the nature of the operation being analysed;
b) Probability (or Likelihood) of occurrence is also ranked through five different levels of
qualitative definition and descriptors are provided for each likelihood of occurrence; and
c) Values may be assigned numerically, to weigh the relative importance of each level of severity
and probability. A composite assessment of risk, to assist in comparing risks, may then be derived
by multiplying the severity and probability values.
Having used a risk matrix to assign values to risks, a range of values may be assigned in order to
categorize risks as acceptable, undesirable or unacceptable.
a) Acceptable means that no further action needs to be taken, (unless the risk can be reduced further
at little cost or effort).
b) Undesirable (or tolerable) means that the affected persons are prepared to live with the risk in
order to have certain benefits, on the understanding that the risk is being mitigated as well as
possible.
c) Unacceptable means that operations under the current conditions must cease until the risk is
reduced to at least the Tolerable level.
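The numeric scheme described above (a composite value from severity and probability, mapped to acceptability bands) can be made concrete in a short sketch. The band cut-offs (4 and 12) and the function name are illustrative assumptions; the manual leaves the actual value ranges to each organization:

```python
# Composite risk value (severity x likelihood) per Table 6-1, mapped to
# acceptable / undesirable / unacceptable bands. The cut-off values are
# illustrative only; each organization must set its own ranges.

SEVERITY = {"negligible": 1, "minor": 2, "major": 3,
            "hazardous": 4, "catastrophic": 5}
LIKELIHOOD = {"extremely improbable": 1, "improbable": 2, "remote": 3,
              "occasional": 4, "frequent": 5}

def assess(severity: str, likelihood: str) -> tuple[int, str]:
    value = SEVERITY[severity] * LIKELIHOOD[likelihood]  # ranges 1..25
    if value <= 4:
        band = "acceptable"
    elif value <= 12:
        band = "undesirable (tolerable)"
    else:
        band = "unacceptable"
    return value, band

print(assess("major", "occasional"))
print(assess("catastrophic", "remote"))
```

Multiplying the two values gives the composite assessment mentioned in item c) above; the banding step then turns that number into a prioritization decision.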
A less numeric approach to determining the acceptability of particular risks includes consideration of
such factors as:
a) Managerial. Is the risk consistent with the organization’s safety policy and standards?
b) Affordability. Does the nature of the risk defy cost-effective resolution?
c) Legal. Is the risk in conformance with current regulatory standards and enforcement capabilities?
d) Cultural. How will the organization’s personnel and other stakeholders view this risk?
e) Market. Will the company’s competitiveness and well-being vis-à-vis other companies be
compromised by not reducing or eliminating this risk?
f) Political. Will there be a political price to pay for not reducing or eliminating this risk?
g) Public. How influential will the media or special interest groups be in affecting public opinion
regarding this risk?
RISK MITIGATION
As stated earlier, where risk is concerned, there is no such thing as absolute safety. Risks have to be
managed to a level "as low as reasonably practicable" (ALARP18). This means that the risks must be
balanced against the time, cost and difficulty of taking measures to reduce or eliminate them.

18 From SRG SMS Policies and Guidelines Document: Safety Management Systems (Sponsor PS Griffith, Head A&ATSD), p. 7.
When the acceptability of the risk has been found to be Undesirable or Unacceptable, control measures
need to be introduced — the higher the risk, the greater the urgency. The level of risk can be reduced
either by reducing the severity of the potential consequences, by reducing the likelihood of occurrence, or
by reducing the exposure to that risk.
Safety action is required to mitigate the risks. Optimal solutions will vary, depending on the local
circumstances and exigencies. In formulating meaningful safety action, a good understanding of the
adequacy of existing defences is required.
Defence analysis19
A major component of any safety system is the defences put in place to protect people, property or the
environment. These defences can be used to:
a) Reduce the probability of unwanted events occurring; and
b) Reduce the severity of the consequences associated with any unwanted events.
Defences can be categorized into two types:
a) Physical defences include objects that discourage or prevent inappropriate action, or which
mitigate the consequences of events (for example, squat switches, switch covers, firewalls,
survival equipment, warnings and alarms); and
b) Administrative defences include procedures and practices that mitigate the probability of an
accident (for example, safety regulations, standard operating procedures, supervision and
inspection, and personal proficiency).
Before selecting appropriate risk mitigation strategies, it is important to understand why the existing
system of defences was inadequate. The following line of questioning may pertain:
a) Were defences provided to protect against such hazards?
b) Did the defences function as intended?
c) Were the defences practical for use under actual working conditions?
d) Were affected staff aware of the risks and the defences in place? and
e) Are additional risk mitigation measures required?
Risk mitigation strategies
There is a range of strategies available for risk mitigation. For example:
19 From TSB's Integrated Safety Investigation Methodology.
a) Exposure avoidance. The risky task, practice, operation or activity, is avoided because the risk
exceeds the benefits.
b) Loss reduction. Activities are taken to reduce the frequency of the unsafe events or the magnitude
of the consequences.
c) Segregation of exposure (separation or duplication). Action is taken to isolate the effects of the
risk or build in redundancy to protect against the risks, i.e. reduce the severity of the risk. For
example, protecting against collateral damage in the event of a material failure, or providing
back-up systems to reduce the likelihood of total system failure.
Brainstorming
Generating the ideas necessary to create suitable risk mitigation measures poses a challenge. Developing
risk mitigation measures frequently requires creativity, ingenuity and above all an open mind to consider
all possible solutions. The thinking of those closest to the problem (usually those with the most
experience) is often coloured by set ways and natural biases. Broad participation, including representatives of the
various stakeholders, tends to help overcome rigid mind-sets. Thinking “outside the box” is essential to
effective problem solving in a complex world. All new ideas should be weighed carefully before rejecting
any of them.
Evaluating risk mitigation options
Not all alternatives for risk mitigation have equal potential for reducing risks. The
effectiveness of each option needs to be evaluated before a decision can be taken. It is important that the
full range of possible control measures is considered and that trade-offs between measures are considered
to find an optimal solution. Each proposed risk mitigation option should be examined from such
perspectives as:
a) Effectiveness. Will it reduce or eliminate the identified risks? To what extent do alternatives
mitigate the risks? Effectiveness can be viewed as being somewhere along a continuum, as
follows:
1) Level One: (Engineering actions) The safety action will eliminate the risk, for example,
interlocks to prevent thrust reverser activation in-flight;
2) Level Two: (Control actions) The safety action accepts the risk but adjusts the system to
mitigate the risk by reducing it to a manageable level, such as, imposing more restrictive
operating conditions; and
3) Level Three: (Personnel actions) The actions taken accept that the hazard can neither be
eliminated (Level One) nor controlled (Level Two), so personnel must be taught how to
cope with it, such as, the addition of a warning, revised check list and extra training.
b) Cost / benefit. Do the perceived benefits of the option outweigh the costs? Will the potential
gains be proportional to the impact of the change required?
c) Practicality. Is it doable and appropriate in terms of available technology, financial feasibility,
administrative feasibility, governing legislation and regulations, political will, etc.?
6-13
Part 2
DRAFT
Chapter 6 - Risk Management
d) Challenge. Can the risk mitigation measure withstand critical scrutiny from all stakeholders?
(Employees, managers, stockholders/State administrations, etc.)
e) Acceptability to each stakeholder. How much buy-in (or resistance) from stakeholders can be
expected? (Discussions with stakeholders during the Risk Assessment phase may indicate their
preferred risk mitigation option.)
f) Enforceability. If new rules (SOPs, regulations, etc.) are implemented, are they enforceable?
g) Durability. Will the measure withstand the test of time? Will it be of temporary benefit or will it
have long-term utility?
h) Residual Risks. After the risk mitigation measure is implemented, what will be the residual risks
relative to the original hazard? What is the ability to mitigate any residual risks?
i) New problems. What new problems, or new (perhaps worse) risks, will be introduced by the
proposed change?
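As a rough illustration, the evaluation perspectives above can be combined in a simple weighted scoring exercise. This is a sketch only: the criteria weights, the 0–5 ratings and the option names below are invented for the example and are not values prescribed by this manual.

```python
# Illustrative sketch only: criteria weights and 0-5 ratings are assumed
# for the example, not prescribed by this manual.

CRITERIA_WEIGHTS = {
    "effectiveness": 3,   # a) does it reduce or eliminate the risk?
    "cost_benefit": 2,    # b) do benefits outweigh costs?
    "practicality": 2,    # c) technically and administratively doable?
    "acceptability": 1,   # e) expected stakeholder buy-in
    "durability": 1,      # g) long-term utility
}

def score_option(name, ratings):
    """Return (name, weighted total); higher totals suggest stronger options."""
    total = sum(CRITERIA_WEIGHTS[c] * ratings[c] for c in CRITERIA_WEIGHTS)
    return name, total

options = [
    score_option("Engineering fix (Level One)",
                 {"effectiveness": 5, "cost_benefit": 2, "practicality": 2,
                  "acceptability": 4, "durability": 5}),
    score_option("Operating restriction (Level Two)",
                 {"effectiveness": 3, "cost_benefit": 4, "practicality": 4,
                  "acceptability": 3, "durability": 3}),
    score_option("Extra training (Level Three)",
                 {"effectiveness": 2, "cost_benefit": 4, "practicality": 5,
                  "acceptability": 4, "durability": 2}),
]

# Rank the options; a high score is a prompt for discussion, not a decision.
for name, total in sorted(options, key=lambda o: o[1], reverse=True):
    print(f"{name}: {total}")
```

Such a table does not replace judgment; it simply makes the trade-offs between options, and the reasoning behind a choice, explicit and discussable.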
Obviously, preference should be given to corrective actions that will completely eliminate the risk.
Regrettably, such solutions are often the most expensive. At the other end of the spectrum, when
resources or organizational will are lacking, the problem is often deferred to the training
department to teach staff to cope with the risks. In such cases, management may be avoiding hard
decisions by delegating responsibility for the risk to subordinates.
RISK COMMUNICATION
Risk communication includes any exchange of information about risks, i.e. any public or private
communication that informs others about the existence, nature, form, severity or acceptability of risks.
The information needs of the following groups may require special attention:
a) Management must be apprised of all risks which present loss potential to the organization;
b) Those exposed to the identified risks must be apprised of their severity and likelihood of
occurrence;
c) Those who identified the hazard need feedback on action proposed;
d) Those affected by any planned changes need to be apprised of both the hazards and the rationale
for the action taken; and
e) Others with potential information needs regarding specific risks include:
1) Regulatory authorities;
2) Suppliers;
3) Industry associations; and
4) General public, etc.
Effective communication of the risks (and plans for their resolution) adds value to the Risk Management
process. The stakeholders can assist the decision-maker(s) if the risks are communicated early in a fair,
objective and understandable way. Thus, risk communications should be:
a) Two-way;
b) Factual concerning the risks, benefits and uncertainties, and take into account the needs and
concerns of the various stakeholders;
c) Tested for acceptability by the decision-maker(s) and stakeholders; and
d) Documented to reduce misunderstandings.
Failure to communicate the safety lessons learned in a clear and timely fashion will undermine
management’s credibility in promoting a positive safety culture. For safety messages to be credible, they
must be consistent with the facts, with previous statements from management and with the messages from
other authorities. These messages need to be framed in terms the stakeholders understand.
RISK MANAGEMENT CONSIDERATIONS FOR STATE ADMINISTRATIONS20
Risk management techniques have implications for State administrations in areas ranging from policy
development through to the “go/no-go” decisions confronting front-line State civil aviation inspectors.
For example:
a) Policy: To what extent should a State accept the certification paperwork of another State?
b) Regulatory Change: From the many (often-conflicting) recommendations made for regulatory
change, how are decisions made?
c) Priority Setting: How are decisions made for determining those areas of safety warranting
emphasis during safety oversight audits?
d) Operational Management: How are decisions made when insufficient resources are available to
carry out all planned activities?
e) Operational Inspections: At the front line, how are decisions made when critical errors are
discovered outside of normal working hours?
Occasions warranting risk management by State administrations
Some situations should alert State aviation administrations to the possible need for applying risk
management methods, for example:
a) Start-up or rapidly expanding companies;
b) Corporate mergers;
20. Adapted from TP 13095 Risk Management & Decision Making, Transport Canada.
c) Companies facing bankruptcy or other financial difficulties;
d) Companies facing serious labour-management difficulties;
e) Introduction of major new equipment by an operator;
f) Certification of a new aircraft type, new airport, etc.;
g) Introduction of new communication, navigation or surveillance equipment and procedures; and
h) Significant change to air regulations or other laws potentially impacting on aviation safety, etc.
Risk management by State administrations will be affected by such factors as:
a) Time available to make the decision (grounding an aircraft, revoking a certificate, etc.);
b) Resources available to effect the necessary actions;
c) Numbers of people affected by required actions (company-wide, fleet-wide, regional, national,
international, etc.);
d) The potential impact of the State’s decision for action (or inaction); and
e) Cultural and political will to take the action required.
Communications by State administrations
Effective communications by State administrations are an important part of the risk management process.
Establishing and maintaining good communications with stakeholders paves the way for effective
decision-making. Stakeholders’ perceptions about risk will often differ from those of the regulatory
authority.
Communicating the reasons for public safety decisions requires care because such decisions often arouse
strong emotions. Gaining acceptance of planned changes is more likely when key stakeholders are
involved in the decision-making process. Such action reflects the safety culture of the State
administration: is it one of management by decree, or one that values openness, respect, consultation,
collaboration and partnership?
Benefits of risk management for State administrations
Applying risk management techniques in decision-making offers benefits for State administrations,
including:
a) Avoiding costly mistakes during the decision-making process;
b) Ensuring all aspects of the risk are identified and considered when making decisions;
c) Ensuring the legitimate interests of affected stakeholders are considered;
d) Providing decision-makers with a solid defence in support of decisions;
e) Making decisions easier to explain to stakeholders and the general public; and
f) Providing significant savings in time and money.
____________________
Chapter 7
HAZARD AND INCIDENT REPORTING
Outline
Introduction to Reporting Systems
 Value of safety reporting systems
 ICAO requirements
Types of Incident Reporting Systems
 Mandatory incident reporting systems
 Voluntary incident reporting systems
 Confidential reporting systems
Principles of Effective Incident Reporting
 Trust
 Non-punitive
 Inclusive reporting base
 Independence
 Ease of reporting
 Acknowledgment
 Promotion
International Incident Reporting Systems
 ICAO Accident/Incident Reporting System (ADREP)
 European Co-ordination Centre for Aviation Incident Reporting Systems (ECCAIRS)
State Voluntary Incident Reporting Systems
 Aviation Safety Reporting System (ASRS)
 Confidential Human Factors Incident Reporting Programme (CHIRP)
Company Reporting Systems
Implementation of Incident Reporting Systems
 What to report?
 Who should report?
 System management
 Reporting method and format
 Limitations on the use of incident data
Appendix
1. Limitations on the use of data from voluntary incident reporting systems
Chapter 7
HAZARD AND INCIDENT REPORTING
INTRODUCTION TO REPORTING SYSTEMS
As outlined in previous chapters, safety management systems involve the reactive and proactive
identification of safety hazards. Accident investigations reveal a great deal about safety hazards, but
fortunately aviation accidents are rare events. They are, however, generally investigated more thoroughly
than incidents. When safety initiatives rely exclusively on accident data, the limitations of small samples
apply. As a result, the wrong conclusions may be drawn, or inappropriate corrective actions taken.
Research leading to the 1:600 Rule showed that the number of incidents is significantly greater than the
number of accidents for comparable types of occurrences. The causal and contributory factors associated
with incidents may also culminate in accidents. Often, only good fortune prevents an incident from
becoming an accident. Unfortunately, these incidents are not always known to those responsible for
reducing or eliminating the associated risks. This may be due to the unavailability of reporting systems, or
people not being sufficiently motivated to report incidents.
Value of safety reporting systems
Recognizing that knowledge derived from incidents could provide significant insights into safety hazards,
several types of incident reporting systems have been developed. Some safety databases contain a large
quantity of detailed information. Although these occurrences may not be investigated to any depth, the
anecdotal information they provide can offer meaningful insight into the perceptions and reactions of
pilots, cabin attendants, mechanics and air traffic controllers.
Safety reporting systems should not just be restricted to incidents, but should include provision for the
reporting of hazards, i.e. unsafe conditions which have not yet caused an incident. For example, some
organizations have programmes for reporting conditions deemed unsatisfactory from the perspective of
experienced personnel (Unsatisfactory Condition Reports for potential technical faults). In some States,
Service Difficulty Reporting (SDR) systems are effective in identifying airworthiness hazards.
Aggregating data from such hazard and incident reports provides a rich source of experience to support
other safety management activities.
Data from incident reporting systems can facilitate an understanding of the causes of hazards, help define
intervention strategies, and help verify the effectiveness of interventions. Depending on the depth to which they are
investigated, incidents can provide a unique means of obtaining first-hand evidence on the factors
associated with mishaps from the participants themselves. Reporters can describe the relationships
between stimuli and their actions. They may provide their interpretation of the effects of various factors
affecting their performance, such as fatigue, interpersonal interactions and distractions. Furthermore,
many reporters are able to offer valuable suggestions for remedial action. Incident data have also been
used to improve operating procedures, display and control design, and provide a better understanding of
human performance associated with the operation of aircraft and air traffic control.
ICAO requirements21
ICAO requires that States establish a mandatory incident reporting system to facilitate collection of
information on actual or potential safety deficiencies. In addition, States are encouraged to establish a
voluntary incident reporting system, adjusting their laws, regulations and policies so that the voluntary
programme:
a) Facilitates the collection of information that may not be captured by a mandatory incident
reporting system;
b) Is non-punitive; and
c) Affords protection to the sources of the information.
TYPES OF INCIDENT REPORTING SYSTEMS
In general, an incident involves an unsafe, or potentially unsafe, occurrence or condition that does not
involve serious personal injury or significant property damage; that is, it does not meet the criteria for an
accident, but could have. When an incident occurs, the individual(s) involved may or may not be required
to submit a report. The reporting requirements vary with the laws of the State where the incident
occurred. Even if not required by law, operators may require reporting of the occurrence to the company.
Mandatory incident reporting system
In a mandatory system, people are required to report certain types of incidents. This necessitates detailed
regulations outlining who shall report and what shall be reported. The number of variables in aircraft
operations is so great that it is difficult to provide a comprehensive list of items or conditions which
should be reported. For example, loss of a single hydraulic system on an aircraft with only one such
system is critical. On a type with three or four systems, it may not be. A relatively minor problem in one
set of circumstances can, in different circumstances, result in a hazardous situation. However, the rule
should be: “If in doubt — report it”.
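The hydraulic-system example can be made concrete with a tiny sketch. The rule and its threshold below are illustrative assumptions for the sake of the example, not actual reporting criteria from any State; the governing principle remains "if in doubt — report it".

```python
def hydraulic_loss_is_critical(total_systems: int, systems_lost: int) -> bool:
    """Illustrative rule of thumb only: treat the loss as critical when one
    or fewer independent hydraulic systems remain. Real reporting criteria
    are set by each State's regulations; when in doubt, report."""
    remaining = total_systems - systems_lost
    return remaining <= 1

# Single-system aircraft: losing the only system is critical.
print(hydraulic_loss_is_critical(total_systems=1, systems_lost=1))
# Aircraft with three systems: losing one leaves two, so this rule
# would not flag it as critical.
print(hydraulic_loss_is_critical(total_systems=3, systems_lost=1))
```

The point of the sketch is that "the same failure" changes criticality with context (here, redundancy); a fixed list of reportable items cannot capture every such combination of circumstances.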
Because mandatory systems deal mainly with “hardware” matters, they tend to collect more information
on technical failures than on human factors aspects. To help overcome this problem, States with
well-developed mandatory reporting systems are introducing voluntary incident reporting systems aimed
specifically at acquiring more information on the human factors aspects of occurrences.
Voluntary incident reporting systems
Annex 13 recommends that States introduce voluntary incident reporting systems to supplement the
information obtained from mandatory reporting systems. In such systems, the reporter, without any legal
or administrative requirement to do so, submits a voluntary incident report. In a voluntary reporting
system, regulatory agencies may offer an incentive to report. For example, enforcement action may be
waived for unintentional violations that are reported. The reported information should not be used against
the reporters, i.e. such systems must be non-punitive to encourage the reporting of such information.
21. See Annex 13, Chapter 8.
Confidential reporting systems
Confidential reporting systems aim to protect the identity of the reporter. This is one way of ensuring that
voluntary reporting systems are non-punitive. Confidentiality is usually achieved by de-identification,
often by not recording any identifying information about the occurrence. One such system returns to the user
the identifying part of the reporting form and no record is kept of these details. Confidential incident
reporting systems facilitate the disclosure of human errors, enabling others to learn from mistakes made,
without fear of retribution or embarrassment.
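De-identification of this kind can be sketched as a simple scrubbing pass over a report before it is stored. The field names, date generalization and narrative patterns below are illustrative assumptions, not any actual system's procedure; real programmes apply far more careful, often manual, review.

```python
import re

def deidentify(report: dict) -> dict:
    """Strip or generalize identifying details before a report is stored:
    drop name/employer/contact fields, generalize the date to a quarter, and
    mask airline-style flight numbers in the narrative (illustrative only)."""
    clean = {k: v for k, v in report.items()
             if k not in ("reporter_name", "employer", "contact")}
    # Generalize an ISO date (YYYY-MM-DD) to year and quarter.
    year, month, _ = report["date"].split("-")
    clean["date"] = f"{year}-Q{(int(month) - 1) // 3 + 1}"
    # Mask flight numbers such as "AB123" in the free-text narrative.
    clean["narrative"] = re.sub(r"\b[A-Z]{2}\d{1,4}\b", "[FLIGHT]",
                                report["narrative"])
    return clean

report = {
    "reporter_name": "J. Smith", "employer": "Example Air", "contact": "x",
    "date": "2004-05-17",
    "narrative": "While working AB123 we received a late runway change.",
}
print(deidentify(report))
```

Note that automated scrubbing alone cannot guarantee anonymity; combinations of date, location and aircraft type may still identify an occurrence, which is why de-identification is usually reviewed by a person.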
PRINCIPLES FOR EFFECTIVE INCIDENT REPORTING SYSTEMS
People are understandably reluctant to report their mistakes to the company that employs them, or to the
government department that regulates them. Too often following an occurrence, investigators learn that
many people were aware of the unsafe conditions before the event. For whatever reasons, however, they
did not report the perceived hazards, perhaps because of:
a) Embarrassment in front of their peers;
b) Self-incrimination, especially if they were responsible for creating the unsafe condition;
c) Retaliation from their employer for having spoken out; or
d) Sanction (such as enforcement action) by the regulatory authority.
Use of the following principles helps overcome the natural resistance to safety reporting.
Trust
Persons reporting incidents must trust that the receiving organization (whether the State or company) will
not use the information against them in any way. Without such confidence, people will be reluctant to
report their mistakes and they may also be reluctant to report other hazards they are aware of.
Trust begins with the design and implementation of the reporting system. Employee input into the
development of a reporting system is vital. A positive safety culture in the organization generates the kind
of trust necessary for a successful incident reporting system. Specifically, the culture must be error
tolerant and non-punitive. In addition, incident reporting systems need to be perceived as being fair in
how they treat unintentional errors or mistakes. (Most people do not expect an incident reporting system
to exempt criminal acts, or deliberate violations, from prosecution or disciplinary action.) Some States
consider such a process to be an example of a “Just Culture”.
Non-punitive
Non-punitive reporting systems are based on confidentiality. Before employees will freely report
incidents, they must receive a commitment from the regulatory authority or from top management that
reported information would not be used punitively against them. The person reporting the incident (or
unsafe condition) must be confident that anything said will be kept in confidence. In some States, “Access
to Information” laws make it increasingly difficult to guarantee confidentiality. Where this happens,
reported information will tend to be reduced to the minimum to meet mandatory reporting requirements.
Sometimes reference is made to anonymous reporting systems. Reporting anonymously is not the same as
confidential reporting. Most successful reporting systems have some type of callback capability in order
to confirm details, or obtain a better understanding of the occurrence. Reporting anonymously makes it
impossible to ensure understanding and completeness of the information provided by the reporter. There
is also a danger that anonymous reporting may be used for purposes other than safety.
Inclusive reporting base
Early voluntary incident reporting systems were targeted at flight crews. Pilots are in a position to observe
a broad spectrum of the aviation system, and are therefore well situated to comment on the system’s
health. Nonetheless, incident reporting systems that focus solely on the flight crew’s perspective tend
to reinforce the idea that everything comes down to pilot error. Taking a systemic approach to safety
management requires that safety information be obtained from all parts of the operation.
In State-run incident reporting systems, collecting information on the same occurrence from different
perspectives facilitates forming a more complete impression of events. For example, ATC instructs an
aircraft to ‘go around’ because there is a maintenance vehicle on the runway without authorization.
Undoubtedly, the pilot, the controller and the vehicle operator would all have seen the situation from
different perspectives. Relying on one perspective only may not provide a complete understanding of the
event.
Independence
Ideally, State-run voluntary incident reporting systems are operated by an organization separate from the
aviation administration responsible for the enforcement of aviation regulations. Experience in several
States has shown that voluntary reporting benefits from a trusted “third party” managing the system. The
“third party” receives, processes and analyses the incident reports and feeds the results back to the
aviation administration and the aviation community. With “mandatory” reporting systems, it may not be
possible to employ a “third party”. Nevertheless, it is desirable that the aviation administration gives a
clear undertaking that any information received will be used for accident prevention purposes only. The
same principle applies to an airline or any other aviation operator that uses incident reporting as part of its
safety management system.
Ease of reporting
The task of submitting incident reports should be as easy as possible for the reporter. Reporting forms
should be readily available so that anyone wishing to file a report can do so easily. They should be simple
to complete, with adequate space for a descriptive narrative, and they should encourage suggestions on how
to improve the situation or prevent a recurrence. To simplify completion, classifying information, such
as the type of operation, light conditions, type of flight plan, weather, etc., can use a “tick-off” format.
Acknowledgment
The reporting of incidents requires time and effort by the reporter and should be appropriately
acknowledged. To encourage further reports, one State includes a blank report form with the
acknowledgment letter. In addition, the reporter naturally expects feedback about actions taken in
response to the reported safety concern.
Promotion
The (de-identified) information received from an incident reporting system should be made available to
the aviation community in a timely manner. This may also help to motivate people to report further
incidents. Such promotion activities may take the form of monthly newsletters or periodic summaries.
Ideally a variety of methods would be used with a view to achieving maximum exposure.
INTERNATIONAL INCIDENT REPORTING SYSTEMS
ICAO Accident/Incident Data Reporting System (ADREP)
In accordance with Annex 13, States report to ICAO information on all aircraft accidents involving
aircraft of a maximum certified take-off mass of over 2,250 kg. ICAO also gathers information on aircraft
incidents (involving aircraft over 5,700 kg) considered to be important for safety and accident
prevention. This reporting system is known as ADREP. States report specific data in a pre-determined
(and coded) format to ICAO. When ADREP reports are received from States, the information is checked
and electronically stored, constituting a databank of worldwide occurrences.
ICAO does not require States to investigate incidents. However, if a State does investigate a serious
incident, it is requested to submit formatted data to ICAO. The types of serious incidents of interest to
ICAO include:
a) Multiple system failures;
b) Fires or smoke on-board an aircraft;
c) Terrain and obstacle clearance incidents;
d) Flight control and stability problems;
e) Take-off and landing incidents;
f) Flight crew incapacitation;
g) Decompression; and
h) Near collisions and other serious air traffic incidents.
European Co-ordination Centre for Aviation Incident Reporting Systems (ECCAIRS) 22
Many aviation authorities in Europe have collected information about aviation accidents and incidents.
However, the number of significant occurrences in individual States was usually not sufficient to give an
early indication of potentially serious hazards or to identify meaningful trends. Since many States had
incompatible data storage formats, pooling of safety information was almost impossible. To improve this
situation, the European Union (EU) introduced occurrence-reporting requirements and developed the
ECCAIRS safety database system. The objective of these moves was to improve aviation safety in Europe
through the early detection of potentially hazardous situations. The ECCAIRS system includes
capabilities for analysing and presenting the information in a variety of formats. The database is
compatible with some other incident reporting systems, such as ADREP. Some non-European States have
also chosen to implement the ECCAIRS system to take advantage of common classification taxonomies,
etc.
STATE VOLUNTARY INCIDENT REPORTING SYSTEMS
A number of States operate successful voluntary incident reporting systems that share common features.
Two such systems are described below.
Aviation Safety Reporting System (ASRS)23
The United States operates a large aviation occurrence reporting system, known as the Aviation Safety
Reporting System (ASRS). The ASRS operates independently from the Federal Aviation Administration
(FAA) and is administered by NASA. Pilots, air traffic controllers, cabin crew, mechanics, ground
personnel, and others involved in aviation operations may submit reports when aviation safety has been
considered to be compromised. Samples of reporting forms are at the ASRS Website.
Reports sent to the ASRS are held in strict confidence. All reports are de-identified before being entered
into the database. All personal and organizational names are removed. Dates, times and related
information, which might reveal an identity, are either generalized or eliminated. ASRS data are used to:
a) Identify systemic hazards in the national aviation system for remedial action by appropriate
authorities;
b) Support policy formulation and planning in the national aviation system;
c) Support research and studies in aviation, especially including human factors safety research; and
d) Provide information to promote accident prevention.
The FAA recognizes the importance of voluntary incident reporting to accident prevention and offers
ASRS reporters some immunity from enforcement actions, waiving penalties for unintentional violations
reported to ASRS. With over 300,000 reports now on file, this database supports research in aviation
safety, especially relating to human factors.
22. For more information on ECCAIRS visit their website at http://eccairs-www.jrc.it
23. The ASRS website is at http://asrs.arc.nasa.gov
Confidential Human Factors Incident Reporting Programme (CHIRP)24
CHIRP contributes to the enhancement of flight safety in the United Kingdom by providing a
confidential reporting system for all individuals employed in aviation. It complements the United
Kingdom’s Mandatory Occurrence Reporting system. Noteworthy features of CHIRP include:
a) Independence from the regulatory authority;
b) Broad availability (including flight crew members, air traffic control officers, licensed aircraft
maintenance engineers, cabin crew and the general aviation community);
c) Confidentiality of reporters’ identities;
d) Analysis by experienced safety officers;
e) Newsletters with broad distribution to improve safety standards by sharing safety information;
and
f) Participation by CHIRP representatives on several aviation safety bodies to assist in resolving
systemic safety issues.
COMPANY REPORTING SYSTEMS25
In addition to State-operated incident reporting systems (both mandatory and voluntary), many airlines,
ATS providers and airport operators are implementing ‘in-house’ reporting systems for the reporting of
safety hazards and incidents. If reporting is available to all personnel (not just flight crews), company
reporting systems help promote a positive company-wide safety culture. Chapter 16 (Aircraft Operations)
includes a deeper examination of company hazard and incident reporting systems.
IMPLEMENTATION OF INCIDENT REPORTING SYSTEMS
If implemented in a non-punitive work environment, an incident reporting system can go a long way
toward creating a positive safety culture. Depending on the size of the organization, the most expedient
method for incident and hazard reporting is to use existing “paperwork” such as safety reports and
maintenance reports. However, as the volume of reports increases, some sort of computerized system will
be required to handle the task.
What to report?
Any hazard which has the potential to cause damage or injury or which threatens the organization’s
viability should be reported. Hazards and incidents should be reported if it is believed that:
a) Something can be done to reduce the accident potential;
b) Other aviation personnel could learn from the report; or
24. Visit the CHIRP website at http://www.chirp.co.uk/air
25. FSF Digest, Aviation Safety: US Efforts to Implement Flight Operational Quality Assurance Programs, Jul-Sep 1998, p. 51.
c) The system and its inherent defences did not work “as advertised.”
In short, if in doubt as to an event’s safety significance, report it. (Those incidents and accidents that are
required to be reported in accordance with State laws or regulations governing accident or incident
reporting should also be included in an operator’s reporting database.) Appendix 2 to Chapter 17 (Aircraft
Operations) provides examples of the types of events that should be reported to an operator’s incident
reporting system.
Who should report?
To be effective, incident reporting systems should include a broad reporting base. For specific
occurrences, the perceptions of different participants in, or witnesses to, an event may be quite different –
yet all relevant. Thus, State systems for voluntary incident reporting should encourage the participation of
flight and cabin crews, air traffic controllers, airport workers and maintenance technicians.
System management
In establishing a reporting system, management and senior operations personnel need to determine the
operating procedures for the system. For example:
a) What types of occurrence, event or hazard should be reported?
b) Where do all the reports go initially?
c) How will the reporting be acknowledged?
d) Who is responsible for any investigation that may be required?
e) How will confidentiality be protected?
f) What degree of immunity will be provided, and what criteria will apply for such immunity?
g) What is the role of the Safety Manager and the safety committee?
h) What are the criteria for bringing a (de-identified) report to management’s attention?
i) How will any safety lessons learned and actions taken be disseminated to relevant staff?
Such procedures should be widely disseminated to encourage use of the reporting system.
Reporting method and format
The method and format chosen for a reporting system matter little as long as they encourage personnel to
report all hazards or incidents. The reporting process should be as simple as possible, and be well
documented, including details as to what, where and when to report.
In designing reporting forms, the layout should facilitate the submission of information. Sufficient space
should be provided to encourage reporters to identify suggested corrective actions. Other factors to be
considered in designing a system and reporting forms include:
a) Operational personnel are generally not prolific writers; therefore, the form should be kept as short
as possible;
b) Reporters are not safety analysts; therefore, the questions should be in simple, everyday language;
c) Use of non-directive questions instead of leading questions. (Non-directive questions include:
What happened? Why? How was it fixed? What should be done?);
d) Prompts may be required for the reporter to think about ‘system failures’ (how close were they to
an accident?), and to consider their error management strategies;
e) Focus should be on the detection and recovery from an unsafe situation or condition; and
f) Reporters should be encouraged to consider the wider safety lessons inherent in the report, e.g.
how the organization and the aviation system could benefit from it.
Regardless of the source or method of submission, once the information is received it must be stored in a
manner suitable for easy retrieval and analysis.
Limitations on the use of incident data
Data gathered through voluntary hazard and incident reporting systems may be subject to a number of
limitations. In analysing such data, care must be taken to avoid erroneous conclusions and inappropriate
safety actions. Some of the factors to be considered in using incident data include:
a) Information may not have been verified;
b) Reporters are naturally biased (for example, pilots may see the circumstances of an occurrence
differently to an air traffic controller because of their different perspectives);
c) Reporting forms may inadvertently reflect bias (such as eliciting irrelevant information and
avoiding more crucial information);
d) Incident-reporting databases may be biased (by forcing the artificial classification of some data);
and
e) Shortcomings in trend analyses due to the nature and structure of the recorded information.
Appendix 1 to this chapter contains further guidance on limitations in the use of data from voluntary
incident reporting systems.
————————
Appendix 1 to Chapter 7
LIMITATIONS ON THE USE OF DATA FROM
VOLUNTARY INCIDENT REPORTING SYSTEMS26, 27
Care needs to be taken when using data from voluntarily submitted incident reports. In drawing
conclusions based on such data, analysts should be aware of the following limitations.
Information not validated. In some States, voluntary, confidential reports can be fully investigated
and information from other sources brought to bear on the incident. However, the confidentiality
provisions of smaller programmes (such as company reporting systems) make it difficult to
adequately follow up on the report without compromising the identity of the reporter. Thus, much of
the reported information cannot be substantiated.
Reporters may have a tendency to understate their errors and blame the occurrence on other parties.
Incidents may also be embellished to benefit the reporters. For example, during contract negotiations
the number of incidents reported may be artificially inflated in order to support one side’s bargaining
position.
Reporter biases. Two factors may bias voluntary incident data: who reports and what gets reported.
For example, pilots expect air traffic controllers to prevent traffic conflicts at tower-equipped airports.
When a conflict occurs, pilots may view it as a problem with the air traffic system and therefore feel
it is useful to report the incident. At non-tower airports, pilots are responsible for seeing and avoiding
the other aircraft. If they fail to do so, the same pilots may feel culpable and perceive little benefit in
reporting the incident; particularly, if no one else is in a position or likely to follow-up on the error.
Some of the factors contributing to the subjective nature of voluntary incident reports include:
a) Reporters must be familiar with the reporting system and have access to reporting forms or
phone numbers;
b) Reporters’ motivation to report may vary due to the following factors:
1) Level of commitment to safety;
2) Awareness of the reporting system;
3) Perception of the associated risks (local vs. systemic implications);
4) Operational conditions (some types of incident receive more attention than others);
5) Denial, ignorance of safety implications, a desire to hide the problem, or fear of
recrimination or even disciplinary action (despite guarantees to the contrary);

26. Development of a Methodology for Operational Reporting and Analysis Systems (OIRAS), Appel d'offres DGAC No 96/01, by Jean Paries and Ashleigh Merritt.
27. Adapted in part from a paper by Dr. Sheryl Chappell of NASA ASRS entitled "Using Voluntary Incident Reports for Human Factors Evaluations", Aviation Psychology in Practice, Avebury Press, 1994.
c) Different occupational groups see things differently, both in terms of interpreting the same
event and in terms of deciding what is important; and
d) Reporters must be aware of an incident to submit a report. Errors that go undetected are not
reported.
Report forms. Typically, incident reporting forms induce bias (including bias against reporting at all):
a) A report form must be sufficiently short and easy to use that operational personnel are
encouraged to use it; thus, the number of questions is necessarily limited;
b) Completely open questions (i.e. narratives only) can fail to elicit useful data;
c) Questions can guide the reporter, but they can also distort perceptions by leading the reporter
to biased conclusions; and
d) The range of possible events is so broad that a standard structured form cannot capture all
information. (Therefore, analysts may have to contact the reporter to gain specific
information.)
Incident reporting databases. Information must be categorized in accordance with a pre-determined
structure of keywords or definitions for entry into the database for later retrieval. Typically this
introduces bias into the databases, compromising their utility. For example:
a) Unlike objective physical flight parameters, descriptions of events and any causal attributions
are more subjective;
b) Categorization requires a system of pre-determined keywords or definitions biasing the
database:
1) Reports are analysed to “fit” the keywords; details that do not fit are ignored;
2) It is impossible to create an exhaustive list of keywords for classifying information;
3) Keywords are either present or not present, providing a poor approximation of the real
world;
4) Information is retrieved according to how it is stored; hence categorization
determines the output parameters. For example, if there is no keyword called
‘technical failure’ then ‘technical failure’ will never be found to be the cause of
incidents from that database;
5) The categorization system creates a ‘self-fulfilling prophecy’. For example, many
incident reporting systems bias the keyword categorization toward CRM.
Consequently, CRM is often cited as both the cause of the problem and its cure (more
CRM training will redress the perceived CRM deficiency);
c) Much of the information in the databases is never retrieved once it is entered; and
d) Given the generality of keywords, the analyst must frequently go back to the original report
to understand contextual details.
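The keyword problem described in b) above can be illustrated with a toy example. The keyword list and narrative below are invented for illustration; real taxonomies are far larger, but the failure mode is the same: whatever the keyword list cannot express is silently lost to later queries.

```python
# Hypothetical keyword taxonomy -- note there is no "technical failure" keyword.
KEYWORDS = {"CRM", "weather", "fatigue"}

def categorize(narrative: str) -> set:
    """Tag a report with whichever pre-determined keywords appear in it.
    Details that match no keyword are silently dropped."""
    return {kw for kw in KEYWORDS if kw.lower() in narrative.lower()}

report = "Hydraulic pump technical failure during approach; crew coordination good."
tags = categorize(report)

# The hydraulic failure is now invisible to any later query of the database:
assert "technical failure" not in tags
```

A query against this database could never find “technical failure” as a cause, exactly as described in item 4) above.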
Relative frequency of occurrence. Since voluntary incident reporting systems do not receive
information of the type needed to compute useful rate figures, any attempt to put the incident in the
perspective of a frequency of occurrence vis-à-vis other occurrences will be an educated guess at best.
For valid frequency comparisons, three types of data are required: the number of persons actually
experiencing similar incidents (not just the reported incidents), the size of the population at risk of
similar occurrences, and a measurement of the time period under consideration.
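As a hypothetical illustration of how the three data types combine in a rate calculation (all figures invented for this sketch):

```python
def incident_rate(actual_incidents: int, population_at_risk: int,
                  exposure_hours: float) -> float:
    """Incidents per 100 000 exposure hours for the population at risk.

    Requires ALL incidents actually experienced (not just those reported),
    the size of the population at risk, and the time period considered --
    precisely the figures a voluntary reporting system cannot supply."""
    return actual_incidents / (population_at_risk * exposure_hours) * 100_000

# Invented figures: 12 incidents among 300 pilots flying 50 hours each.
rate = incident_rate(actual_incidents=12, population_at_risk=300,
                     exposure_hours=50.0)
# rate is 80.0 incidents per 100 000 flight hours
```

Because a voluntary system only sees reported incidents, the numerator is unknowable, and any rate derived from it understates the true figure by an unknown amount.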
Trend analysis. Attempts at meaningful trend analysis of the more subjective parameters recorded in
incident reporting databases have not been particularly successful. Some of the reasons follow:
a) Difficulties in using structured information;
b) Limitations in capturing the context for the incident through keywords;
c) Inadequate levels of detail and accuracy of recorded data;
d) Poor reliability of one report when compared against another;
e) Difficulties in merging data from different databases; and
f) Difficulties in formulating meaningful queries for the database.
____________________
Chapter 8
SAFETY INVESTIGATIONS
Outline
Introduction
 State investigations
— Accidents
— Serious incidents
 In-house investigations
Scope of Safety Investigations
 How much information is enough?
Information Sources
 Primary sources of information
 Secondary sources of information
Interviews
 Conducting interviews
 Caveat regarding witness interviews
Investigation Methodology
Investigating Human Performance Factors
Investigating Procedural Deviations
Safety Recommendations
Appendices
1. Interviewing Techniques
2. Line of Reasoning for Investigating Human Errors
3. Investigating Procedural Deviations with PEAT and MEDA
Chapter 8
SAFETY INVESTIGATIONS28
Investigation. A process conducted for the purpose of accident prevention
which includes the gathering and analysis of information, the drawing of
conclusions, including the determination of causes and, when appropriate, the
making of safety recommendations.
ICAO Annex 13
INTRODUCTION
Effective safety management systems depend on the investigation and analysis of safety issues. The
accident prevention value of an accident, hazard or incident report is proportional to the quality of the
investigative effort.
In the ICAO Annex 13 context, investigations are associated with accidents and serious incidents. For
reportable accidents and serious incidents, the State will normally provide an investigative team.
However, the State investigators will often call upon the special knowledge of the Safety Manager (SM)
and other staff for input to the investigation29.
Beyond the State investigations, the SM will likely lead investigations into less serious occurrences,
which may nonetheless have even deeper safety significance. Ideally, each reported hazard or occurrence
is investigated to the point where all associated risks are identified.
The SM and those assigned as investigators must be technically competent both in the area under
investigation, as well as in the methods of safety investigation. The SM must earn the confidence and
respect of staff, and demonstrate the highest levels of integrity while objectively attempting to determine
why things occurred and how potential hazards might impact on safety. Consistent with the principles of
safety management, a systems approach is required.
State investigations
Accidents
Accidents provide compelling and incontrovertible evidence of the severity of hazards. Too often, it
takes the catastrophic and costly nature of an accident to spur the allocation of resources, to an extent
otherwise unlikely, to reduce or eliminate unsafe conditions.
By definition, accidents result in damage and/or injury. If we concentrate on investigating only the
results of accidents, not the hazards or risks that cause them, we are being reactive. Reactive
investigations are rather inefficient from an accident prevention perspective, in that serious latent
unsafe conditions posing significant safety risks may be overlooked.

28. Based on TC TP 13881, Safety Management Systems for Flight Operations and Aircraft Maintenance.
29. The Manual of Aircraft Accident Investigation (Doc 6920) contains much useful information for SMs during formal State investigations.
The focus of an accident investigation should therefore be directed towards effective risk control.
With the investigation directed away from “the chase for the guilty party” and towards effective risk
mitigation, cooperation will be fostered among those involved in the accident, facilitating the
discovery of the underlying causes. The short-term expediency of finding someone to blame is
detrimental to the long-term goal of preventing accidents.
Serious incidents
The term “serious incident” is used for those incidents which only good fortune narrowly prevented from
becoming accidents, for example, a near collision with another aircraft, or with the ground, involving a
large passenger aircraft. Because of the seriousness of such incidents, they should be thoroughly
investigated. Some States treat these serious incidents as if they had been accidents. Thus
they use an accident investigation team to carry out the investigation, including the publication of a
Final Report and the forwarding to ICAO of an ADREP incident data report. This type of full-scale
incident investigation has the advantage of providing hazard information to the same standard as that
of an accident investigation, without the associated loss of life, aircraft or property.
In-house investigations
Most occurrences do not warrant investigations by either the State investigative or regulatory authorities.
Many incidents are not even required to be reported to the State. Nevertheless such incidents may be
indicative of potentially serious hazards – perhaps systemic problems that will not be revealed unless the
occurrence is properly investigated.
For every accident or serious incident there may be hundreds of minor occurrences, many of which have
the potential to become an accident. It is important that all reported hazards and incidents be reviewed and
a decision taken on which ones should be investigated and how deeply. SMs will frequently be involved
in investigating these occurrences with a view to accident prevention.
For in-house investigations, the SM (or investigating team) may require specialist assistance, depending
on the nature of the occurrence being investigated. For example:
a) Cabin safety specialists for in-flight turbulence encounters, smoke or fumes in the cabin, galley
fire, etc.;
b) Air Traffic Services for loss of separation, near collisions, frequency congestion, etc.;
c) Maintenance engineers for incidents involving material or system failures, smoke or fire, etc.; and
d) Airport management advice for incidents involving FOD, snow and ice control, airfield
maintenance, vehicle operations, etc.
SCOPE OF SAFETY INVESTIGATIONS
How far should an investigation go into minor incidents and hazard reports? The extent of the
investigation should depend on the actual or potential consequences of the occurrence or hazard. Hazard
or incident reports that indicate high-risk potential should be investigated in greater depth than those with
low potential.
The depth and detail of the investigation should be that which is required to clearly identify and validate
the underlying hazards. Understanding why something happened requires a broad appreciation of the
context for the occurrence. To develop this understanding of the unsafe conditions, the investigator should
take a systems approach, perhaps drawing on the SHELL model outlined in Chapter 4.
In deciding upon the scope of work required, SMs must accept that resources are limited. Thus the effort
expended should be proportional to the perceived benefit, in terms of potential for identifying systemic
hazards and risks to the organization.
How much information is enough?
The investigative process should be comprehensive, attempting to address all the factors that contributed
to the situation. Active failures (sometimes called triggering events) take place immediately prior to the
event. They have a direct impact on system safety because of the immediacy of their effects. However,
they are not usually the root cause of the event. As such, applying corrective actions to these active
failures may not address the real cause(s) of the problem. A more detailed analysis is normally required to
understand all the factors that may have contributed.
How much data should be collected and analysed in order to develop an accurate picture of the
contributing factors? How many employees, support personnel, supervisors, etc. should be interviewed?
How far back in time should activities be investigated? To what extent should inter-personal relationships
be examined? At what point does past behaviour cease to influence current behaviour? To what level of
management should the investigation progress?
There are no clear answers to such questions. If the information will not help explain why something
happened (or happens), then it is not relevant to the investigation. The questioning process can be carried
to extremes. In practice, it is reasonable to stop the investigation at a point where corporate and regulatory
authorities no longer have any power to change the underlying situation in the interests of safety.
Although the investigation should focus on the factors most likely to have influenced actions, the dividing
line between relevance and irrelevance is often blurred. Data that initially may seem to be unrelated to the
investigation could later prove to be extremely relevant after relationships between particular elements of
the occurrence are better understood.
INFORMATION SOURCES
Information relevant to a safety investigation can be acquired from a variety of sources which can be
considered as primary and secondary sources of information. Primary sources of information include
similarly configured equipment, documentation, recorder tapes, interviews, direct observation of
personnel activities, and simulations. Secondary sources include occurrence databases, technical literature
and the expertise of professionals or specialists.
Primary sources of information
Physical examination of the equipment used during the safety event may yield useful information. This
may include examining the front-line equipment used, its components, or the workstations and equipment
used by supporting personnel (e.g. air traffic controllers, maintenance and servicing personnel).
Documentation spanning a broad spectrum of the operation is available, for example:
a) Maintenance records and logs;
b) Personal records/logbooks;
c) Certificates and licences;
d) In-house personnel and training records, work schedules, etc.;
e) Operator’s manuals and standard operating procedures;
f) Training manuals and syllabi;
g) Manufacturer’s data and manuals;
h) Regulatory authority records;
i) Weather forecasts, records and briefing material; and
j) Flight planning documents, etc.
Recordings (flight recorders, ATC radar and voice tapes, etc.) may provide useful information for
determining the sequence of events. In addition to traditional flight data recordings, maintenance
recorders in new generation aircraft are a potential additional source of information.
Interviews conducted with individuals directly or indirectly involved in the safety event can provide a
principal source of information for any investigation. In the absence of measurable data, interviews may
be the only source of information. (Thus, SMs involved in investigations need to be skilled in
interviewing techniques.)
Direct observation of actions performed by operating or maintenance personnel in their work
environment can reveal information about potential unsafe conditions. However, the persons being
observed must be aware of the purpose of the observations.
Simulations permit reconstruction of an occurrence and can facilitate a better understanding of the
sequence of events that led up to the occurrence, and the manner in which personnel responded to the
event. Computer simulations can be used to reconstruct events using data from on-board recorders, air
traffic control tapes, radar recordings and other physical evidence.
Secondary sources of information
Additional information collected from other sources can facilitate the analysis of factual information.
The SM and in-house investigators cannot be experts in every field related to the operational
environment. It is important that they realize their limitations. When necessary, they must be willing to
consult with other professionals during an investigation. Depending on the nature of the hazard and any
required analysis, advice from such disciplines as engineering, medicine, psychology and statistics may
be required.
Some of the most useful sources of supporting information come from accident/incident databases,
in-house hazard and incident reporting systems, confidential reporting programmes, systems for
monitoring line operations (e.g. flight data analysis and LOSA programmes), manufacturers’ databases, etc.
A word of caution about databases: before using any data, the prudent SM will assess the limitations
inherent in the database. Consideration should be given to the source of the data, its intended purpose, the
definitions used in recording the data, its currency, etc. Chapter 15 (Practical Considerations) provides
guidance on the management of safety information including limitations on the use of databases.
INTERVIEWS
Information acquired through interviews can help clarify the context for unsafe acts and conditions. It can
be used to confirm, clarify or supplement information learned from other sources. Interviews can help to
determine what happened. More importantly, interviews are often the only way to answer the important
“why” questions which, in turn, can facilitate appropriate and effective safety recommendations.
In preparation for an interview, the interviewer must expect that individuals will perceive and recall
things differently. The details of a system defect reported by operational personnel may differ from those
observed by maintenance personnel during a service check. Supervisors and management may perceive
issues differently than line personnel. The interviewer must accept all views as worthy of further
exploration. However, even qualified, experienced and well-intentioned witnesses could be mistaken in
their recollection of events they have witnessed. In fact, when a number of persons are interviewed about
the same event and no differing perspectives emerge, there may be grounds to suspect the validity of the
information being received.
Conducting interviews
The effective interviewer adapts to these differing views, remaining objective and avoiding making an
early evaluation of the content of the interview. An interview is a dynamic situation, and the skilled
interviewer knows when to continue a line of questioning and when to back off.
For best results, interviewers will likely employ a process as follows:
a) Careful preparation and planning for the interview;
b) Conducting the interview in accordance with a logical, well-planned structure; and
c) Assessing the information gathered in the context of all other known information.
Appendix 1 to this chapter provides further guidance for conducting effective interviews.
Caveat regarding witness interviews
Sorting through the often-conflicting nature of witness interviews requires caution. Intuitively, an
interviewer may weigh the value of an interview dependent on the background and experience of the
person being interviewed. However:
a) Persons judged as “good witnesses” may allow their perceptions to be influenced by their
experience (i.e. they see and hear what they would “expect”). Consequently, their description of
events may be biased; and
b) On the other hand, people who have no knowledge of an occurrence they have witnessed are
often able to accurately describe the sequence of events. They may be more objective in their
observations.
The skilled investigator does not overly rely on a single witness – even the testimony of an expert. Rather,
information from as many sources as practical needs to be integrated to form an accurate perception of the
situation.
Credibility is difficult to establish, easy to lose,
and almost impossible to regain.
INVESTIGATION METHODOLOGY
Accident prevention requires a range of activities related to the investigative process. The traditional field
phase of the investigation is used to identify and validate perceived safety hazards. Competent safety
analysis is required to assess the risks, and effective communications are required to control the risks. In
other words, effective safety management requires an integrated approach to safety investigations.
Some occurrences and hazards originate from material failures, or occur in unique environmental
conditions. However, the majority of unsafe conditions are generated through human errors. When
considering human error, an understanding of the unsafe conditions that may have affected human
performance or decision-making is required. These unsafe conditions may be indicative of systemic
hazards that put the entire aviation system at risk. Consistent with the systems approach to safety, an
integrated approach to safety investigations considers all aspects that may have contributed to unsafe
behaviour or created unsafe conditions.
The logic flow for an integrated process for safety investigations is depicted in Figure 8 - 1 Integrated
Safety Investigation Methodology (ISIM). Such a model can guide the SM or safety investigator from
initial hazard or incident notification, through to the communication of safety lessons learned.
The methodology proceeds through the following stages:

a) Hazard or occurrence notification and assessment: assess the notification and decide whether or
not to investigate;
b) Data collection: identify the events and underlying factors;
c) Sequence of events: reconstruct the logical progression of occurrence events;
d) Integrated investigation: analyse the facts and determine findings regarding underlying factors
and hazards;
e) Risk assessment: estimate the risk and determine its acceptability for each hazard;
f) Defence analysis: identify defences that are missing or inadequate;
g) Risk control analysis: identify and evaluate risk control options; and
h) Safety communication: communicate the safety message to stakeholders.

Figure 8-1. Integrated Safety Investigation Methodology (ISIM)30
Effective investigations do not follow a simple step-by-step process that starts at the beginning and
proceeds directly through each phase to completion. Rather, investigation is an iterative process that may
require going back and repeating steps as new data is acquired and/or as conclusions are reached. The SM
and the investigative team must remain flexible, moving forwards and backwards from data collection
through analysis and back again.

30. Adapted from the ISIM developed by the TSB of Canada.
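Purely as an illustration, this iterative flow might be sketched in code. The step names mirror Figure 8-1, while the loop structure and the `needs_more_data` condition are hypothetical stand-ins for the investigative team's judgement, not a prescribed algorithm.

```python
# Steps follow Figure 8-1 (ISIM); names are illustrative identifiers.
STEPS = [
    "notification_assessment",
    "data_collection",
    "sequence_of_events",
    "integrated_investigation",
    "risk_assessment",
    "defence_analysis",
    "risk_control_analysis",
    "safety_communication",
]

def run_isim(needs_more_data) -> list:
    """Walk the ISIM steps in order, looping back to data collection
    whenever an analysis phase raises new questions."""
    trail, i = [], 0
    while i < len(STEPS):
        step = STEPS[i]
        trail.append(step)
        # The analysis phases may send the team back for more data.
        if step in ("integrated_investigation", "risk_assessment") and \
                needs_more_data(step, trail):
            i = STEPS.index("data_collection")
            continue
        i += 1
    return trail
```

The key design point is the `continue` back to data collection: analysis is not a one-way gate but a place where gaps in the evidence become visible.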
INVESTIGATING HUMAN PERFORMANCE FACTORS
Investigators have been quite successful in analysing the measurable data as it pertains to human
performance, e.g. strength requirements to move a control column, lighting requirements to read a
display, ambient temperature and pressure requirements, etc. Unfortunately, the majority of safety
deficiencies derive from issues that do not lend themselves to simple measurement and are thus not
entirely predictable. As a result, the information available does not always allow an investigator to draw
indisputable conclusions.
Several factors typically reduce the effectiveness of a human performance analysis.31 These include:
a) The lack of normative human performance data to use as a reference against which to judge
observed individual behaviour. (FDA and LOSA data are now providing a baseline for better
understanding normal day-to-day performance in flight operations.)
b) The lack of a practical methodology for generalizing from the experiences of an individual (crew
or team member) to an understanding of the probable effects on a large population performing
similar duties.
c) The lack of a common basis for interpreting human performance data among the many disciplines
(e.g. engineering, operations and management) that make up the aviation community.
d) The ease with which humans can adapt to different situations, further complicating the
determination of what constitutes a breakdown in human performance.
The logic necessary to convincingly analyse some of the less tangible human performance phenomena is
different from that required for other aspects of an investigation. Deductive methods are relatively easy to
present and lead to convincing conclusions. For example, if a measured wind shear produced a calculated
aircraft performance loss, the conclusion could be reached that the wind shear exceeded the aircraft’s
performance capability. Such straight cause-and-effect relationships cannot be so easily established for
some human performance issues, such as complacency, fatigue, distraction or judgement.
For example, if an investigation revealed that a crewmember made an error leading to an occurrence
under particular conditions (such as fatigue, distractions, or complacency), it does not necessarily follow
that the error was made because of these preconditions. There will inevitably be some degree of
speculation involved in such a conclusion. The viability of such speculative conclusions is only as good
as the reasoning process used and the weight of evidence available.
Inductive reasoning involves probabilities. Inferences can be drawn on the most probable or most likely
explanations of behavioural events. Inductive conclusions can always be challenged and their credibility
depends on the weight of evidence supporting them. Accordingly, they must be based upon a consistent
and accepted reasoning method.
Other factors to be considered when investigating human performance issues are:
31. A manager from the Boeing Commercial Airplane Company (Fadden), 1984.
a) How to assess the relevance of certain behaviour or actions deemed “abnormal” or “non-standard”;
b) Sensitivity and privacy considerations vs. accident cause/prevention significance; and
c) Avoidance of speculation vs. logical and reasonable explanations.
Analysis of the human factors must take into account the objective of the investigation (i.e. understanding
why something happened). Occurrences are seldom the result of a single cause. Although individual
factors when viewed in isolation may seem insignificant, in combination they can result in a sequence of
events and conditions that culminate in an accident. The SHEL model provides a systematic approach to
examining the constituent elements of the system as well as the interfaces between them. The interactive
system suggested by Reason also provides an excellent framework by which investigators can ensure that
the analysis addresses all levels of the production system.
Understanding the context in which humans err is fundamental to understanding the unsafe conditions
that may have affected their behaviour and decision-making. These unsafe conditions may be indicative
of systemic risks posing significant accident potential.
The Error-Type Decision Tree provides a useful guide for understanding human errors.
In the decision tree, four questions are posed in sequence; the first YES answer determines the error type:

a) Was the error caused by a miscommunication, misinterpretation, or failure to communicate
pertinent information? If YES: Communication Error;
b) Was the error caused by a lack of knowledge or basic proficiency with equipment or procedures?
If YES: Proficiency Error;
c) Was the error a decision that increased risk and for which there were no written regulations or
procedures? If YES: Operational Decision Error;
d) Was the error associated with an intention not to follow written regulations or procedures? If
YES: Intentional Non-compliance Error; and
e) If the answer to all of the above is NO, the error is classified as a Procedural Error.

Figure 8-2. Error-type decision tree
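This branching logic can be expressed directly in code. The sketch below is a plain transcription of the decision tree in Figure 8-2; the answer keys are names invented here for illustration and are not part of any established investigation tool.

```python
def classify_error(answers: dict) -> str:
    """Classify a human error per the Figure 8-2 decision tree.
    `answers` maps each question (hypothetical keys) to the
    investigator's yes/no judgement; missing keys default to NO."""
    if answers.get("miscommunication"):
        return "Communication Error"
    if answers.get("lack_of_knowledge_or_proficiency"):
        return "Proficiency Error"
    if answers.get("risky_decision_no_written_procedure"):
        return "Operational Decision Error"
    if answers.get("intentional_non_compliance"):
        return "Intentional Non-compliance Error"
    # All four questions answered NO: the default classification.
    return "Procedural Error"

assert classify_error({"lack_of_knowledge_or_proficiency": True}) == "Proficiency Error"
assert classify_error({}) == "Procedural Error"
```

Note that question order matters: a miscommunication that also involved intentional non-compliance is classified as a communication error, because the tree asks that question first.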
Typically, incidents and accidents involve several, often inter-linked errors. Each error should be
investigated to determine the most serious underlying contributory factors. With a better understanding of
the context for the human errors (or violations), the SM or investigator can determine the extent to which
a systemic hazard exists. The normal risk assessment process can then be used to determine whether the
risk is unacceptable and therefore warrants safety action.
Appendix 2 to this chapter provides a line of reasoning suitable for the investigation of human errors.
INVESTIGATING PROCEDURAL DEVIATIONS
Research has determined that a majority of hull-loss accidents could have been prevented had established
procedures been followed. Since incidents are often separated from accidents only by a fine margin of
luck, non-compliance with established procedures is also likely to be causal in most incidents.
Simply listing procedural deviations as a contributing factor to an occurrence implies that further
exhortations to “be safe” will alter human performance. However, understanding the causal and
contributory context for such deviations from SOPs and accepted safe work practices is fundamental to
accident prevention.
The reasons behind non-compliance with procedures are not well understood. They may include
ambiguously written or poorly understood procedures, inadequate training, time pressures, design
shortcomings, incompatible work environments, unexpected operational situations, or just plain poor
judgement. Fortunately, following an in-flight incident, the crew is available to share their experience,
insights and rationale.
Ironically, occurrences occasionally arise where deviations from accepted work practices do prevent an
accident. These may also warrant investigation to identify the weaknesses in the established procedures.
Boeing has developed two investigation methodologies aimed at preventing similar types of procedural
deviations: a Procedural Event Analysis Tool (PEAT) and a Maintenance Error Decision Aid (MEDA).
Both follow an iterative (integrated) approach.
PEAT focuses on the key event elements and the underlying cognitive factors that contributed to the
event. In contrast, MEDA looks at organizational factors that contribute to human error, such as poor
communication, inadequate information and poor lighting.
PEAT and MEDA recognize that traditional efforts to investigate errors are often aimed at identifying the
employee who made the error. The usual result is that the employee is subjected to a combination of
disciplinary action and recurrent training – with little systemic safety benefit. Therefore, both PEAT and
MEDA are based on the philosophy that personnel seldom deliberately deviate from prescribed
procedures, especially when doing so creates a safety risk.
PEAT and MEDA are both software-based analytical tools, requiring specialized forms for following the
process in a systematic way. Consequently, specific training in the use of PEAT and MEDA is required to
ensure their successful application. Appendix 3 to this Chapter includes further information on PEAT and
MEDA.
SAFETY RECOMMENDATIONS
When an investigation identifies hazards or unmitigated risks, safety action is required. The need for
action must be communicated by means of safety recommendations to those with the authority to expend
the necessary resources. Failure to make appropriate safety recommendations may leave the risk
unattended. For those formulating safety recommendations, the following considerations may apply:
Action agency. Who can best take the necessary corrective action? Who has the necessary authority
and resources to intervene? Ideally, problems should be addressed at the lowest possible level of
authority, such as the departmental or company level as opposed to the national or regulatory level.
However, if several organizations are exposed to the same unsafe conditions, extending the
recommended action may be warranted. State and international authorities, or multinational
manufacturers may best be able to initiate the necessary safety action.
What vs. how. Safety recommendations should clearly articulate what must be done, not how to do it.
The focus is on communicating the nature of the risks requiring control measures. Detailed safety
recommendations that spell out exactly how the problem should be fixed should be avoided. The
responsible manager should be in a better position to judge the specifics of the most appropriate
action for the current operating conditions. The effectiveness of any recommendation will be
measured in terms of the extent to which the risks have been reduced, rather than strict adherence to
the wording in the recommendation.
General vs. specific wording. Since the purpose of the safety recommendation is to convince others
of an unsafe condition putting some or all of the system at risk, specific language should be used in
summarizing the scope and consequences of the identified risks. On the other hand, since the
recommendation should specify what is to be done (not how to do it), concise wording is preferable.
Recipient’s perspective. In recommending safety action, the following considerations pertain from the
recipient’s perspective:
a) The safety recommendation is addressed to the most appropriate action authority (i.e. they
have the jurisdiction and authority to effect the necessary change).
b) There are no surprises (i.e. there has been prior dialogue concerning the nature of the assessed
risks).
c) It articulates what must be done, while leaving the action authority with the latitude to
determine how best to meet that objective.
Failure to observe these basic considerations may prevent the risk from receiving the necessary
safety attention.
Formal safety recommendations warrant written communications. This ensures that everyone is “singing
from the same song sheet” and provides the necessary baseline for evaluating the effectiveness of
implementation. However, it is important to remember that safety recommendations are only effective if
the responsible managers implement them.
————————
Appendix 1 to Chapter 8
INTERVIEWING TECHNIQUES32
GENERAL
Information from interviews can be used to confirm, clarify, or supplement information obtained from
other sources. Moreover, in the absence of measurable data, interviews may be the only source of
information.
The interviewer’s role is to obtain information from the interviewee that is as accurate, complete, and
detailed as possible. To accomplish this, the interviewer must:
a) Be prepared.
b) Have a clear objective.
c) Have a good knowledge of the hazard or incident under investigation.
d) Be able to adapt to the interviewee’s personal style.
e) Be willing to go beyond simple facts.
Interviews, particularly those involving human performance factors, must go beyond the “What and
When” of the occurrence; they must also attempt to find out “How and Why” it occurred.
To facilitate the interview, the interviewer must:
a) Keep the number of people participating in the interview to a minimum.
b) Include anyone (such as a technical expert, supervisor or union representative) that the
interviewee has requested to be present.
c) Brief others on their expected behaviour during the interview.
d) Not tolerate disruptions of the interview by others.
PREPARING FOR THE INTERVIEW
Personal preparations
The success of the interview will closely relate to personal preparation. Tailor the preparations to the
interview. For example, depending on the nature of the reported hazard or incident, review the following:
a) The facts relating to any safety event.
b) Recorded information, if any.
32. Adapted from the Transportation Safety Board of Canada (TSB) Manual of Investigations.
c) The aircraft systems, if applicable.
d) Any operational peculiarities in procedures.
e) The crew’s personal records.
f) Communication, navigation, and approach facilities, as applicable; etc.
Other preparatory factors include:
a) If possible, visit the occurrence site.
b) Define the general objectives of the interview.
c) Prepare a set of appropriate questions to address areas of concern.
d) Assess the audience and dress accordingly.
e) Arrive for the interview with any required equipment or reference material.
f) Consider requesting expert assistance for interviews of a highly technical nature.
Timing
Timing is critical. In follow-up to an incident or safety event, interviews should be conducted as soon as
practicable to avoid:
a) Loss of perishable information from fading memory.
b) Interpretation and rationalization of events.
c) Contamination caused by exposure to others’ recollections or interpretations of events.
If an immediate interview is impractical, request a written statement to ensure information is recorded
while fresh in the interviewee’s mind.
Location
If possible, select a location which is:
a) Quiet and comfortable.
b) Free from interruptions.
c) For a witness, the location where he or she witnessed the event, if at all possible.
The interview
The opening of the interview should reassure the interviewee about:
a) The purpose of the interview (investigation).
b) Interviewee’s rights.
c) Your role as the interviewer.
d) The procedures to be followed.
Establish a rapport with the witness at the outset. Consider the following:
a) Be polite.
b) Behave in a natural manner; do not make the interview seem artificial.
c) Keep interruptions to a minimum.
d) Strive for an atmosphere of friendly conversation.
e) Intervene only enough to steer the conversation in the desired direction.
f) Display a sincere interest.
The success of the interview will depend on both the timing and the structure of the questions. Begin the
interview with a “free-recall” question, letting the individual talk about what he or she knows of the
occurrence or subject matter. A free-recall question allows the witness:
a) To ease into the interview in a more relaxed manner.
b) To feel that what he or she has to say is significant.
c) Most importantly, to provide information which is uncontaminated by the interviewer.
Sequence the questions from:
a) Easier to harder.
b) General to specific.
As the interview progresses, use a mixture of other types of questions:
a) Open-ended or “trailing-off” questions evoke rapid and accurate descriptions of the events, and
lead to more participation by the interviewee. (For example: “You said earlier that your training
was…?”)
b) Specific questions are necessary to obtain detailed information and may also prompt the person to
recollect further details.
c) Closed questions produce “yes” or “no” answers (providing little insight beyond the response).
d) Indirect questions might be useful in delicate situations. (For example, “You mentioned that the
first officer was uneasy about flying that approach. Why?”)
A good closing to the interview includes:
a) Summary of the key points.
b) An opportunity for the interviewee to expand on any points previously covered, or to add further
points.
c) Reassurance to the interviewee and thanks for cooperation.
d) Determination of availability for further interviews (if required).
QUESTIONING TECHNIQUES
Following are some of the common traps interviewers may encounter.
a) Avoid questions with the definite article unless the object in question has already been mentioned
by the witness.
1) Use: “Did you see A broken strut?”
2) Not: “Did you see THE broken strut?”
b) When asking a question, avoid leading questions, i.e. any question that contains the answer.
Instead, use neutral sentences without adjectives or figurative verbs.
1) Use: “Which way was the aircraft travelling?”
2) Not: “Was the aircraft travelling west?”
c) A question which mentions some object (whether the object existed or not) causes a tendency for
a witness to assert that he/she saw the object. Design your questions so that they do not mention
objects before the interviewee mentions them.
d) If you have to ask questions of a personal nature, it is even more important to ask indirect
questions. For example:
1) Use: “Was there anything upsetting you on the day of the occurrence?”
2) Not: “How did your marital situation affect you that day?”
ADDITIONAL GUIDANCE
Following are some additional hints for effective interviews:
a) The interview is a dynamic process that requires continuing adaptation to the situation and to the
interviewee.
b) People approach any occurrence or situation from different perspectives.
c) Remain objective and avoid making evaluations early in the interview; concentrate on the
questions to be asked.
d) Be aware of possible biases when assessing what was said during the interview.
e) Do not allow the interviewee’s personality to influence interpretation of the interview.
f) Do not accept any information gained in an interview at face value. Use it to confirm, clarify, or
supplement information from other sources.
In some circumstances there may be many witnesses to be interviewed. The resultant (often conflicting)
information must be summarized, sorted and compiled in a useful format. For example:
a) Prepare a summary of each interview.
b) If appropriate, prepare a data-matrix.
c) Summarize each interview by consolidating information under meaningful headings.
d) Write an overall description from each set of summaries.
e) Do not draw conclusions from evidence which is too inconsistent to support them.
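The data-matrix mentioned above is simply a grid of witnesses against topics, used to expose agreement and conflict at a glance. The following sketch is purely illustrative; the witness names, topics and statements are invented:

```python
# A witness-by-topic matrix for sorting interview information.
witnesses = ["Witness A", "Witness B", "Witness C"]
topics = ["aircraft heading", "engine sound", "weather"]

# Each cell holds that witness's statement on that topic (None if
# the witness said nothing about it).
matrix = {w: {t: None for t in topics} for w in witnesses}
matrix["Witness A"]["aircraft heading"] = "travelling west"
matrix["Witness B"]["aircraft heading"] = "travelling north-west"
matrix["Witness C"]["engine sound"] = "intermittent, sputtering"

def statements_on(topic):
    """Consolidate every statement recorded against one topic."""
    return {w: row[topic] for w, row in matrix.items() if row[topic]}
```

Reading across a row summarizes one interview; reading down a column (as `statements_on` does) collects all, possibly conflicting, testimony on one point.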
TEN COMMANDMENTS OF GOOD LISTENING
Good interviews require good listening skills.
1. Stop talking: You cannot listen if you are talking. “Give every man thy ear, but few thy
voice” (Polonius).
2. Put the talker at ease: Help witnesses feel that they are free to talk.
3. Show them you want to listen: Look and act interested. (For example: Do not review documents
while they talk.)
4. Remove distractions: Do not doodle, tap, or shuffle papers.
5. Empathize with them: Try to put yourself in their place so that you can see their points of view.
6. Be patient: Allow plenty of time. Do not interrupt.
7. Hold your temper: An angry person gets the wrong meaning from words.
8. Do not criticize or embarrass: Criticism puts respondents on the defensive. Do not embarrass
witnesses by commenting on their lack of technical knowledge, education, choice of words. They
may “clam up” or get angry. Do not argue: even if you win, you lose.
9. Ask questions: Asking questions encourages the respondent and shows that you are listening. It
also helps to develop points further.
10. Stop talking: This is first and last because all the other commandments depend on it. You cannot
do a good listening job while you are talking.
RECORDING THE INTERVIEW
Each interview should be documented for future reference. Records may consist of transcripts, interview
summaries, notes and recordings (if the interview was taped).
For major (State) investigations, complete transcripts of interview testimony may be required. A qualified
stenographer should be hired specifically to provide a verbatim record of the interview which must not be
edited.
For formal interviews, both the interviewer and the interviewee may agree that the interview be taped.
Recorded interviews may be summarized in whole or in part depending on the degree of detail required to
support investigative findings. Consider the following factors in summarizing a taped interview:
a) The meaning is not changed;
b) Subjective interpretations are not made;
c) The interviewee’s attitude is not obscured;
d) Significant testimony is not omitted; and
e) The nature of the summary is indicated in the record.
Those portions of the recorded interviews not relevant to the occurrence may be summarized.
The majority of interviews will not be recorded. However, a summary of the interview should still be
prepared, again ensuring that:
a) Subjective interpretations are not made;
b) The interviewee’s attitude is not obscured; and
c) Significant testimony is not omitted.
Information relevant to the hazard or situation under investigation derived from informal interviews or
conversations may be summarized in concise notes.
————————
Appendix 2 to Chapter 8
LINE OF REASONING
FOR
INVESTIGATING HUMAN ERRORS
Systematically investigating human errors requires an integrated line of inquiry. In attempting to identify
the underlying factors affecting human behaviour which may have led to an occurrence, the SM (or
investigator) might use the following reasoning:
a) Was the behaviour or decision intentional or unintentional?
b) If it was unintentional:
1) Was it a slip, whereby the person knew what was required but inadvertently did
something else (e.g. selected the wrong frequency)?
2) Was it a lapse whereby the person again knew what was required, but forgot (e.g. missed
an altitude call-out)?
3) Was the unintentional behaviour the result of:
— Unfamiliarity with a new procedure or new work environment?
— Lack of skill or currency?
— Inattention (due to distraction, fatigue, pre-occupation, complacency, etc.)?
c) If it was intentional:
1) Was it a genuine mistake (perhaps due to an inappropriate plan)?
2) Was it a deliberate violation of prescribed operating procedures?
3) Were any mistakes in decision-making due to:
— Inadequate knowledge for the conditions?
— Ambiguous or inadequate rules governing the procedure?
— Biases in decision-making?
— Memory problems?
4) Were any deliberate violations due to:
— Misunderstanding of correct procedures?
— Adaptations of SOPs (i.e. routine application of a non-standard, but accepted,
practice)?
— Corporate climate of indifference?
— Exceptional deviation due to actual conditions (e.g. weather, aircraft serviceability,
security issue)?
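The top levels of this line of reasoning can be sketched as a simple classifier, useful when tagging occurrence records in a safety database. This is a minimal sketch; the function and parameter names are illustrative, not part of any standard taxonomy:

```python
def classify_behaviour(intentional: bool,
                       knew_but_did_other: bool = False,
                       knew_but_forgot: bool = False,
                       deliberate_violation: bool = False) -> str:
    """Classify an erroneous behaviour following the appendix's
    line of reasoning: intent first, then the sub-questions."""
    if not intentional:
        if knew_but_did_other:
            return "slip"          # knew what was required, did something else
        if knew_but_forgot:
            return "lapse"         # knew what was required, but forgot
        return "other unintentional error"
    if deliberate_violation:
        return "violation"         # deliberate deviation from procedures
    return "mistake"               # intended act, inappropriate plan
```

The deeper questions (unfamiliarity, inattention, biases, corporate climate, etc.) remain judgement calls for the investigator; only the branching structure lends itself to coding.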
————————
Appendix 3 to Chapter 8
INVESTIGATING PROCEDURAL DEVIATIONS
WITH
PEAT AND MEDA
PEAT33 is designed to provide airlines with a tool for safety and risk management. It aims at enhancing
reliability, consistency and effectiveness of the investigation process. The PEAT form is designed to
facilitate the investigation of events involving non-adherence to procedures. The primary focus is to find
out why a serious event occurred and if a procedural deviation was involved.
The PEAT methodology comprises three elements:
Process. PEAT provides a structured process that guides investigators through the identification of
key contributing factors and the development of effective recommendations aimed at the elimination
of similar errors in future. This includes collecting information about the event, analysing the event
for errors, classifying the error and identifying preliminary recommendations.
Data Storage. To facilitate analysis, PEAT provides a database for the storage of procedurally related
data. Although designed as a structured tool, PEAT also provides the flexibility to analyse narrative
information as needed. The database also facilitates tracking progress in addressing issues revealed by
PEAT analyses and the identification of emerging trends.
Analysis. Using the PEAT tool in a typical analysis of a procedurally related event, a trained
investigator will consider the following areas and assess their significance in contributing to flight
crew decision errors:
a) Flight Phase where the error occurred;
b) Equipment factors;
1) Role of automation;
2) Flight deck indications;
3) Aircraft configuration;
c) Other stimuli (beyond immediate indications);
d) Environmental factors;
e) Procedure from which the error resulted:
1) Status of the procedure;
2) Onboard source of the procedure;
3) Procedural factors (e.g. negative transfer, impractical, complexity);
33. For further information on PEAT, see www.boeing.com/commercial/flighttechservices/ftssafety.html.
4) Crew interpretation of the relevant procedure;
f) Current policies and guidelines aimed at prevention of the event;
g) Crew factors:
1) Crew interpretation of the relevant procedure;
2) Crew intentions;
3) Crew understanding of situation at the time of the procedural deviation;
4) Situational awareness factors (e.g. vigilance, attention, etc.);
5) Factors affecting individual performance (e.g. fatigue, workload, etc.);
6) Personal and corporate stressors, management or peer pressures, etc;
7) Crew coordination/communication; and
8) Technical knowledge/skills/experience.
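A database entry supporting this kind of analysis could be captured in a simple structured record. The sketch below is hypothetical; the field names are illustrative and do not reflect the actual PEAT schema:

```python
from dataclasses import dataclass, field

@dataclass
class ProceduralEvent:
    """Illustrative record for one procedurally related event."""
    flight_phase: str
    procedure: str
    error_description: str
    contributing_factors: list = field(default_factory=list)
    recommendations: list = field(default_factory=list)

# Invented example data, for illustration only.
event = ProceduralEvent(
    flight_phase="approach",
    procedure="stabilized approach criteria",
    error_description="late configuration",
    contributing_factors=["time pressure", "ambiguous SOP wording"],
)
# Recommendations are added as the analysis matures.
event.recommendations.append("clarify SOP wording")
```

Storing events in a common structure like this is what makes the later trend identification across many analyses possible.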
MEDA34
Maintenance and inspection errors remain a significant cause of aircraft accidents worldwide. The
percentage of commercial aircraft accidents in which maintenance error was a primary cause has risen
steeply in recent years. Again, it is accepted that the same type of factors that contribute to accidents can
be contributory to incidents and lesser safety events. Consequently, aviation authorities are seeking
methods for better managing maintenance errors.
MEDA is designed to systematically investigate the causes of maintenance errors that lead to occurrences
involving equipment damage or personal injury or to unplanned events (such as a return to the gate, a
flight cancellation or delay, a diversion, etc.). MEDA is discussed in greater detail in Chapter 19 (Aircraft
Maintenance).
__________________
34. For further information on MEDA, see www.boeing.com/commercial/flighttechservices/ftssafety.html.
Part 2
Chapter 9 – Safety Analysis and Studies
DRAFT
Chapter 9
SAFETY ANALYSIS AND STUDIES
Outline
Introduction
 ICAO requirement
 Safety analysis – what is it?
 Analytical thinking
 Reasoning
— Objectivity and bias
— Challenge process
Information sources
Analytical methods and tools
 General methods and tools
— Statistical analyses
— Trend analysis
— Normative comparisons
— Simulation and testing
— Expert panel
— Cost-benefit analysis
 Specialized methods and tools
— Event reporting and analysis systems
— Occurrence investigation and analysis
— Human factors analysis
— Flight data analysis
— Other analytical tools
Analysis process
 Data retrieval
 Sorting the data
 Analysing the data
 Drawing conclusions
 Common analytical errors
Safety studies
 Selecting study issues
 Information gathering
 Participation
Significant Issues Lists (SIL)
Appendices
1. Understanding bias
2. Basics of statistical analysis
Chapter 9
SAFETY ANALYSIS AND STUDIES
INTRODUCTION
Safety investigations and the various hazard identification programmes collect and record voluminous
safety data; however, meaningful and supportable conclusions for accident prevention can only be
reached through safety analysis. Reducing the data to simple statistics serves little useful purpose
unless the practical significance of those statistics in defining a resolvable problem is evaluated.
Often the expertise of a number of disciplines is required to analyse identified hazards, following the
collaborative and iterative steps of risk management. During the risk assessment phase, the data are
analysed, the probability and severity of risks evaluated, and the degree of acceptability of the risks
determined.
Ideally, this analysis process proceeds with certainty. In reality, facts are often less than “certain”. Some
of the data may be contradictory, other data may be biased or misrepresented and much of the information
needed will be missing. Such problems defy easy resolution.
ICAO requirement35
ICAO recognizes the linkages between sound safety analysis and accident prevention. ICAO promotes
accident prevention by the analysis of accident and incident data and by the prompt exchange of safety
information. Having established an accident and incident database and an incident reporting system,
States are required to analyse the information contained in their accident/incident reports and their
databases to determine any preventive actions required. ICAO also recognizes the value of safety studies
in recommending changes needed for accident prevention.
Safety analysis — what is it?
Analysis is the process of organizing facts using specific methods, tools or techniques. It may be used to:
a) Verify the utility and limitations of available data;
b) Assist in deciding what additional facts are needed;
c) Establish consistency, validity and logic;
d) Ascertain causal and contributory factors;
e) Assist in reaching valid conclusions; etc.
35. See Annex 13 — Aircraft Accident and Incident Investigation.
Safety analysis is based on factual information, possibly from several sources. Relevant data must be
collected, sorted and stored in such a manner that it is easily retrievable. Analytical methods and tools
suitable to the analysis are then selected and applied.
Safety analysis may involve the use of mathematical calculations and/or models to simulate particular
events. Safety analysis is often iterative, requiring multiple cycles. It may be quantitative or qualitative.
The absence of quantitative baseline data may force a reliance on more qualitative methods of analysis.
Regardless, credible safety analysis must evaluate all hypotheses and remain objective, free from
conjecture, emotions and politics.
Analytical thinking
Analytical thinking is the systematic way by which order is drawn from all the available information.
Through analytical thinking, conclusions are reached based on data or known information that a particular
event occurred (or might occur). Typically, four activities are involved:
a) Observation (collecting and studying data relevant to the safety problem);
b) Hypothesis (theorizing on any patterns noted in the data);
c) Prediction (concluding what might happen, if the hypothesis under consideration is valid); and
d) Verification (collecting new facts to test the predictions based on the hypothesis).
Reasoning
Credible safety analysis is built upon logical (unemotional) reasoning. The reasoning necessary for the
reconstruction of an accident sequence differs from the kinds of “what if” reasoning used in safety
assessments. Following are some of the reasoning processes typically encountered in safety analysis:
a) Logic. Logic is implicit in all reasoning. It helps us link seemingly unrelated information into
meaningful patterns. For example, logic is used to support conclusions that, given these
circumstances, this event will happen with certainty. Logic is used to determine what conditions
are necessary for an event to occur and to what extent these conditions are sufficient for the event
to occur. For example, fuel is necessary for a fire; however, the actual amount of fuel present in
an occurrence may not have been sufficient to cause a fire of the size which occurred.
b) Deduction. There are two principal logic processes used in everyday life: deduction and
induction. Deductive logic goes from the broadly known down to narrow or specific conclusions.
Specific facts, theories or events are used to explain how various information pieces are tied
together (that is, their interrelationships). In reconstructing the chronology and causes of an
occurrence or event, deductive reasoning is required to move from the generally known to form
increasingly specific conclusions. This process is essential in understanding why and how
something happened.
c) Induction. Induction is a process of drawing general conclusions from specific observations or experience.
Induction implies that what happened once will happen again under the same circumstances. For
effective safety analysis, inductive reasoning should be based upon a large body of consistent
information; isolated events may not be sufficient to form a general conclusion. Inductive
conclusions are really just theories which can only be confirmed through practical experience.
d) Hypothesis. Early in the analysis process, hypotheses or scenarios are created to attempt to
connect the common elements of available data. They require imagination. Preconceived notions
are abandoned and as many hypotheses as possible should be considered. No hypothesis can be
accepted as truth until it has been tested against available data and information. A hypothesis
should fit the facts; the urge to make the facts fit the hypothesis must be resisted. As the analysis
proceeds, initial hypotheses may have to be abandoned in view of contrary data and information.
(However, new data may subsequently come to light to support reconsidering a discarded
hypothesis.)
Objectivity and bias
Not all information is equally reliable. It is necessary to treat information with judgement; a high degree
of skepticism is appropriate. An open mind is required to give due consideration to all relevant
information. Notwithstanding the desire to remain objective, time does not always permit the collection
and careful evaluation of sufficient data to ensure objectivity. Intuitive conclusions may be reached which
are not consistent with the objectivity required for credible safety analysis.
We are all subject to some level of bias in our judgement (and therefore, in the weighting and evaluation
of information). Past experience will influence our judgement, as well as our creativity in establishing
hypotheses. One of the most frequent forms of judgement error is known as “confirmation bias”. This is
the tendency to seek and retain information that confirms what we already believe to be true. As soon as
we create a hypothesis, we become susceptible to confirmation bias.
Appendix 1 to this chapter includes a discussion of several concepts of bias that are relevant to the
drawing of questionable conclusions in safety analysis.
Challenge process
Fundamental to strong analytical thinking is a challenge process. For any analysis to be credible, it must
have gone through an introspective process, starting with rigorous review by peers. The conclusions of
safety analyses must stand up to the challenges by managers/specialists/experts. Only if “severe
self-criticism” has been exercised throughout the analytical process will the argument for change stand up to
the rigours of external challenge. Anyone engaged in safety analysis should encourage constructive
criticism as a normal part of their work.
INFORMATION SOURCES
Much information is available from a variety of sources to support effective safety analysis. However,
new problems are continually being introduced by changes in technology, environmental concerns and
operational conditions. Thus the past may not be an ideal indication of future problems.
The following are the more widely used sources of safety information:
a) Review of current databases including:
1) Investigation reports of accidents and incidents;
2) Safety database(s), both internal and external to the organization. These include
mandatory occurrence reporting records (e.g. ADREP), service difficulty reports, industry
safety data exchanges (such as STEADES36), etc.;
3) Voluntary incident reports (recognizing that such information is anecdotal and may not
be statistically valid);
4) Operational monitoring systems such as FDA and LOSA; and
5) Engine conditions monitoring programmes.
b) Review of regulations (those of the governing State as well as of other States) to help identify
inadequate safety defences. (Note that some of these regulations may be out of date, incomplete,
ambiguous and even contradictory);
c) Review of current corporate situation including:
1) Safety reports and committee minutes;
2) Workplace opinions; and
3) Safety assessment and audit reports;
d) Specialist advice from recognized industry experts such as manufacturer's representatives or
academics (e.g. sleep researchers). Their perspective may be invaluable in defining the scope and
potential consequences of a particular hazardous situation; and
e) Literature searches (professional and industry journals and published academic papers) often
provide an appreciation of the underlying causes and effects of particular hazards.
Indeed, so much safety-related information is available that managers and regulatory officials may face
an over-abundance of information — not all of which is useful. Care is required to select only
data and information that are valid, reliable and relevant for the purposes of the safety analysis.
ANALYTICAL METHODS AND TOOLS
Having acquired the information needed to support a safety analysis, the choice of available analytical
methods and tools is wide. Some of the methods used in safety analysis are automated, some are not.
Several software-based tools (requiring different levels of expertise for effective application) are also
available. Some of the relevant methods and tools are general, having broad application beyond the
specific needs of aviation safety; and some are unique to aviation.37
36. Administered by IATA.
37. The website for the Global Aviation Information Network (GAIN) includes descriptions of some of the industry’s best
practices with respect to methods and tools effective in aviation safety analysis (www.gainweb.org).
General methods and tools
Some of the available general methods and tools include:
Statistical analyses. Many of the analytical methods and tools used in safety analysis are built upon
statistical procedures and concepts. For example, risk analysis utilizes concepts of statistical
probability. Statistics play a major role in safety analysis by helping to quantify situations, providing
insight through numbers. This generates more credible results for a convincing safety argument.
The type of safety analysis conducted at the level of company safety management systems requires
basic skills for analysing numeric data, identifying trends and for making basic statistical
computations such as arithmetic means, percentiles and medians. Statistical methods are also useful
for graphical presentations of these analyses.
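As a minimal illustration of these basic computations, the following sketch uses Python's standard library; the monthly occurrence counts are hypothetical:

```python
import statistics

# Hypothetical monthly counts of reported occurrences over one year
monthly_counts = [4, 7, 5, 9, 6, 5, 8, 12, 6, 5, 7, 4]

mean = statistics.mean(monthly_counts)      # arithmetic mean
median = statistics.median(monthly_counts)  # middle value when sorted
# 90th percentile: the level below which roughly 90% of the months fall
p90 = statistics.quantiles(monthly_counts, n=10)[-1]

print(f"mean={mean:.2f} median={median} 90th percentile={p90}")
```

Any spreadsheet package offers equivalent functions; the point is that such summary measures condense a year of data into a few comparable numbers.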
Computers can handle the manipulation of large volumes of data. Most statistical analysis procedures
are available in commercial software packages (such as MS Excel). Using such applications, data can
be entered directly into a pre-programmed procedure. While a detailed understanding of the statistical
theory behind the technique is not necessary, the analyst should understand what the procedure does
and what the results are intended to convey.
While statistics are a powerful tool for safety analysis, they can also be misused, leading to erroneous
conclusions. Care must be taken in the selection and use of the data used in statistical analysis. To
ensure appropriate application of the more complex methods, the assistance of specialists in statistical
analysis may be required. Such specialists may be helpful in the following circumstances:
a) Conducting more complex statistical analytical procedures;
b) Developing sampling techniques;
c) Interpreting statistical outputs particularly when data samples are small;
d) Advising on the use of appropriate normative data;
e) Assisting in the use of specialized databases, extraction and analysis tools;
f) Detecting data corruption;
g) Advising on the use and interpretation of data from external sources, etc.; and
h) Consolidating data, checking its homogeneity and relevance.
A summary of some of the basics of statistical analysis is included in Appendix 2 to this chapter.
Trend analysis. By monitoring trends in safety data, predictions may be made about coming events.
Emerging trends may be indicative of embryonic hazards. Statistical methods can be used to assess
the significance of perceived trends. Upper and lower limits of acceptable performance may be
defined against which to compare current performance. Trend analysis can be used to trigger
“alarms” when performance is about to depart from accepted limits.
Desktop analysis software programmes, such as MS Excel, have the capability to support
many types of trend analysis (such as linear, exponential and forecasting).
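A minimal sketch of such an alarm, assuming hypothetical monthly event rates and using the common control-chart convention of limits at the mean plus or minus two standard deviations:

```python
import statistics

# Hypothetical monthly event rates (events per 1 000 flights)
rates = [2.1, 2.4, 1.9, 2.2, 2.0, 2.3, 2.1, 2.5, 2.2, 2.0, 2.4, 3.4]

baseline = rates[:-1]                  # history used to set the limits
mean = statistics.mean(baseline)
sd = statistics.stdev(baseline)        # sample standard deviation
upper, lower = mean + 2 * sd, mean - 2 * sd

current = rates[-1]                    # latest month's performance
if not (lower <= current <= upper):
    print(f"ALARM: current rate {current} outside [{lower:.2f}, {upper:.2f}]")
```

Here the final month's rate falls outside the upper limit, triggering the alarm; the choice of two standard deviations is a convention, and each organization would set limits appropriate to its own data.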
Normative comparisons. Sufficient data may not be available to provide a factual basis against which
to compare the circumstances of the event or situation under examination with everyday experience
(which routinely copes with the same conditions). The absence of credible normative data often
compromises the utility of safety analyses. In such cases, it may be necessary to sample real world
experience under similar operating conditions, using such methods as:
a) Direct observation of skilled personnel performing similar tasks under similar conditions, or
observation of like components in service;
b) Sampling through the use of questionnaires, surveys or interviews of employees engaged in
similar activities; and
c) Simulation and testing.
Note: By sampling actual flight operations, FDA and LOSA programmes provide much useful
normative data for the analysis of unsafe practices in flight operations. Both these programmes are
discussed in Part 4 of this manual.
Simulation and testing. In particularly difficult cases, the underlying safety hazard may become more
evident through testing. For example, laboratory testing may be required for analysing material
defects. For suspect operational procedures, simulation in the field under actual operating conditions,
or in a simulator may be warranted.
Expert panel. Given the diverse nature of safety hazards, the variables in assessing the inherent risks,
and the different perspectives possible in evaluating any particular unsafe condition, the views of
others should be sought, including peers and specialists. When a multidisciplinary team is formed to
evaluate the evidence of an unsafe condition, it can also assist in identifying and evaluating the best
course for corrective action. (The safety assessment process described in Chapter 13 depends on
hazard identification by a multi-disciplinary group.)
Cost-benefit analysis. The acceptance of recommended risk control measures may be dependent on
credible cost-benefit analyses. The costs of implementing the proposed measure are weighed against
the expected benefits over time. Indeed, cost-benefit analysis may suggest that accepting the risk is
preferable to the time, effort and cost of implementing corrective action.
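A minimal sketch of such a weighing, with all figures hypothetical: the expected annual benefits are discounted over the horizon and their present value compared with the implementation cost.

```python
# Hypothetical figures for a proposed risk control measure
implementation_cost = 250_000   # one-time cost of the measure
annual_benefit = 60_000         # expected yearly savings (avoided losses)
horizon_years = 5
discount_rate = 0.05            # accounts for the time value of money

# Present value of the benefit stream over the horizon
pv_benefits = sum(
    annual_benefit / (1 + discount_rate) ** year
    for year in range(1, horizon_years + 1)
)

net_benefit = pv_benefits - implementation_cost
print(f"net benefit over {horizon_years} years: {net_benefit:,.0f}")
```

A negative net benefit would support the observation above: accepting the risk may be preferable to the cost of corrective action.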
Specialized methods and tools
In addition to the methods and tools having general application for safety analysis, specialized methods
and tools have proven their value and are regularly used by the aviation community throughout the world.
GAIN and the OFSH both provide information and guidance on such analytical methods and tools.38 Such
inventories of methods and tools are designed to help safety analysts identify existing computer
programmes and/or methodologies that can be used to turn aviation safety data into safety information
that is suitable for decision-making.
38. Operational Flight Safety Handbook (OFSH), Appendix C, and Global Aviation Information Network (GAIN) Working Group B.
The level of complexity and sophistication of these methods and tools varies widely. Some go well
beyond desktop PC skills and capabilities. Thus, the equipment and skill requirements for successfully
applying them vary.
Event reporting and analysis systems include a capability for analysing occurrence data. BASIS
(originally designed as an easy-to-use reporting system for safety events) is a good example of such a
system. Add-on BASIS modules permit analysis beyond the data of the event reporting capability to
include analysing human factors and inflight recordings.39
Occurrence investigation and analysis. Occurrence investigation increasingly employs a variety of
standard analytical methods and tools. By adhering to a systematic methodology, consistency in the
identification of hazards and assessment of risks is enhanced. Some of the tools not only help identify
root causes of occurrences, but also tie-in to organizational occurrence databases thereby facilitating
trend analysis.
Human factors analysis. Linked closely to the methods and tools coming into use for occurrence
investigation and analysis, are specialized methods and tools for better understanding the impact of
human factors on specific occurrences, as well as families of similar occurrences.
Flight data analysis. Given the rapid evolution of FDA (FOQA) programmes, there has been a
parallel development of analytical methods and tools for capitalizing on data recorded during a flight,
permitting trend analysis and the identification of systemic hazards. With the advent of LOSA,
methods and tools are being developed to integrate the data from FDA and LOSA.
Other analytical tools. A number of other analytical tools, not specific to aviation, are available;
however, when pursuing these specialized types of analysis, it would be wise to utilize an experienced analyst:
a) Events and Causal Factor (E&CF) Analysis;
b) Change Analysis;
c) Hazard-Barrier-Target (HBT) Analysis; and
d) Fault Tree Analysis (FTA), etc.
ANALYSIS PROCESS
Regardless of the analytical tools selected, the safety analysis process involves some or all of the
following activities.
Data retrieval
Safety analysis generally involves an examination of many data points. Different search strategies may be
required to get the appropriate data and care is required in determining the search criteria.40 Two types of
errors may occur during data retrieval that may compromise the validity of the safety analysis:
39. For further information on BASIS, visit the BASIS website at http://www.winbasis.com/
40. See Chapter 15 (Practical Considerations) for a discussion of data quality.
False positive returns may appear appropriate but on further examination are found to be irrelevant;
and
False negative returns are relevant data that are not identified during the retrieval process.
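The two error types can be illustrated with a deliberately naive keyword search over a few hypothetical occurrence records:

```python
# Hypothetical occurrence records; 'narrative' is free text
records = [
    {"id": 1, "narrative": "runway incursion by vehicle during taxi"},
    {"id": 2, "narrative": "bird strike on runway 27 after landing"},
    {"id": 3, "narrative": "aircraft crossed the hold-short line without clearance"},
]

# A naive keyword search intended to find runway incursions...
hits = [r for r in records if "runway" in r["narrative"]]

# ...returns record 2 (a false positive: a bird strike, not an incursion)
# and misses record 3 (a false negative: an incursion worded differently).
print([r["id"] for r in hits])
```

Refining the search criteria (synonyms, event-type codes, manual review of returns) reduces both error types, at the cost of more retrieval effort.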
Sorting the data
The data retrieved must be sorted and arranged in a way that will support the analysis. Irrelevant data
(such as false positives) need to be rejected. Judgements are thus made as to the adequacy of the
remaining data. If necessary, additional information is sought, perhaps using different methods or tools.
Analysing the data
Having retrieved and compiled the requisite data, the most appropriate analytical methods and tools and
analytical thought processes are applied.
Safety analysis is often an iterative process, sometimes requiring new or additional data or the selection of
alternative methods or tools. Hazards are sometimes not visible unless viewed in a particular way. Many
cross sections through the data may be required, examining the data from different perspectives. For
example, it may not be possible to uncover a safety problem when looking at all recorded events, but a
distribution by location may reveal an unacceptable number of occurrences at a particular location, while
the other locations have a good safety record.
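The example above can be sketched with hypothetical records, where an overall count reveals little but a distribution by location singles out one aerodrome:

```python
from collections import Counter

# Hypothetical occurrence records: (event type, location code)
events = [
    ("altitude deviation", "AAA"), ("altitude deviation", "BBB"),
    ("altitude deviation", "BBB"), ("altitude deviation", "CCC"),
    ("altitude deviation", "BBB"), ("altitude deviation", "BBB"),
]

# Looking at all recorded events together shows only a total...
print(len(events))

# ...but a cross section by location reveals a concentration at one site
by_location = Counter(loc for _, loc in events)
print(by_location.most_common(1))   # location with the most occurrences
```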
Drawing conclusions
The analyst begins to form impressions from the outset. Preliminary findings are made as to what is
known with certainty, the relationships between facts, areas where insufficient or inadequate data have
been found, etc. Care must be taken to avoid jumping to premature conclusions based on strongly held
impressions or beliefs. The analyst should seek to maximize certainty by ensuring an objective and
comprehensive understanding of the facts. Preliminary conclusions must withstand the test of such
questions as “why?” and “so what?”.
Caution is advised when relying largely on occurrence data. Safety databases generally contain only
reported information; little data is available on events which were not reported. For example, a database
may hold more reports of traffic conflicts at airports with air traffic control towers than at airports
without towers. This does not imply that the towers are the cause of the conflicts. Airports with towers
generally have more traffic, providing more opportunities for aircraft to come too close to each other.
Also, the controller and/or pilots are more likely to report such occurrences.
Common analytical errors
The following are some common pitfalls in analytical thinking:
a) Basing a general conclusion on too little evidence, e.g. using words such as "always" or "never";
b) Drawing premature conclusions from insufficient data;
c) Stacking the evidence by withholding facts so that the evidence points to only one conclusion;
d) Linking two events as if one caused the other when the relationship between them may be more
complex;
e) Assuming that, because one event follows another, it was caused by the first event;
f) Assuming that a complicated question has only two possible answers;
g) Drawing a conclusion that bears no logical relation to the facts (the “apples and oranges”
argument);
h) Suggesting that, because two things or situations share some similarities, they must be alike in
other ways; and
i) Relying on an expert for issues beyond that person’s qualifications and experience.
SAFETY STUDIES
Some complex or pervasive safety issues can best be understood through an examination in the broadest
possible context. At the top of the aviation safety spectrum, systemic safety concerns may be addressed
on an industry-wide or a global scale. For example, the industry collectively has been concerned with the
frequency and severity of approach and landing accidents and has undertaken major studies, made many
safety recommendations and implemented global measures to reduce the risks of accidents during the
critical approach and landing phases of flight.
The convincing argument necessary to achieve large or systemic changes requires significant data,
information, analysis and effective communication. A safety argument based on isolated occurrences and
anecdotal information will fall on deaf ears. Data from many sources must be coherently integrated and
synthesized to permit drawing valid conclusions.
For the purposes of this Manual, these larger, more complex safety analyses are referred to as Safety
Studies. The term includes many types of studies and analysis conducted by State authorities, airlines,
manufacturers, and professional and industry associations. ICAO recognizes that safety recommendations
may arise not only from the investigation of accidents and serious incidents, but also from safety
studies.41 Safety studies have application in hazard identification and analysis relating to issues arising
from flight operations, maintenance, cabin safety, air traffic control, airport operations, etc.
Safety studies of industry-wide concerns generally require a major sponsor. The Flight Safety Foundation,
in collaboration with major aircraft manufacturers, ICAO, NASA and other key industry stakeholders has
taken a leading role in many such studies. Civil aviation authorities of specific States have also conducted
major safety studies, many identifying safety risks of global interest. Several State authorities have also
used safety studies for identifying and resolving hazards in their national aviation systems. Although it is
unlikely that small or medium-sized operators would undertake a major safety study, large operators and
regulatory officials may well be involved in identifying widespread systemic safety issues through such
means.
41. See Annex 13, Chapter 8.
Consistent with normal safety management processes, safety studies and safety surveys typically include
the following steps:
a) Information gathering;
b) Recording of pertinent data;
c) Preliminary analysis and hazard identification;
d) Risk assessment, including prioritization of risks;
e) Development of risk control strategies;
f) Implementation of preferred risk control options; and
g) Monitoring and evaluation to determine the effectiveness of the actions taken, the residual risks
and any further action required.
Selecting study issues
Large operators, manufacturers, safety organizations and regulatory authorities may maintain a list of
significant safety issues. (The topic of maintaining a Significant Issues List (SIL) is described later in this
Chapter.) Such lists may be based on the accident and incident record in such areas as runway incursions,
ground proximity warnings, TCAS advisories, etc. These issues may be prioritized in terms of the risks to
the organization or the industry.
Given the degree of collaboration and information-sharing necessary to conduct an effective safety study,
issues selected for study must enjoy a broad base of support among participants and contributors.
Information gathering
The information sources cited earlier with respect to safety analysis are equally applicable for safety
studies. Several additional methods (outlined below) are available for acquiring the relevant information
necessary to support the broad-based analysis of a safety study.
Review of occurrence records. Investigated occurrences may be reviewed by selecting those
occurrences which meet some pre-defined characteristics such as Controlled Flight Into Terrain, cabin
fires or crew fatigue. By reviewing all available material on file, specific elements may be identified
that are suitable for further analysis.
Structured interviews. Much useful information can be acquired through structured interviews. Care
must be taken in selecting the interviewees to avoid biasing the results. Starting from a series of
carefully crafted and sequenced questions, the interviewer can probe particular aspects as deeply as
warranted. Structured interviews depend very much on trust between the parties and must be
conducted in a non-punitive environment. While they can be time-consuming, such interviews offer
the potential for acquiring quality information, even though it may not be a statistically representative
sample. Success with structured interviews will depend on the quality of the questions and follow-up
questions, and the ability of the analyst to reduce much anecdotal information to useful data for
analytical purposes.
Directed field investigations of relatively insignificant occurrences (which might normally not be
investigated) may uncover sufficient additional information to permit a deeper analysis than that
obtained from mandatory reporting systems. By investigating a sample of like-occurrences over a
defined period, specific information can be collected in a structured way. Although few of these
investigations, considered individually, would contribute much to the collective knowledge of the
factors underlying such occurrences, taken together they may reveal behavioural patterns which are
compromising safe operations. Field investigations, supplemented by some of the
other methods described here, provide an efficient means for validating systemic risks.
Literature search. Whether the safety issues under examination have to do with particular equipment,
technology, maintenance, human performance, environmental factors, or organizational and
management issues, undoubtedly much has already been written on the subject by “experts”. Safety
analysts conducting major studies require library search skills to locate and review the most
compelling works written by scholars and other subject matter experts. Careful use of the Internet
provides a vast source of knowledge. Prior to commencing a safety study, it may be appropriate to
carry out a literature search on the issue under consideration. This search may be useful in deciding
whether any further action is required; specifically, whether it is a safety issue to which meaningful
value might be added.
Experts’ testimony. Direct contact with recognized subject matter experts may be warranted. For
example, if crew fatigue is the issue, it may be appropriate to consult directly with those conducting
scientific sleep research. Such experts may be contacted informally over the Internet or telephone;
they may also be invited to provide more formal input through submissions to a hearing or public
inquiry.
Public inquiries. For major safety issues that must be considered from many perspectives, State
authorities may convene some form of public inquiry. These provide all stakeholders, individually or
as representatives of particular interest groups, an opportunity to present their views in an open,
impartial process. However, prior to deciding upon convening a public inquiry, considerable
preparatory work is required to demonstrate the need for such an expenditure of resources. Care must
be taken in drafting the terms of reference for any public inquiry to ensure that the focus is on
identifying safety issues – not identifying persons to blame.
Hearings. Less formal meetings (than public inquiries) may be convened with a view to hearing the
different (and often divergent) views of the major aviation stakeholders, perhaps involving unions or
professional associations, regulatory authorities, operators’ and industry associations. As opposed to a
public inquiry, the stakeholders are heard in camera (or private); in this way, they may be more
candid in stating their positions.
Participation
To maximize the benefits from a safety study, the widest possible participation is advisable. Inputs from a
broad cross section of interest groups will ensure that all perspectives are considered. The solution to one
interest group’s problem may create problems and hazards for another. Similarly, the information
gathered should be examined from a multidisciplinary perspective.
SIGNIFICANT SAFETY ISSUES LIST (SIL)
Some State regulatory authorities, investigative agencies and large operators have found that maintaining
a list of high priority safety issues is an effective means for highlighting areas warranting further study
and analysis. These lists are sometimes referred to as the “Top Ten” or the “Most Wanted” lists. Such
lists prioritize those safety issues that put the aviation system (or the organization) at risk. As such, they
may be useful in identifying issues for safety assessment, for safety survey or safety study. If SILs are to
be of value in guiding the work of those involved in safety management, they must not chronicle every
perceived hazard. Thus, SILs should be limited to not more than ten issues.
Typical issues that may warrant inclusion on an SIL might include:
a) Frequency of GPWS warnings;
b) Frequency of TCAS advisories;
c) Runway incursions;
d) Altitude deviations (busts);
e) Call sign confusion;
f) Unstabilized approaches; and
g) Air proximities (near misses) at selected aerodromes, etc.
SILs should be reviewed and updated annually, adding new high-risk issues and deleting lesser risk
issues.
————————
Appendix 1 to Chapter 9
UNDERSTANDING BIAS42
Everyone’s judgement is shaped by his or her personal experience. Notwithstanding the quest for
objectivity, time does not always permit the collection and careful evaluation of sufficient data to ensure
objectivity. Based on a lifetime of personal experiences, we all develop mental models that generally
serve us well in quickly evaluating everyday situations, “intuitively” without a complete set of facts.
Unfortunately, many of these mental models reflect personal bias. Bias is the tendency to apply a
particular response regardless of the situation. Following are some of the basic biases that can affect the
validity of safety analyses:
Frequency bias: We tend to over- or under-estimate the probability of occurrence of a particular
event because our evaluation is based solely on our personal experience. We assume that our limited
experience is representative of the global situation.
Selectivity bias: Our personal preferences give us a tendency to select items based on a restricted core
of facts. We have a tendency to ignore those facts which do not quite fit the pattern we expect. We
may focus our attention on physically important characteristics, or obvious evidence (e.g. loud,
bright, recent) and ignore cues that might provide more relevant information about the nature of the
situation.
Familiarity bias: In any given situation, we tend to choose the most familiar solutions and patterns.
Those facts and processes which match our own mental models (or preconceived notions) are more
easily assimilated. We tend to do things in accordance with the patterns of our previous experience,
even if they are not the optimum solutions for the current situation, e.g. the route we pick to go
somewhere may not always be the most efficient under changing circumstances.
Experience can be valuable in helping us focus our attention on those things which are most likely to
be problematic, but we should recognize that in following these familiar patterns we may be
overlooking critical information. Today, management gurus exhort us to “think outside the box”.
Conformity bias: We have a tendency to look for results which support our decision rather than
information which would contradict it. As the strength of our mental model increases, we are
reluctant to accept facts which do not line up nicely with what we already “know”. Time pressures
can lead to erroneous assumptions that do not accurately reflect the current reality.
We tend to seek information that will confirm what we already believe to be true. Information that is
inconsistent with our chosen hypothesis is then ignored or discounted.
A frequently cited causal factor in aviation accidents is “expectancy”, i.e. individuals see what they
want to, or expect to see, and they hear what they want to, or expect to hear. Expectancy is a form of
conformity bias.
Group conformity or ‘Group think’: A variation on conformity bias is “group think”. Most of us
have a tendency to agree with majority decisions; we yield to group pressures to bring our own
thinking in line with that of the group. We do not want to break the group’s harmony by upsetting the
prevalent mental model. In the interests of expediency, it is a natural pattern to fall into.
42. Adapted from the Human Factors Guidelines for Safety Audits Manual (Doc 9806).
Overconfidence bias. There is a tendency for people to overestimate their knowledge of the situation
and its outcome. The result is that attention is placed only on information that supports their choice
and ignores contradictory evidence. The defining characteristic of an overconfidence bias is that
attention is given to certain information because an individual overrates his/her knowledge of the
actual situation. Without the tempering afforded by on-the-job experiences, an inexperienced
individual may overrate the value of "classroom" theory versus the more "work-shop"-oriented
knowledge used by peers. On the other hand, more seasoned personnel may also let overconfidence
bias affect their judgement, having “seen it all before”.
————————
Appendix 2 to Chapter 9
BASICS OF STATISTICAL ANALYSIS
Purpose
This précis provides supplemental material in support of statistical analysis for safety management.
Data collection
To begin an analysis, a cursory review of overall counts of the available data and simple
cross-tabulations of the data is normally conducted. However, the data that are readily available through
routine databases may be insufficient to carry out a credible analysis or risk assessment. Specific
additional data may be required. Two methods for acquiring the extra data are:
Sampling is one way of acquiring sufficient information for a valid analysis. If a sample of data
representative of the larger population of data is compiled, credible conclusions may be drawn from
the statistical analysis. Some of the considerations to be taken into account in preparing a valid
sample include:
a) Size of the sample (the larger the sample size, the more precise the conclusion);
b) Selection method (e.g. random samples produce unbiased estimates);
c) Representativeness of data (to avoid comparing apples with oranges);
d) Homogeneity of sample data (in terms of data definitions, time-frames, operating conditions,
etc.); and
e) Completeness of data (sufficient to be truly representative of the full population).
Because of the resource implications, the sampling process must be limited to only that data
necessary to support the scope of the analysis. It may be limited in terms of:
a) Duration (e.g. collect data for one year);
b) Area (e.g. limit sample to a particular location or fleet); and
c) Data type (e.g. use readily obtainable numeric data vs. deriving numbers from narratives).
If the sample is a valid representation of the population, the statistical analysis should yield credible
conclusions on all like-occurrences at a much lower cost than observing or testing every item in the
population.
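A minimal sketch of the idea, using a hypothetical population of flight records: a modest random sample yields an estimate close to the population mean at a fraction of the cost of examining every record.

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical population: 10 000 flight records, each with a taxi-out
# time in minutes (normally distributed for the purposes of the sketch)
population = [random.gauss(15.0, 3.0) for _ in range(10_000)]

# A simple random sample produces an unbiased estimate of the mean
sample = random.sample(population, 200)
estimate = sum(sample) / len(sample)

full_mean = sum(population) / len(population)
print(f"sample estimate {estimate:.2f} vs population mean {full_mean:.2f}")
```

The larger the sample, the closer the estimate tends to be to the population value, which is the trade-off between precision and collection cost noted above.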
Surveys. One way of sampling a large population is through surveys or questionnaires. In addition to
the sampling principles described above, some other considerations for survey design include:
a) Population to be surveyed? How are they to be selected?
b) Sample size (vs. accuracy requirements)?
c) When? (not after an event that could influence the results);
d) Sampling method? (Contacting participants? Format (oral/written)? Tick box vs. narrative?)
e) Questions to be asked? (Careful formulation of questions is necessary to get the information
really wanted, with non-ambiguous answers).
Surveys are also a useful means for the collection of normative data (which otherwise might not be
available).
Describing and presenting data
Quantitative information must be presented in a way suitable for facilitating analysis and promoting
understanding. Tables are usually the first step in organizing data and graphs provide the most vivid
presentation of the overall picture. However, the more specific aspects of the data contained therein such
as their averages, variability, and interrelationships are most succinctly summarized by appropriately
chosen numerical measures (such as means and medians).
Tabulation
Tables are most frequently used to organize data and present numeric information, cross tabulating
different variables. For example, events of a particular type by month, or by location, compared to
similar data for preceding years. Tables are suitable for summarizing data, identifying trends and
conveying analytical conclusions. Tables can also be used for examining single variables (incidents
vs. age of fleet) or multiple variables (incidents vs. age of fleet by year and by phase of flight).
The following considerations apply when preparing a table:
Orientation: A well-chosen title, row and column headings supplemented by explanatory
information and footnotes help to orient the reader. Assumptions and recognized anomalies in the
data should be provided to avoid incorrect conclusions. There should also be an indication of the
data sources.
Analysis: The overall features of the table should be examined to confirm that the table works.
For example, an appropriate measuring system and units have been used, overall sums/averages
are reasonable, and there is consistency between summary rows and columns (e.g. their
association follows logically and their variability is credible).
Graphical presentations
Graphs or charts summarize tabular data in a way that facilitates visualizing the relationship between
the variables. Graphs reveal facts about data that would otherwise require careful study to detect in a
table. It is important that the initial impression portrayed by a graph be an accurate impression. Here
too, some basic principles have to be kept in mind in their preparation to avoid misleading
conclusions. For example:
a) Omission of the line representing zero units on the vertical scale can significantly magnify a
change; if it must be omitted it should be clearly illustrated (perhaps by a jagged break).
b) Perspective diagrams (e.g. three-dimensional) can present a distorted representation of any
fluctuations.
c) Disproportionate scales and X/Y axes can exaggerate trends or relative changes.
d) Inappropriate labelling of scales, legends and titles can compromise the graph's credibility.
Many types of graphs can be used in presenting numeric data, depending on the nature of the data and
the purpose intended.
a) Line graphs are useful for illustrating time trends, with time on the horizontal axis.
b) Bar graphs allow several variables to be compared over time or against one another.
c) Scatter plots are used to graph data with two variables, the independent variable being on the
horizontal axis. These are sometimes used to develop a simple trend for activity data to get an
approximate forecast of an accident rate.
d) Pie charts describe relative proportions of the components of a data set.
e) Histograms, which look like bar charts, depict the distribution of data.
Arithmetic means and medians
Having organized and presented the data in a useful format, it is also important to understand their
relationships.
Arithmetic mean. The most commonly used numeric relationship is the arithmetic mean
(commonly referred to as the average). The arithmetic mean represents a complete set of data
through a single value. When data are arranged according to magnitude, the average lies
somewhere near the centre of the data set; therefore averages are often referred to as a measure of
centre. The median (discussed below) is also a measure of centre.
Median. Medians are useful for ranked data, for example people organized by age. The median
does not employ the actual numerical values of the data; rather, the median of a set of numbers is
the number located at the midpoint when they are arranged in increasing order. For example,
suppose that the ages of pilots involved in accidents range between 25 and 60 years. The mean age
may be computed to be 45, which might suggest that older pilots have more accidents. However,
the median age may be only 30, which shows that half the pilots involved in accidents are actually
under 30: the mean was influenced by a few pilots who were perhaps closer to 60.
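The pilot-age example can be reproduced numerically. The ages below are invented purely to show how a few extreme values pull the mean away from the median:

```python
import statistics

# Hypothetical ages of pilots involved in accidents: most are young,
# but two pilots near 60 pull the mean upwards
ages = [25, 26, 27, 28, 29, 30, 31, 33, 58, 60]

mean_age = statistics.mean(ages)      # influenced by the extreme values
median_age = statistics.median(ages)  # midpoint of the ranked ages

print(f"mean = {mean_age:.1f}, median = {median_age:.1f}")
# The median stays near 30 while the mean is several years higher
```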
Measures of variation of data
Measures of centre are usually incomplete and misleading without some accompanying indication of
how spread-out the data are. The degree to which data tend to spread about an average value is called
the variation or dispersion of the data. As the example of the mean age of pilots illustrates, a measure
of centre used alone lumps the minimum and maximum extreme cases together. Measures of dispersion
include percentiles, variance and standard deviation.
When using the median to measure centre, use is made of percentiles to indicate the variability or
spread. The nth percentile of a data set is a value such that n percent of the observations fall below it.
The median is therefore the 50th percentile; the 90th percentile is the value below which 90 out of 100
ranked observations fall.
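Percentile definitions vary slightly between textbooks and software packages; the sketch below uses the common nearest-rank convention, with invented flying-hour data:

```python
def percentile(data, n):
    """Value at or below which roughly n per cent of the ranked
    observations fall (nearest-rank method)."""
    ranked = sorted(data)
    # Nearest-rank index of the n-th percentile in the ranked list
    k = max(0, min(len(ranked) - 1, round(n / 100 * len(ranked)) - 1))
    return ranked[k]

# Invented annual flying hours for ten pilots
hours = [120, 150, 180, 200, 240, 260, 300, 340, 400, 900]
print(percentile(hours, 90))  # 90% of the ranked values fall at or below this
```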
Help from statistical specialists may be required to better understand what the variation in a particular
data set means.
Time series and trend analysis
A set of observations or measurements of the same variable made at different times, usually at equal
intervals, is called a time series. They can be represented pictorially by constructing a graph of the
variable versus time; for example, the number of incident reports received per month.
When the frequency of a particular event is measured over time, it is normal to note sequential
increases or decreases, known as runs. The question is whether a given run (up or down) is significant.
There are many formulas for evaluating runs; one such rule suggests that a run of seven or more
sequentially increasing or sequentially decreasing points be further analyzed. On the other hand,
shorter runs may also be informative, suggesting that somehow the system is compensating for the
change and returning to more “normal” levels.
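The "run of seven" rule of thumb can be checked mechanically. A minimal sketch follows; the threshold and the series of monthly report counts are illustrative only:

```python
def longest_run(counts):
    """Length of the longest strictly increasing or strictly
    decreasing run in a sequence of counts."""
    best = run = 1
    direction = 0  # +1 rising, -1 falling, 0 undecided
    for prev, cur in zip(counts, counts[1:]):
        step = (cur > prev) - (cur < prev)
        if step != 0 and step == direction:
            run += 1              # run continues in the same direction
        elif step != 0:
            direction, run = step, 2  # new run starts with this pair
        else:
            direction, run = 0, 1     # equal values break the run
        best = max(best, run)
    return best

# Invented monthly incident-report counts
monthly_reports = [4, 5, 5, 6, 7, 9, 10, 12, 13, 11]
if longest_run(monthly_reports) >= 7:
    print("run of seven or more: analyze further")
```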
Making valid comparisons
For useful conclusions to be drawn from statistics, valid comparisons of the various data are required.
For example, in comparing accident and incident statistics, more valid comparisons are obtained using
rate information than raw numbers of occurrences. If two types of aircraft are compared and type A flies
one million hours in one year resulting in one accident and type B flies five million hours in a year
resulting in five accidents, the accident rate based on hours flown is the same for both types (one accident
per one million flying hours).
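The type A/type B comparison above works out as follows; this is a minimal sketch of the rate calculation:

```python
def rate_per_million_hours(accidents, flying_hours):
    """Accidents per one million flying hours."""
    return accidents / flying_hours * 1_000_000

# Example from the text: two types with the same underlying rate
type_a = rate_per_million_hours(1, 1_000_000)   # 1 accident in 1M hours
type_b = rate_per_million_hours(5, 5_000_000)   # 5 accidents in 5M hours
print(type_a, type_b)  # both 1.0 accident per million flying hours
```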
Rates express the numerical proportions of two sets of data. In accident statistics this usually involves
numbers of accidents, incidents, injuries or damage as one set and some measure of exposure, such as
flying hours, or numbers of flights as the other. Generally, rates are suited to establishing the general
measure of safety of an operation, rather than evaluating specific prevention measures. For a rate to be
valid, the sets of data and time frame used must be compatible. For example, long and short haul
operators do not fly the same numbers of flights. Their flights are usually of different duration, using
aircraft that may have widely varying performance capabilities, and carrying differing numbers of
passengers. Therefore, a comparison of the relative safety of these two types of operation will largely
depend on whether the exposure base includes numbers of flights, flying hours, miles, passengers or some
combination of these.
Statistics must be used with caution. For instance, they may show that pilots in a certain age group or with
a certain number of flying hours have the most accidents. Arithmetically, these figures may be correct;
however, they do not by themselves show that the pilots in these groups are “less safe” than pilots in other
groups. Before such a determination can be made, it is necessary to determine the total number of pilots in
each age group, since the pilot group with the highest number of accidents may also contain the greatest number of pilots.
This principle needs to be considered whenever comparisons are made. Many accident statistics are of
little value for safety management because they provide no valid means of comparison.
Traditionally, airline safety statistics have used seat-kilometres (or miles) as an exposure base. With
wide-bodied aircraft the number of available seats has increased significantly although the passenger
loading can vary considerably. Also, the longer range of such aircraft means that they spend a greater
proportion of each flight in the cruise phase. Thus seat kilometres are not a particularly useful basis for
measuring safety. Most accidents occur during the take-off and landing phases, and every flight involves
one take-off and one landing, irrespective of other factors. In addition, any flight that results in an accident can be seen to
represent a failure of the safety process, irrespective of how long it has been airborne, how far it has
flown, or how many seats it has. Accordingly, when comparing safety levels of airline operations, the rate
of accidents to numbers of flights (or departures) may be a more appropriate measure than the use of
flight hours or seat-kilometres.
For most general aviation operations (which normally have shorter flights), the number of accidents or
incidents and an exposure base measured in flight hours are often used. The resulting rate is then an
expression of accidents or incidents per 1 000 000 hours, 100 000 hours, or 10 000 hours. The base
chosen will depend on the amount of aviation activity being considered, so that the resulting rate is an
easily handled number. In special operations involving unique hazards, it may be desirable to
use a different exposure base. For example, aerial application operations usually involve many flights per
hour. The large number of take-offs and landings substantially increases the likelihood of an accident
during these critical phases of flight. For such operations it may be more useful to compare numbers of
accidents with numbers of flights.
When considering safety trends, the current record is often compared against a base period to determine
whether there has been an improvement or decline in safety. This analysis method can also provide a
useful hazard alerting technique. However, when the numbers of accidents/incidents are comparatively
small, slight changes in numbers, for instance from one year to another, can provide an erratic and
virtually meaningless result. To overcome this, some form of averaging can be used. For instance, the
number of accidents in the subject year can be compared to the average number of accidents in a
preceding three or five-year period. Alternatively, the number of accidents in the subject year can be
added to the number of accidents of the previous two or four years, from which a so-called rolling three-
or five-year average is calculated.
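The rolling average described above can be sketched as follows; the annual accident counts are invented for illustration:

```python
def rolling_average(annual_counts, window):
    """Rolling mean over the most recent `window` years, producing one
    value per year once enough history is available."""
    return [
        sum(annual_counts[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(annual_counts))
    ]

# Hypothetical accidents per year for a seven-year period
accidents_per_year = [6, 2, 4, 9, 3, 5, 4]
print(rolling_average(accidents_per_year, 3))
```

Comparing the smoothed series rather than the raw annual counts damps out the erratic year-to-year changes noted in the text.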
While the foregoing methods may give an indication of the safety trends of a particular organization, or
operation, it has to be remembered that accidents are infrequent and random events. Thus caution should
be applied when using such methods as an indication of relative safety.
Common pitfalls
As will be apparent, there are many potential pitfalls in the use of statistics to reach meaningful
conclusions. The following outlines a few:
Understanding of numbers. Numbers can be misleading unless they are used with appropriate care
concerning their accuracy, their associated units and the manipulation they are subjected to during the
analysis process. Following are examples of common weaknesses in the use of numbers that may
compromise the credibility of the statistical analysis:
a) Absence of a unit of measure to give the number meaning;
b) Manipulation of numbers with inconsistent units of measure;
c) Using unsupportable levels of accuracy (especially when using data from more than one
source);
d) Inconsistency in rounding-off the level of significance of numbers; and
e) Making extrapolations beyond the confidence limits of the available data.
Rules of accuracy. Understanding can be enhanced by respecting some basic rules of accuracy in the
use of numbers. For example:
a) Quote numbers that can be understood (e.g. avoid numbers with exponential powers if
possible);
b) Indicate the measurement unit (e.g. accidents per 100 000 departures);
c) Ensure consistency in the type of measurement used (e.g. metric vs. imperial); and
d) Round data when appropriate (e.g. 5 deaths is more meaningful than 5.21).
Significant figures. Misunderstanding in the use of numbers can arise from the misuse of significant
figures. A differentiation needs to be made between what may be precisely measured to many significant
figures and the message that is conveyed. Common sense is often the best guide. For example,
although it may be determined that the aircraft had 19 989.5 litres of fuel on board, for the purposes of
a safety argument is this not the same as 20 000 litres? If there have been only four occurrences of a
particular type, two of which were fatal, is it meaningful to imply great precision by saying that 50.00% of
such occurrences were fatal? Likewise, a numeric drop in CFIT accidents from 10 to 8 (20%) should not be
described as significant if the activity rate declined proportionately.
Different data sources. Not all safety databases are the same. Each database possesses unique
characteristics. When using data from different sources (such as other organizations, manufacturers or
States) caution is advised. Failure to take the following issues into consideration may result in
“comparing apples with oranges”.
a) Terminology. Differences in definitions of occurrence type, event, or causal and contributory
factors (for example, near-collision, non-fatal accident);
b) Storage. Differences in the capture or updating of data;
c) Requirements. Differences in reporting requirements;
d) Operations. Differences in the size of the infrastructure, number of employees, the volume of
traffic, maintenance and safety programmes, etc.;
e) Traffic type. Short haul vs. long haul, passenger vs. freight, etc.; and
f) Equipment. Age and extent of modernization of the fleet.
Trend analysis. Incorrect conclusions can easily be drawn in examining data covering an extended
time period. These include:
a) A comparison may be made between annual totals several years apart. The base year for
calculating the change may be a "blip" (with an unusually low or high frequency of the event)
thereby giving a false impression of an increase or decrease.
b) Comparisons may be made between two time periods to show the effect of a change (in the
operating environment). The result will be meaningless unless the periods chosen adequately
reflect the different operating environments. For example, if the reporting rules for gathering
the data have changed over the period, compensation must be made.
c) In computing averages over a number of years, the individual data points within the time span
should be examined to adjust for factors contributing to data points well outside the expected
range (e.g. a labour-management dispute, an unusually high number of casualties from a
single accident, etc.).
d) In quoting an annual average over a recent number of years, the inclusion of a recent random
increase or decrease in the calculation may result in misleading figures.
e) When describing a trend over a long time period, it may be misleading to ignore large
fluctuations particularly in the opposite direction – even though the general trend is clear.
___________________
Chapter 10
SAFETY PERFORMANCE MONITORING
Outline
Introduction
Safety health
 Assessing safety health
— Symptoms of poor safety health
— Indicators of improving safety health
— Statistical safety performance indicators
— Minimum levels of safety
Safety oversight
 Inspections
 Surveys
 Safety monitoring
 Quality assurance
 Safety audits
ICAO Universal Safety Oversight Audit Programme (USOAP)
Regulatory safety audits
Self audit
Safety programme review
Appendices
1. Sample indicators of safety health
2. Airline management self-audit
Chapter 10
SAFETY PERFORMANCE MONITORING
INTRODUCTION
Safety management systems require feedback on safety performance to complete the closed loop of the
safety management cycle. Through such feedback, system performance can be evaluated and any
necessary changes effected. In addition, all stakeholders require an indication of the level of safety within
an organization. For example:
a) Staff may need confidence in their organization’s ability to provide a safe work environment;
b) Line management requires feedback on safety performance to assist in the allocation of resources
between the often-conflicting goals of production and accident prevention;
c) Passengers are concerned with their own mortality;
d) Senior management seeks to protect the corporate image (and market share); and
e) Shareholders wish to protect their investment, etc.
Although the stakeholders in an organization’s safety process want feedback, their individual perspectives
as to “what is safe?” vary considerably. Deciding what reliable indicators there are of acceptable safety
performance depends largely upon how one views “safety”. For example:
a) Senior management may seek the unrealistic goal of “zero accidents”. Unfortunately, as long as
aviation involves risk, there will be accidents, even though the accident rate may be very low.
b) Regulatory requirements normally define minimum “safe” operating parameters; for example,
cloud base and flight visibility limitations. Operations within these parameters contribute to
“safety”, however, they do not guarantee it.
c) Statistical measures are often used to indicate a level of safety, e.g. the number of accidents per
hundred thousand hours, or fatalities per thousand sectors flown. Such quantitative indicators
mean little by themselves but they are useful in assessing whether safety is getting better or worse
over time.
SAFETY HEALTH
Recognizing the complex interactions affecting safety and the difficulty of defining what is safe and what
is not, some safety experts are referring to the “safety health” of an organization. The term safety health is
an indication of an organization’s resistance to unexpected conditions or acts by individuals. It reflects the
systemic measures put in place by the organization to defend against the unknown. Further, it is an
indication of the organization’s ability to adapt to the unknown. In effect, it reflects the safety culture of
the organization.
[Figure 10-1 (below) plots an organization's Safety Health (Resistance to Misadventure) over time, annotated with milestone events such as: operating certificate granted; operational experience and management builds; Safety Incident Reporting System implemented; Flight Data Analysis Programme implemented; merger and labour dispute; new management; Safety Management System adopted.]
Although the absence of safety-related events (accidents, incidents) does not necessarily indicate a “safe”
operation, some operations are considered to be “safer” than others. Safety deals with risk reduction to an
acceptable (or at least tolerable) level. The level of safety in an organization is unlikely to be static and
will vary over time. As an organization adds defences against safety hazards, its safety health may be
considered to be improving. However, various factors (hazards) may compromise that safety health,
requiring additional measures to strengthen the organization’s resistance to misadventure. The concept of
the safety health of an organization varying during its life cycle is depicted in Figure 10-1 below.
[Figure: the safety health curve varies over time, moving between a Healthy Zone and an Unhealthy Zone separated by the Regulatory Compliance line (minimum acceptable level).]
Figure 10–1. Variation in safety health
Assessing safety health
In principle, the characteristics and safety performance of the “safest” organizations can be identified.
These characteristics, which reflect industry’s best practices, can serve as benchmarks for assessing safety
performance.
Symptoms of poor safety health
Poor safety health may be indicated by symptoms that put elements of the organization at risk.
Appendix 1 to this Chapter provides examples of symptoms which may be indicative of poor safety
health. Weakness in any one area may be tolerable; however, the presence of many symptoms indicates
serious systemic risks compromising the safety health of the organization.
Indicators of improving safety health
Appendix 1 also provides indications of improving safety health. These reflect the industry’s “best
practices” and a good safety culture. Organizations with the best safety records tend to “maintain or
improve their safety fitness” by implementing measures to increase their resistance to the unforeseen.
They consistently go beyond the minimum regulatory requirements.
Identifying such symptoms may provide a valid impression of an organization’s safety health. However,
they may lack the depth necessary for effective decision-making. Additional tools are required for
measuring safety performance in a systematic and convincing way.
Statistical safety performance indicators
Statistical safety performance indicators illustrate historic safety achievement; they provide a
“snapshot” of past events. Presented either numerically or graphically, they provide a simple, easily
understood indication of the level of safety in a given aviation sector, in terms of the number or rate
of accidents, incidents or casualties over a given time frame. At the highest level, this could be the
number of fatal accidents per year over the past ten years. At a lower (more specific) level, the safety
performance indicators might include such factors as the rate of specific technical events (e.g. losses
of separation, engine shut-downs, TCAS advisories, etc.).
Statistical safety performance indicators can be focused on specific areas of the operation to monitor
safety achievement or to identify areas of interest. This “retrospective” approach is useful in trend
analysis, hazard identification, risk assessment and even the choice of risk control measures.
Since accidents (and serious incidents) are relatively random and rare events in aviation, assessing
safety health based solely on statistical safety performance indicators may not provide a valid
predictor of safety performance, especially in the absence of reliable exposure data. Looking
backwards does little to assist organizations in their quest to be proactive, putting in place those
systems most likely to protect against the unknown. The safest organizations employ additional
means for assessing safety performance in their operations.
Minimum levels of safety
Aviation organizations must meet regulatory requirements to ensure minimum acceptable levels of
safety. Organizations that just meet these minimal requirements may not really be healthy from a
safety point of view, but they have reduced their vulnerabilities to the unsafe acts and conditions most
conducive to accidents. In short, they have taken minimum precautionary measures.
Weak organizations that fail to meet the minimum standards will be removed from the aviation
system, either proactively, by the regulator withdrawing their operating permit, or reactively, in
response to commercial pressures, such as the high cost of accidents or serious incidents, or consumer
resistance.
SAFETY OVERSIGHT
One of the three cornerstones for an effective Safety Management System is a formal system for safety
oversight. Safety oversight involves regular (if not continuous) monitoring of all aspects of an
organization’s operations. On the surface, safety oversight demonstrates compliance with State and
company rules, regulations, standards, procedures, etc. However, its value goes much deeper. Monitoring
provides another method for proactive hazard identification, validation of the effectiveness of safety
actions taken and continuing evaluation of safety performance.
Safety oversight can be conducted at the company level, the State (regulatory) level or the international
level. The “monitoring” functions of safety oversight take many forms with varying degrees of formality.
International level. At the international level, the ICAO Universal Safety Oversight Audit
Programme (USOAP) (described later in this Chapter) monitors the safety performance of all
contracting States. International organizations like IATA are also engaging in the safety oversight of
airlines through an audit programme.
State level. At the State level, effective safety oversight can be maintained through a mix of some of
the following elements:
a) No-notice inspections to sample the actual performance of various aspects of the national
aviation system;
b) Formal (scheduled) inspections that follow a protocol clearly understood by the
organization being inspected;
c) Discouraging non-compliant behaviour through enforcement actions (sanctions or fines);
d) Monitoring quality of performance associated with all licensing and certification applications;
e) Tracking the safety performance of the various sectors of the industry;
f) Responding to occasions warranting extra safety vigilance (such as major labour disputes,
airline bankruptcies, rapid expansion or contraction of activity, etc.); and
g) Conducting formal safety oversight audits of airlines or service providers such as air traffic
control, approved maintenance organizations, training centres or airport authorities, etc.
Organizational level. The size and complexity of the organization will determine the best methods for
establishing and maintaining an effective safety oversight programme. Companies with adequate
safety oversight employ some or all of the following methods:
a) First-line supervisors maintaining vigilance (from a safety perspective) over day-to-day activities;
b) Regular inspections (formal or informal) of day-to-day activities in all safety-critical areas;
c) Sampling employee views on safety (from both a general and a specific point of view) through
safety surveys;
d) Systematic review of, and follow-up on, all reports of identified safety issues;
e) Systematic capture of data reflecting actual day-to-day performance (such as FDA and
LOSA);
f) Macro-analyses of safety performance (safety studies);
g) A regular operational audit programme (including both internally and externally
conducted safety audits); and
h) Communication of safety results to all affected personnel.
Safety oversight at the company level essentially includes oversight at the ‘individual level’.
Inspections
Perhaps the simplest form of safety oversight involves the Safety Manager (SM) carrying out informal
“walk-arounds” of all operational areas of the company. Talking to workers and supervisors, witnessing
actual work practices, etc. in a non-structured way provides the SM with valuable insights into safety
performance “at the coal face”. As the SM’s frequent presence becomes the norm, a level of trust can be
established. The resulting feedback should help in fine-tuning the safety management system.
To be of value to the organization, the focus of an inspection should be on the quality of the “end
product”. Unfortunately, many inspections simply follow a tick-box format. These may be useful for
verifying compliance with particular requirements, but are less effective for assessing systemic safety
risks. Rather than a tick-box format, a checklist can be used as a guide to help ensure that parts of the
operation are not overlooked.
Management and line supervisors may also conduct safety inspections to assess adherence to
organizational requirements, plans and procedures. However, such inspections may only provide a spot
check of the operations, with little potential for systemic safety oversight.
Surveys
Surveys of operations and facilities can provide management with an indication of the levels of safety and
efficiency within its organization. Understanding the systemic hazards and inherent risks associated with
everyday activities allows an organization to minimize unsafe acts and respond proactively by improving
the processes, conditions and other systemic issues that lead to unsafe acts. Safety surveys are one way to
systematically examine particular organizational elements or the processes used to perform a specific
operation — either generally, or from a particular safety perspective. They are particularly useful in
assessing attitudes of selected populations, say line pilots for a particular aircraft type, or air traffic
controllers working a particular position.
Such surveys, which attempt to determine the underlying hazards in a system, are usually independent of
routine inspections by government or company management. Surveys completed by operational personnel
can provide important diagnostic information about daily operations. They can provide an inexpensive
mechanism to obtain significant information regarding many aspects of the organization, including:
a) Perceptions and opinions of operational personnel;
b) Level of teamwork and cooperation among various employee groups;
c) Problem areas or bottlenecks in daily operations;
d) Corporate safety culture; and
e) Current areas of dissent or confusion.
Note: Adapted from TP 13881, Safety Management Systems for Flight Operations and Aircraft Maintenance Organizations, Transport Canada.
Safety surveys usually involve the use of checklists, questionnaires and informal confidential interviews.
Surveys, particularly those using interviews, may elicit information that cannot be obtained any other
way.
Specific data suitable for safety analysis, including the assessment of safety performance, can typically
be acquired through well-structured and well-managed surveys. However, the validity of all survey information
obtained may need to be verified before corrective action is taken. Like voluntary incident reporting
systems, surveys are subjective, reflecting individuals’ perceptions. As such they are subject to the same
kinds of limitations, such as the biases of the author, biases of the respondents, biases in interpreting the
data, etc.
The activities of safety surveys can span the complete risk management cycle from hazard identification,
through risk assessment into safety oversight and quality assurance. They are most likely to be conducted
by organizations that have truly made the transition from a reactive to a proactive safety culture.
Chapter 15 (Practical Considerations for Operating a Safety Management System) includes guidance to
Safety Managers on the conduct of safety surveys.
Safety monitoring
Traditionally, understanding of the unsafe conditions inherent in aviation developed largely from the
evidence of accident and incident investigations. While an important element of aviation safety, this
reactive approach overlooks the experience of the thousands of flights each day that progress without
incident. Learning strictly through accident/incident investigations is like closing the barn door after the
horses have escaped.
Methods are evolving to learn more from the experience of typical flight operations. What strategies work
best for operational personnel in coping with the dynamic situations of flight operations, effectively
managing the normal threats, errors and undesirable states with the potential to erode accepted margins of
safety? By observing normal operations, normative data can be acquired and analyzed to identify hazards
that might otherwise not come to the attention of safety managers.
One method of monitoring normal operations on the flight deck is known as LOSA (Line Operations
Safety Audits), described in Chapter 16 (Aircraft Operations). Following the Threat and Error
Management (TEM) model, analysis of safety data acquired through this methodology has provided
several airlines with important insight into the threats and errors that flight crews have to manage in
their day-to-day operations. The same principles are being extended to over-the-shoulder monitoring of
air traffic control operations; the equivalent concept for ATS, called NOSS (Normal Operations Safety
Surveys), is described further in Chapter 17 (Air Traffic Services).
Quality assurance
A quality assurance system (QAS) defines and establishes an organization’s quality policy and objectives.
It ensures that the organization has in place those elements necessary to improve efficiency and reduce
risks. If properly implemented, a QAS ensures that procedures are carried out consistently, that problems
are identified and resolved, and that the organization continuously reviews and improves its procedures,
Note: Adapted from TP 13881, Safety Management Systems for Flight Operations and Aircraft Maintenance Organizations, Transport Canada, 2002.
products and services. A QAS should proactively identify problems and improve procedures in order to
meet corporate objectives.
In a safety management system (SMS), these same functions are applied to understanding the human and
organizational issues that can impact on safety. An SMS employs similar methods to identify safety
problems within the organization and to reduce or eliminate the potential for related accidents through
improved procedures. Safety management focuses on the identification and control of safety risks.
The functions of a quality assurance programme help ensure that the requisite systemic measures have
been taken to meet the organization’s safety goals. However, quality assurance does not “assure safety”.
Rather, quality assurance measures help management ensure that the necessary systems are in place
within their organization to reduce the risk of accidents.
A quality assurance programme includes procedures for monitoring the performance of all aspects of an
organization including such elements as:
a) Well designed and documented procedures (e.g. standard operating procedures);
b) Inspection and testing methods;
c) Monitoring of equipment and operations;
d) Internal and external audits;
e) Monitoring of corrective actions taken; and
f) The use of appropriate statistical analysis, when required.
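Item f) can be as simple as basic descriptive statistics. As a purely illustrative sketch (not drawn from this manual; the data and the two-standard-deviation threshold are assumptions made for the example), a quality assurance programme might flag reporting periods with an unusually high count of audit findings:

```python
# Illustrative only: flag months whose count of audit findings exceeds the
# historical mean by more than k standard deviations. The data below are
# fabricated; a real programme would tune the threshold to its own history.
from statistics import mean, stdev

def flag_unusual_months(findings_per_month, k=2.0):
    """Return indices of months whose finding count exceeds mean + k*stdev."""
    m, s = mean(findings_per_month), stdev(findings_per_month)
    return [i for i, x in enumerate(findings_per_month) if x > m + k * s]

findings = [4, 5, 3, 6, 4, 5, 14, 4]   # month 6 shows an unusual spike
print(flag_unusual_months(findings))   # -> [6]
```

Such a flag does not by itself establish a safety deficiency; it only directs attention to where further investigation may be warranted.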
A number of internationally accepted quality assurance standards are in use today. The most appropriate
system depends on the size, complexity and product of the organization. ISO 9000 is one set of
international standards used by many companies to implement an in-house quality system. Such systems
can also help ensure that the organization’s suppliers have appropriate quality assurance systems in place.
Safety audits
Safety auditing is a core safety management activity. Like financial audits, safety audits provide a means
for systematically assessing how well the organization is meeting its safety objectives. The safety audit
programme, together with other safety oversight activities, provides feedback to both managers of
individual units and senior management concerning the safety performance of the organization. This
feedback provides evidence of the level of safety performance being achieved and, where safety
deficiencies have been identified, enables appropriate corrective measures to be developed. In this sense,
safety auditing is a proactive safety management activity, providing a means of identifying potential
problems before they have an impact on safety.
A safety audit should verify compliance with standards and procedures, including compliance with the
organization’s safety management procedures. However, compliance with standards and procedures is not
necessarily sufficient to ensure safety.
Safety audits may be conducted internally by the organization, or by an external safety auditor.
Demonstrating safety performance for State regulatory authorities is the most common form of external
safety audit. Increasingly, however, other stakeholders may require an independent audit as a precondition
to providing a specific approval, such as for financing, insurance, partnerships with other airlines, entry
into foreign airspace, etc. Regardless of the driving force for the audit, the purpose, activities and products
from both internal and external audits are similar. While the procedures used in the conduct of these two
types of audits are similar, the scope is quite different.
Safety audits should be conducted on a regular and systematic basis in accordance with the organization’s
safety audit programme. Guidance on the conduct of safety audits is included in Chapter 14 (Safety
Auditing).
ICAO UNIVERSAL SAFETY OVERSIGHT AUDIT PROGRAMME (USOAP)
ICAO recognizes the need for States to exercise effective safety oversight of their aviation industries.
Thus, ICAO has established the Universal Safety Oversight Audit Programme (USOAP). The primary
objectives of USOAP are:
a) To determine the degree of conformance by States in implementing ICAO Standards;
b) To observe and assess the States’ adherence to ICAO Recommended Practices, associated
procedures, guidance material and safety-related practices;
c) To determine the effectiveness of States’ implementation of safety oversight systems through the
establishment of appropriate legislation, regulations, safety authorities and inspections, and
auditing capabilities; and
d) To provide Contracting States with advice in order to improve their safety oversight capability.
A first USOAP audit cycle of most ICAO Contracting States addressing Annex 1 — Personnel Licensing,
Annex 6 — Operation of Aircraft and Annex 8 — Airworthiness of Aircraft has been completed.
Summary reports of the audits containing an abstract of the findings, recommendations and the proposed
State corrective actions are published and distributed by ICAO to enable other Contracting States to form
an opinion on the status of aviation safety in the audited State. Future USOAP audit cycles will use a
systemic approach, focusing on the safety-critical SARPs of all safety-related Annexes. The audit findings
to date have revealed many shortcomings in individual States’ compliance with ICAO SARPs.
REGULATORY SAFETY AUDITS
For some States, the ICAO USOAP audits are the only assessment made of the States’ aviation safety
performance. However, many States do carry out a programme of safety audits to ensure the integrity of
their national aviation system. Performance measurements and the appraisal of safety management
documentation, including the processes for hazard identification and risk assessment, should be part of the
regulatory safety audit.
Audits conducted by a safety regulatory authority would take a broad view of the safety management
procedures of the organization as a whole. The key issues in such an audit would be:
Note: Guidance material is available from ICAO to assist States in preparing for USOAP audits; see the
Safety Oversight Audit Manual (Doc 9735) and the Human Factors Guidelines for Safety Audits Manual
(Doc 9806).
f) Surveillance and compliance. The regulatory authority needs to ensure that the required
international, national or local standards are complied with before any licence or approval is
issued, and that compliance will be maintained for the duration of the licence or approval. The
regulator determines an acceptable means for demonstrating compliance. The organization being
audited is then required to provide documentary evidence that the regulatory requirements can and
will be met.
g) Areas and degree of risk. A regulatory audit should ensure that the organization’s safety
management system is based on sound principles and procedures, and that organizational systems
are in place to review procedures periodically so that all safety standards continue to be met.
Assessments should be made of how risks are identified and how any necessary changes are
made. The audit should confirm that the individual parts of the organization are performing as an
integrated system. Regulatory safety audits must therefore be conducted in sufficient depth and
scope to ensure that the organization has considered the various interrelationships in its
management of safety.
h) Competence. The organization should have adequate, trained staff to ensure that the safety
management system functions as intended. In addition to confirming the continuing competency
of all staff, the regulatory authority needs to assess the capabilities of personnel in key positions.
The possession of a licence granting specific privileges does not necessarily measure the holder’s
competence to perform the managerial tasks assigned by the organization; for example,
competence as a pilot may not equate to managerial acumen. Where there are short-term skills
gaps, the organization will need to satisfy the regulator that it has a viable plan to mitigate the
situation as soon as practicable. In addition, the regulator should be interested in the involvement
of the highest level of management with responsibility for safety in the day-to-day safety of the
organization.
i) Safety management. Major safety issues are managed effectively, and the organization is
generally meeting its safety performance targets.
SELF-AUDIT
Critical self-assessment (or self-audit) is a tool that management can employ to measure safety margins.
A comprehensive questionnaire to assist airline management to conduct a self-audit of those factors
affecting accident prevention is included at Appendix 2 to this chapter. This self-audit checklist is
designed for use by senior airline management to identify organizational events, policies, procedures or
practices that may be indicative of safety hazards.
There are no right or wrong answers applicable to all situations, nor are all the questions relevant to every
type of operation. However, the thrust of the responses to a line of questioning may be revealing of the
organization’s safety health.
Although this self-audit was originally designed for use in flight operations, the line of questioning is
relevant for the management of most operational aspects of civil aviation. Thus, this audit checklist can be
adapted for application in a variety of situations with the potential to contribute to an accident.
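As a purely illustrative aid, not part of the checklist itself, responses to a self-audit of this kind can be tallied by section so that management attention is drawn first to the weakest areas. The section names and answers below are hypothetical, and the scoring assumes questions phrased so that “yes” is the desirable answer:

```python
# Hypothetical tally of self-audit responses. For questions phrased so that
# "yes" is the desirable answer, the fraction of "yes" responses per section
# gives a rough indication of where management attention may be needed.

def section_scores(responses):
    """responses: {section: [True/False, ...]} -> {section: fraction of 'yes'}."""
    return {sec: round(sum(ans) / len(ans), 2) for sec, ans in responses.items()}

audit = {
    "Management structure":  [True, True, False, True],
    "Workforce":             [True, False, False, False],
    "Training and checking": [True, True, True, True],
}

scores = section_scores(audit)
print(scores)
# -> {'Management structure': 0.75, 'Workforce': 0.25, 'Training and checking': 1.0}
```

A low score does not prove a hazard exists, just as a high score does not prove safety; the value lies in prompting a closer look at the weaker sections.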
Increasingly, airlines are entering into “alliances” and code-sharing agreements. Under such
arrangements, the fare-paying public may have difficulty differentiating one airline from another; the
expectation is that an equivalent level of service will prevail, including an equivalent level of safety.
Cooperative audits among the participants in these agreements help ensure a consistent level of safety
across the alliance partners. The self-audit checklist may provide a common basis for such collective audits.
SAFETY PROGRAMME REVIEW
Any system requires feedback on the fulfillment of the system’s objectives in order to adjust the various
inputs and processes. A programme review validates the safety management system; it confirms that a
systemic approach is being taken to safety management (as opposed to a patchwork of uncoordinated and
unrelated safety initiatives). Through a regular review process, management can pursue continuous
improvement in the system.
When a safety management system is first implemented, typically there is enthusiasm, with new hopes
and aspirations. As the various elements of the safety management system come into effect, managing it
shifts from implementation to maintenance. The initial enthusiasm may wane. As safety performance data
builds, analysis may provide misleading conclusions unless a broad, system-wide perspective is taken.
For example, the number of reports to the hazard and incident reporting system may increase during
implementation, then reduce somewhat. This does not necessarily mean that the number of actual hazards
has been reduced. Perhaps the system is not operating as intended. Staff may have lost confidence in the
non-punitive nature of the organization’s safety culture, or in management’s readiness to address
identified safety deficiencies. It is the responsibility of the Chief Executive and the SM to ensure that this
does not happen.
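The caution above about interpreting raw report counts can be made concrete: normalizing the number of reports by operational exposure (for example, flight hours) often tells a different story than the counts alone. The figures below are fabricated for illustration:

```python
# Hypothetical illustration: raw counts of hazard/incident reports can fall
# simply because flying decreased, not because hazards were eliminated.
# Normalizing by exposure (here, flight hours) gives a clearer trend signal.

def report_rate_per_1000_hours(reports, flight_hours):
    """Return hazard/incident reports per 1 000 flight hours for each period."""
    if len(reports) != len(flight_hours):
        raise ValueError("series must cover the same periods")
    return [round(1000.0 * r / h, 2) for r, h in zip(reports, flight_hours)]

# Illustrative (fabricated) quarterly data:
reports      = [40, 52, 45, 30]        # reports received
flight_hours = [20000, 21000, 15000, 10000]

rates = report_rate_per_1000_hours(reports, flight_hours)
print(rates)  # -> [2.0, 2.48, 3.0, 3.0]
```

Here the raw count falls in the final quarters while the normalized rate holds steady or rises, which is consistent with reduced flying rather than reduced hazards.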
————————
Part 2
Chapter 10 – Safety Performance Monitoring
DRAFT
Appendix 1 to Chapter 10
SAMPLE INDICATORS OF SAFETY HEALTH
CAA

Poor safety health:
• Inadequate governing legislation and regulations;
• Potential conflicts of interest (such as the regulator also being a service provider);
• Inadequate civil aviation infrastructure and systems;
• Inadequate fulfillment of regulatory functions (such as licensing, surveillance and enforcement);
• Inadequate resources and organization for the magnitude and complexity of regulatory requirements;
• Instability and uncertainty within the CAA, compromising the quality and timeliness of regulatory performance;
• Absence of formal safety processes such as incident reporting and safety oversight;
• Stagnation in safety thinking (such as reluctance to embrace proven best practices).

Improving safety health:
• National incident reporting programmes (both mandatory and voluntary);
• National safety monitoring programmes, including incident investigations, accessible safety databases, trend analysis, etc.;
• Regulatory oversight, including routine surveillance, regular safety audits and monitoring of best industry practices;
• Risk-based resource allocation for all regulatory functions; and
• Safety promotion programmes to assist operators.

Operational organization

Poor safety health:
• Inadequate organization and resources for current operations;
• Instability and uncertainty due to recent organizational change;
• Poor financial situation;
• Unresolved labour-management disputes;
• Record of regulatory non-compliance;
• Low operational experience levels for the type of equipment or operations;
• Fleet inadequacies, such as age and mix;
• Poorly defined (or no) corporate flight safety function;
• Inadequate training programmes;
• Corporate complacency regarding safety record, current work practices, etc.;
• Poor safety culture.

Improving safety health:
• Proactive corporate safety culture;
• Investment in human resources in such areas as non-mandatory training;
• Formal safety processes for maintaining a safety database, incident reporting, investigation of incidents, safety communications, etc.;
• Operation of a comprehensive safety management system (i.e. appropriate corporate approach, organizational tools and safety oversight);
• Strong internal two-way communications in terms of openness, feedback, reporting culture and dissemination of lessons learned;
• Safety education and awareness in terms of data exchange, safety promotion, participation in safety fora and training aids.
————————
Appendix 2 to Chapter 10
AIRLINE MANAGEMENT SELF-AUDIT
(Adapted from Flight Safety Foundation: Flight Safety Digest, May 1999)

OBJECTIVE

This self-audit may be used by airline management to identify administrative, operational and
maintenance processes and related training that might indicate safety hazards. The results can be used to
focus management attention on those issues possibly posing a risk of incidents or accidents.

MANAGEMENT AND ORGANIZATION

Management structure

• Does the company have a formal written statement of corporate safety policies and objectives?
• Are these adequately disseminated throughout the organization? Is there visible senior management support for these safety policies?
• Does the organization have a safety department or a designated Safety Manager (SM)?
• Is this department or SM effective?
• Does the safety department or SM report directly to senior corporate management?
• Does the organization support the periodic publication of a safety report or newsletter?
• Does the organization distribute safety reports or newsletters from other sources?
• Is there a formal system for regular communication of safety information between management and employees?
• Are there periodic safety meetings?
• Does the company participate in industry safety activities, such as those sponsored by the Flight Safety Foundation (FSF), the International Air Transport Association (IATA) and others?
• Does the organization formally investigate incidents and accidents? Are the results of these investigations disseminated to managers and operational personnel?
• Does the organization have a confidential, non-punitive hazard and incident-reporting programme?
• Does the organization maintain an incident database?
• Is the incident database routinely analysed to determine trends?
• Does the company operate a Flight Data Analysis (FDA) programme?
• Does the company operate a Line Operations Safety Audit (LOSA) programme?
• Does the company conduct safety studies as a means of proactively identifying safety deficiencies?
• Does the organization use outside sources to conduct safety reviews or audits?
• Does the organization solicit input from aircraft manufacturers’ product support groups?

Management and corporate stability

• Have there been significant or frequent changes in ownership or senior management within the past three years?
• Have there been significant or frequent changes in the leadership of operational divisions within the organization in the past three years?
• Have any managers of operational divisions resigned because of disputes about safety matters, operating procedures or practices?

Financial stability of the organization

• Has the organization recently experienced financial instability, a merger, an acquisition or other major reorganization?
• Was consideration given to safety matters during and following the period of instability, merger, acquisition or reorganization?
• Are safety-related technological advances implemented before they are directed by regulatory requirement, i.e. is the organization proactive in using technology to meet safety objectives?

Management selection and training

• Are there well-defined management selection criteria?
• Are operational background and experience a requirement in the selection of management personnel?
• Are first-line operational managers selected from operationally qualified candidates?
• Do new management personnel receive formal safety indoctrination and training?
• Is there a well-defined career path for operational managers?
• Is there a formal process for the annual evaluation of managers?

Workforce

• Have there been recent layoffs by the organization?
• Are a large number of personnel employed on a part-time or contractual basis?
• Does the company have formal rules or policies to manage the use of contract personnel?
• Is there open communication between management, the workforce and unions about safety issues?
• Is there a high rate of personnel turnover in operations or maintenance?
• Is the overall experience level of operations and maintenance personnel low or declining?
• Is the distribution of age or experience level within the organization considered in long-term organizational planning?
• Are the professional skills of candidates for operations and maintenance positions evaluated formally during the selection process?
• Are multicultural processes and issues considered during employee selection and training?
• Is special attention given to safety issues during periods of labour-management disagreements or disputes?
• Have there been recent changes in wages or work rules?
• Does the organization have a corporate employee health maintenance programme?
• Does the organization have an employee assistance programme that includes treatment for drug and alcohol abuse?

Fleet stability and standardization

• Is there a company policy concerning cockpit standardization within the organization’s fleet?
• Do pilots and flight operations personnel participate in fleet acquisition decisions?

Relationship with the regulatory authority

• Are safety standards set primarily by the organization or by the appropriate regulatory authority?
• Does the organization set higher standards than those required by the regulatory authority?
• Does the organization have a constructive, cooperative relationship with the regulatory authority?
• Has the organization been subject to recent safety-enforcement action by the regulatory authority?
• Does the organization consider the differing experience levels and licensing standards of other States when reviewing applications for employment?
• Does the regulatory authority routinely evaluate the organization’s compliance with required safety standards?

Operations specifications

• Does the organization have formal flight-operations control, e.g. dispatch or flight following?
• Does the organization have special dispatch requirements for extended-range twin-engine operations (ETOPS)?
• Are fuel/route requirements determined by the regulatory authority? If not, what criteria does the company use?
• Does each flight crew member have a copy of the pertinent operations specifications?

OPERATIONS AND MAINTENANCE TRAINING

Training and checking standards

• Does the organization have written standards for satisfactory performance?
• Does the organization have a defined policy for dealing with unsatisfactory performance?
• Does the organization maintain a database of training performance?
• Is this database periodically reviewed for trends?
• Are check pilots periodically trained and evaluated?
• Does the organization have established criteria for instructor/check pilot qualification?
• Does the organization provide specialized training for instructors/check pilots?
• Are training and checking performed by formally organized, independent departments?
• How effective is the coordination among flight operations, flight training and flight standards?
Operations training

• Does the company have appropriate training and checking syllabi?
• Does this training include:
– Line-oriented flight training (LOFT)?
– Crew resource management (CRM)?
– Human factors?
– Wind shear?
– Dangerous goods?
– Security?
– Adverse weather operations, e.g. anti-icing and de-icing procedures?
– Altitude and terrain awareness?
– Aircraft performance?
– Rejected take-offs?
– ETOPS?
– Instrument Landing System (ILS) Category II and Category III approaches?
– Emergency-procedures training, including pilot/flight attendant interaction?
– International navigation and operational procedures?
– Standard International Civil Aviation Organization (ICAO) radiotelephone phraseology?
– Volcanic-ash avoidance/encounters?
• If a ground-proximity warning system (GPWS), a traffic alert and collision avoidance system (TCAS) or other special systems are installed, is specific training provided for their use? Are there clearly established policies for their use?
• Are English language skills evaluated during training and checking?
• Is English language training provided?
• At a minimum, are the procedures contained in the manufacturer’s aircraft operations manual covered in the training programme?
• Are there formal means for modification of training programmes as a result of incidents, accidents or other relevant operational performance?

Training devices

• Are approved simulators available and used for all required training?
• Is most of the organization’s training performed in the simulator?
• Do simulators include GPWS, TCAS, background communications and other advanced features?
• Are simulator and/or training device configurations controlled?
• Has the organization established a simulator/training device quality assurance programme to ensure that these devices are maintained to acceptable standards?
• Does the regulatory authority formally evaluate and certify simulators?
Flight attendant training

• Do flight attendants receive comprehensive initial and recurrent safety training?
• Does this training include hands-on use of all required emergency and safety equipment?
• Is the safety training of flight attendants conducted jointly with pilots?
• Does this training establish policies and procedures for communications between cockpit and cabin crew?
• Are evacuation mock-up trainers that replicate emergency exits available for flight attendant training? Do they include smoke simulation?

Maintenance procedures, policies and training

• Does the regulatory authority require licensing of all maintenance personnel?
• Is formal maintenance training provided by the organization for all maintenance personnel? Is such training done on a recurrent basis? How is new equipment introduced?
• Does the organization have a maintenance quality assurance programme?
• If contract maintenance is used, is it included in the quality assurance programme?
• Is hands-on training required for maintenance personnel?
• Does the organization use a minimum equipment list (MEL)? Does the MEL meet or exceed the master MEL?
• Does the organization have a formal procedure covering communication between maintenance and flight personnel?
• Are “inoperative” placards used to indicate deferred-maintenance items? Is clear guidance provided for operations with deferred-maintenance items?
• Are designated individuals responsible for monitoring fleet health?
• Does the organization have an aging-aircraft maintenance programme?
• Is there open communication between the maintenance organization and other operational organizations, such as dispatch? How effective is this communication?
• Does the organization use a formal, scheduled maintenance programme?
• Are policies established for flight and/or maintenance personnel to ground an aircraft for maintenance?
• Are flight crew members ever pressured to accept an aircraft that they believe should be grounded?
Scheduling practices

• Are there flight and duty time limits for pilots?
• Are there flight and duty time limits for flight attendants?
• Do the flight and duty time limits meet or exceed regulatory requirements?
• Does the organization train flight crew members to understand fatigue, circadian rhythms and other factors that affect crew performance?
• Does the organization allow napping in the cockpit?
• Are on-board crew-rest facilities provided?
• Are there minimum standards for the quality of layover rest facilities?
• Does the organization have a system for tracking flight and duty time limits?
• Has the organization established minimum crew-rest requirements?
• Are augmented crews used for long-haul flights?
• Are circadian rhythms considered in constructing flight schedules?
• Are there duty time limits and rest requirements for maintenance personnel?

Crew qualifications

• Does the organization have a system to record and monitor flight crew currency?
• Does the record-keeping system include initial qualification, proficiency checks and recurrent training, special airport qualifications and line-check observations for:
– Pilots-in-command?
– First and second officers?
– Flight engineers?
– Instructors and check pilots?
– Flight attendants?
• Does the regulatory authority provide qualified oversight of instructor and check-pilot qualifications?
• Are the simulator instructors line-qualified pilots?
• Does the organization permit multiple-aircraft qualification for line pilots?
• Do organizational check pilots have complete authority over line-pilot qualification, without interference from management?
• If the organization operates long-haul flights, is there an established policy for pilot currency, including instrument approaches and landings?
• Does the organization have specific requirements for crew pairing in crew scheduling?
PUBLICATIONS, MANUALS AND PROCEDURES

• Are all flight crew members issued personal copies of their type operations manuals/FCOM and any other controlled publications?
• How are revisions distributed?
• How are the issue and receipt of revisions recorded?
• Does the organization have an airline operations manual?
• Is the airline operations manual provided to each crew member?
• Is the airline operations manual periodically updated?
• Does the airline operations manual define:
– The minimum number of flight crew members?
– Pilot and dispatcher responsibilities?
– Procedures for hand-over of control of the aircraft?
– Stabilized approach criteria?
– Dangerous goods procedures?
– Required crew briefings for selected operations, including cockpit and cabin crew members?
– Specific pre-departure briefings for flights in areas of high terrain or obstacles?
– Sterile-cockpit procedures?
– Requirements for the use of oxygen?
– Access to the cockpit by non-flight crew members?
– Company communications?
– Controlled flight into terrain (CFIT) avoidance procedures?
– Procedures for operational emergencies, including medical emergencies and bomb threats?
– Aircraft anti-icing and de-icing procedures?
– Procedures for handling hijacking and disruptive passengers?
– Company policy specifying that there will be no negative consequences for go-arounds and diversions when required operationally?
– The scope of the captain’s authority?
– A procedure for independent verification of key flight-planning and load information?
– Weather minimums, and maximum cross-wind and tail-wind components?
– Special minimums for low-time captains?
• Are all manuals and charts subject to a review and revision schedule?
• Does the organization have a system for distributing time-critical information to the personnel who need it?
• Is there a company manual specifying emergency response procedures?
• Does the organization conduct periodic emergency response drills?
• Are airport facility inspections mandated by the company?
• Do airport facility inspections include reviews of notices to airmen (NOTAMs)? Signage and lighting? Runway condition, such as rubber accumulation, foreign object damage (FOD), etc.? Aircraft rescue and fire-fighting (ARFF)? Navigational aids (NAVAIDs)? Fuel quality?
DISPATCH, FLIGHT FOLLOWING AND FLIGHT CONTROL

• Does initial/recurrent dispatcher training meet or exceed regulatory requirements?
• Are operations during periods of reduced ARFF equipment availability covered in the flight operations manual?
• Do dispatchers/flight followers have duty-time limitations?
• Are computer-generated flight plans used?
• Are ETOPS alternates specified?
————————
Part 2
DRAFT
Chapter 11 - Emergency Response Planning
Chapter 11
EMERGENCY RESPONSE PLANNING
Outline
Introduction
ICAO requirements
Plan Contents
• Governing policies
• Organization
• Notifications
• Initial response
• Additional assistance
• Crisis Management Centre (CMC)
• Records
• Accident site
• News media
• Formal investigations
• Family assistance
• Post-critical incident stress counseling
• Post-occurrence review
Aircraft Operator’s Responsibilities
Checklists
Training and Exercises
Involvement of the Safety Manager
Chapter 11
EMERGENCY RESPONSE PLANNING
INTRODUCTION
Perhaps because aviation accidents are rare events, few organizations are prepared when one occurs.
Many organizations do not have effective plans in place to manage events during an emergency or crisis.
How an organization fares in the aftermath of an accident can depend on how well it handles the first few
hours and days following a major safety occurrence. An emergency response plan outlines in writing what
should be done after an accident occurs, and who is responsible for each action.
At first glance, the initial response to an emergency may appear to have little to do with safety
management. However, emergency response provides an opportunity to learn as well as to apply safety
lessons aimed at the reduction of damage or injury.
Successful response to an emergency begins with effective planning for a range of possible events. An
Emergency Response Plan (ERP) provides the basis for a systematic approach to managing the
organization’s affairs in the aftermath of a significant unplanned event – in the worst case, a major
accident.
The purpose of an emergency response plan is to ensure that there is:
a) Orderly and efficient transition from normal to emergency operations;
b) Delegation of emergency authority;
c) Assignment of emergency responsibilities;
d) Authorization by key personnel for actions contained in the plan;
e) Coordination of efforts to cope with the emergency; and
f) Safe continuation of aircraft operations or return to normal operations as soon as possible.
ICAO requirements
Any organization conducting or supporting flight operations should have an emergency response plan.
For example:
a) ICAO Annex 14 — Aerodromes states that, before operations commence at an airport, an
emergency response plan should be in place to deal with an aircraft accident occurring on, or in
the vicinity of, the airport.
b) The ICAO Manual entitled Preparation of an Operations Manual (Doc 9376) states that the
operations manual of a company should give instructions and guidance on the duties and
obligations of personnel following an accident. It should include guidance on the establishment
and operation of a central accident/emergency response centre – the focal point for crisis
management. In addition to guidance for accidents involving company aircraft, guidance should
also be provided for accidents involving aircraft for which it is the handling agent (for example,
through code-sharing agreements or contracted services). Larger companies may choose to
consolidate all this emergency planning information in a separate volume of their operations
manual.
c) The ICAO Airport Services Manual, Part 7 — Airport Emergency Planning (Doc 9137) gives
guidance to both airport authorities and aircraft operators on preplanning for emergencies, as well
as on coordination between the different airport agencies, including the operator.
To be effective, an ERP should:
a) Be relevant and useful for the people who are likely to be on duty at the time of an accident;
b) Include checklists and quick reference contact details of relevant personnel;
c) Be regularly tested through exercises; and
d) Be updated when details change.
PLAN CONTENTS
An Emergency Response Plan (ERP) would normally be documented in the format of a manual. It should
set out the responsibilities and required roles and actions for the various agencies and personnel involved
in dealing with emergencies. An ERP should take account of such considerations as:
Governing policies. The ERP should provide direction for responding to emergencies, such as
governing laws and regulations for accident investigations, agreements with local authorities,
company policies and priorities, etc.
Organization. The ERP should outline management’s intentions with respect to the responding
organizations by:
a) Designating who will lead and who will be assigned to the responding teams;
b) Defining the roles and responsibilities for personnel assigned to the response teams;
c) Clarifying the reporting lines of authority;
d) Setting up of a Crisis Management Centre (CMC);
e) Establishing procedures for receiving a large number of requests for information, especially
during the first few days after a major accident;
f) Designating the corporate spokesperson for dealing with the media;
g) Defining what resources will be available, including financial authorities for immediate
activities;
h) Designating the company representative to any formal investigations undertaken by State
officials; and
i) Defining a call-out plan for key personnel, etc.
An organization or flow chart could be used to show organizational functions and communication
relationships.
Notifications. The plan should specify who in the organization should be notified of an emergency,
who will make external notifications and by what means. The notification needs of the following
should be considered:
a) Management (chief pilot, chief maintenance engineer and senior management team, etc.);
b) State authorities (Search and Rescue, regulatory authority, accident investigation board, etc.);
c) Local emergency response services (airport authorities, fire fighters, police, ambulances,
medical agencies, etc.);
d) Relatives of victims — a sensitive issue, in many States handled by the police with
information provided by the operator;
e) Company personnel;
f) Media; and
g) Legal, accounting, insurers, etc.
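As an illustration only, a call-out plan such as the one described above could be maintained as structured data so that each duty holder's notification responsibilities can be validated and printed into the ERP checklists. This is a minimal sketch; all roles, groups, numbers and the `notified_by` assignments below are hypothetical placeholders, not part of any actual plan.

```python
# Sketch of a machine-readable call-out list for ERP notifications.
# All entries are invented placeholders for illustration.
from dataclasses import dataclass

@dataclass
class Contact:
    role: str          # e.g. "Chief Pilot"
    group: str         # e.g. "Management", "State authorities"
    phone: str         # placeholder number
    notified_by: str   # who in the organization makes this call

CALL_OUT_LIST = [
    Contact("Chief Pilot", "Management", "+00-000-0000", "Duty Manager"),
    Contact("Accident Investigation Board", "State authorities", "+00-000-0001", "Safety Manager"),
    Contact("Airport Fire Service", "Local emergency services", "+00-000-0002", "Duty Manager"),
]

def calls_for(caller: str) -> list[Contact]:
    """Return the notifications a given person is responsible for making."""
    return [c for c in CALL_OUT_LIST if c.notified_by == caller]

for c in calls_for("Duty Manager"):
    print(f"{c.role} ({c.group}): {c.phone}")
```

Keeping the list in one structured source makes it straightforward to regenerate the printed checklists whenever contact details change, which supports the regular-update requirement noted earlier.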
Initial response. Depending on the circumstances, an initial response team may be dispatched to the
accident site to augment local resources and oversee the operator’s interests. Factors to be considered
for such a team include:
a) Who would lead the emergency response team?
b) Who would be included on the initial response team?
c) Who would speak for the company at the accident site?
d) What would be required by way of special equipment, clothing, documentation,
transportation, accommodation, etc.?
Additional assistance. Company employees with appropriate training and experience can provide
useful support during the preparation, exercising and updating of a company’s ERP. Their expertise
may be useful in planning and executing such tasks as:
a) Acting as passengers in crash exercises;
b) Handling of survivors;
c) Assisting with interviewing passenger witnesses; and
d) Dealing with next of kin, etc.
Crisis Management Centre (CMC). A CMC should be established at the operator’s headquarters
once the activation criteria have been met. In addition, a command post (CP) may be established at or
near the accident site. The ERP should address how the following requirements for the CMC and CP
are to be met:
a) Staffing (perhaps 24 hours a day, 7 days a week, during the initial response period);
b) Communications equipment (telephones, fax, Internet, etc.);
c) Documentation requirements, maintenance of emergency activities logs;
d) Impounding related company records;
e) Office furnishings and supplies;
f) Reference documents (such as emergency response checklists and procedures, company
manuals, airport emergency plans, telephone lists, etc.); and
g) Collection of latest information for daily updates, internal distribution, etc.
The services of a crisis centre may be contracted from another airline or other specialist organization
to look after the airline’s interests in a crisis away from home base. Company personnel would
normally supplement such a contracted centre as soon as possible.
Records. In addition to maintaining logs of events and activities after the occurrence, the
organization will also be required to provide information to any State investigation team. The ERP
should provide for the following types of information to be made available to investigators:
a) All records relevant to the aircraft, the flight crew and the operation;
b) Lists of points of contact and any personnel associated with the occurrence;
c) Notes of any interviews (and statements) with anyone associated with the event; and
d) Any photographic or other evidence, etc.
Accident site. After a major accident, representatives from many jurisdictions have legitimate reasons
for accessing the site, for example, police, fire fighters, medics, airport authorities, coroners (medical
examining officers) to deal with fatalities, State accident investigators, relief agencies such as the Red
Cross and even the media. Although coordination of the activities of these stakeholders is the
responsibility of the State’s police and/or investigating authority, the aircraft operator should clarify
the following aspects of activity at the accident site:
a) Nominating a senior company representative at the accident site:
1) If at home base;
2) If away from home base;
3) If offshore or in a foreign State.
b) Management of surviving passengers;
c) Needs of relatives of victims;
d) Security of wreckage;
e) Handling of human remains and personal property of the deceased;
f) Preservation of evidence;
g) Provision of assistance (as required) to the investigation authorities; and
h) Removal and disposal of wreckage; etc.
News media. How the company responds to the media may affect how well it recovers
from the event. Clear direction is required. For example:
a) What information is protected by statute (FDR data, CVR and ATC recordings, witness
statements etc.);
b) Who may speak on behalf of the parent organization at head office and at the accident site
(Public Relations Manager, Chief Executive Officer or other senior executive, manager,
owner);
c) Direction regarding a prepared statement for immediate response to media queries;
d) What information may be released (and what should be withheld);
e) The timing and content of the company’s initial statement; and
f) Provisions for regular updates to the media.
Formal investigations. Guidance for company personnel dealing with State accident investigators
and police should be provided.
Family assistance. The ERP should also include guidance on the organization’s approach to assisting
the families of accident victims (crew and passengers). This guidance may include such things as:
a) State requirements for the provision of family assistance services;
b) Travel and accommodation arrangements to visit the accident location and survivors;
c) Programme coordinator and point(s) of contact for each family;
d) Provision of up-to-date information;
e) Grief counselling, etc.;
f) Immediate financial assistance to victims and their families; and
g) Memorial services, etc.
Some States define the types of assistance to be provided by an airline operating within and into the
country.
Critical incident stress counselling. For personnel working in stressful situations, the ERP may
include guidance specifying duty limits and providing for post-incident stress counselling.
Post occurrence review. Direction should be provided to ensure that, following the emergency, key
personnel carry out a full debrief and record all significant lessons learned which may result in
amendments to the ERP and associated checklists.
AIRCRAFT OPERATOR’S RESPONSIBILITIES
The aircraft operator’s emergency response plan should be coordinated with the airport emergency plan
so that the operator’s personnel know which responsibilities the airport will assume and what response is
required by the operator.47 As part of their emergency response planning, aircraft operators, in conjunction
with the airport operator, are expected to:48
a) Provide training to prepare personnel for emergencies;
b) Make arrangements to handle incoming telephone queries concerning the emergency;
c) Designate a suitable holding area for uninjured persons (“meeters and greeters”);
d) Provide a description of duties for company personnel (e.g. person in command, receptionists for
receiving passengers in holding areas);
e) Gather essential passenger information and coordinate the fulfillment of their needs;
f) Develop arrangements with other operators and agencies for the provision of mutual support
during the emergency;
g) Prepare and maintain an emergency kit containing:
1) Necessary administrative supplies (forms, paper, name tags, computers, etc.); and
2) Critical telephone numbers (doctors, local hotels, linguists, caterers, airline transport
companies, etc.).
In the event of an aircraft accident on or near the airport, operators will be expected to take such actions
as:
a) Report to the airport command post to coordinate the aircraft operator’s activities;
b) Assist in the location and recovery of any flight recorders;
47 See Chapter 18, Aerodrome Operations, for further discussion of airport emergency response planning.
48 See also Airport Services Manual, Part 7 — Airport Emergency Planning (Doc 9137), Op. Cit.
c) Assist investigators with the identification of aircraft components and ensure that hazardous
components are made safe;
d) Provide information regarding passengers and flight crew, and the existence of any dangerous
goods on board;
e) Transport uninjured persons to the designated holding area;
f) Make arrangements for any uninjured persons who may intend to continue their journey, or who
need accommodations or other assistance;
g) Release information to the media in coordination with the airport public information officer and
police; and
h) Remove the aircraft (and/or wreckage) upon the authorization of the investigation authority.
While this section is oriented towards the airline operator, some of the concepts would also apply to
emergency planning by aerodrome operators and air traffic service providers.
CHECKLISTS
Everyone involved in the initial response to a major aircraft accident will be suffering from some degree
of shock. Therefore, the emergency response process lends itself to the use of checklists. These checklists
can form an integral part of the company’s Operations or Emergency Response Manuals. To be effective,
checklists must be regularly:
a) Reviewed and updated (for example, currency of callout lists, contact details, company
documentation for the CMC, etc.); and
b) Tested through realistic exercises.
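As a minimal sketch of the review requirement above, the currency of each checklist could be tracked against a chosen review interval so that overdue items are flagged automatically. The 180-day interval, the checklist names and the dates below are assumptions made for the example, not requirements of the plan.

```python
# Sketch: flag ERP checklists whose last review exceeds a chosen interval.
# Interval, names and dates are illustrative assumptions only.
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=180)  # assumed review cycle

checklists = {
    "Call-out list": date(2005, 1, 10),   # last reviewed
    "CMC setup": date(2004, 3, 2),
}

def overdue(last_reviewed: dict, today: date) -> list[str]:
    """Return the names of checklists past their review interval."""
    return sorted(name for name, reviewed in last_reviewed.items()
                  if today - reviewed > REVIEW_INTERVAL)

print(overdue(checklists, date(2005, 6, 1)))  # → ['CMC setup']
```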
TRAINING AND EXERCISES
An emergency response plan is, on paper, a statement of intent. Ideally, much of an ERP will never be
tested under actual conditions. Training is required to ensure that these intentions are backed by
operational capabilities. Since training has a short “shelf life”, regular drills and exercises are advisable.
Some portions of the ERP, such as the callout and communications plan can be tested by “desktop”
exercises. Other aspects, such as “on-site” activities involving other agencies, need to be exercised at
regular intervals. Such exercises have the advantage of demonstrating deficiencies in the plan, which can
be rectified before an actual emergency.
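Deficiencies demonstrated by such exercises are only useful if they are tracked to closure as plan amendments. The following sketch shows one way this could be recorded; the record structure and the example findings are hypothetical, not drawn from any actual exercise.

```python
# Sketch: track deficiencies found during ERP exercises until they are
# closed by a plan amendment. Findings below are invented examples.
from dataclasses import dataclass

@dataclass
class Finding:
    exercise: str     # e.g. "Desktop call-out test"
    deficiency: str
    closed: bool = False

log: list[Finding] = []

def record(exercise: str, deficiency: str) -> None:
    log.append(Finding(exercise, deficiency))

def open_findings() -> list[Finding]:
    """Deficiencies still awaiting rectification in the ERP."""
    return [f for f in log if not f.closed]

record("Desktop call-out test", "Two contact numbers out of date")
record("On-site exercise", "CMC fax line not staffed")
log[0].closed = True            # first deficiency rectified
print(len(open_findings()))     # → 1
```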
INVOLVEMENT OF THE SAFETY MANAGER
When there is an accident, management will often turn to the Safety Manager (SM) for advice. The SM’s
broad knowledge of operations and maintenance, his skill set and network of contacts are an invaluable
resource to the organization in the event of any emergency. As such, the SM should play a major role in
any emergency response. Following are some of the tasks that may be assigned to the SM in Emergency
Response Planning:
a) Advise senior management of the need for an ERP, on appropriate contents, and on training and
exercising of the plan;
b) Assist operational management in planning their portions of the ERP;
c) Liaise with the Airport Manager(s) to coordinate emergency response requirements;
d) Liaise with companies or organizations that may provide emergency response service;
e) Ensure any necessary training has been provided;
f) Ensure coordination of ERP with that of other airline partners; and
g) Ensure the plan is tested regularly and is updated as required.
__________________