CHAPTER 1
Introduction to functional safety
Abstract
To manage hazards in the process industries, the associated risk of undesired incidents needs to be
evaluated and managed. As an illustration, a fictional incident on a badly managed process plant
is narrated, along with proceedings at the subsequent board of enquiry (also fictional). Functional
safety, which is safety achieved by means of automatic systems, is one approach available to
manage risk. Functional safety standards applicable to a range of industry sectors are available, in
particular the process sector standard IEC 61511. Key functional safety concepts, in particular the
functional safety lifecycle, random and systematic failures, and competency management, are
introduced and explained.
Keywords: Competency; Functional safety; Functional safety lifecycle; Harm; Hazard; IEC
61511; Intrinsically safer design; Random failure; Risk; Systematic failure.
1.1 What could possibly go wrong?
All’s quiet in the control room. A routine Sunday evening, and the information screens
glow in bright colours; status information on all the tanks and pumps outside to the
operator’s left, and a half-finished Solitaire game to the right. A flow quantifier ticks over
silently on number 1 tank, registering an incoming transfer of flammable solvent via
pipeline from another site several kilometres away.
The operator flips through the file of open maintenance work orders on the desk. That
faulty level sensor again; the work order has been open for three months now. Maybe
they’ll get round to it eventually. The high level trip bypass warning light has been
glowing for so long that nobody even notices any more. Anyway, who cares? We’ve got
backup systems, the operator thinks. This place is safe as a rock.
An alarm sounds; a discreet baap-baap from a speaker on top of the console. Irritated by
the disturbance, the operator stretches out a lazy finger and stabs the well-worn
Acknowledge button. Just that low oil pressure alarm on number 4 cooling pump again, I
suppose. Checked it out twice before, false alarm every time. Anyway, half the alarms that
come up, I don’t even know what they mean. Forget it. Back to the Solitaire game.
Functional Safety from Scratch. https://doi.org/10.1016/B978-0-443-15230-6.00013-6
Copyright © 2023 Elsevier Inc. All rights reserved.
Solvent flows into number 1 tank, just like it always does on a Sunday night transfer. The
level transmitter registered a fault an hour ago, so the operator switched over to a backup
transmitter. The level creeps up, 10 cm/min. The actual level’s already too high and the
operator should have shut it off by now, but the level is showing only 65% on the screen.
The backup transmitter has never been used before and it is miscalibrated, set up for a
denser solvent that used to occupy this tank. The operator, satisfied that the transfer still
has an hour to go, flips back to the gaudily coloured site overview screen.
The level creeps up to the high level alarm sensor. It hasn’t worked for years and nobody
can test it, because the wiring is on an inaccessible part of the top of the tank. Up, up goes
the level to the high level trip point. This time, the last-chance level sensor works but the
trip is bypassed; that same work order the operator just flipped through was closed a
month ago but nobody reset the bypass.
Solvent hits the tank’s overflow pipe and starts to pour out into the spill containment bund.
A flammable vapour detector picks it up and raises an alarm in the control room, but the
operator ignores it because of all the false alarms. Every time the wind blows, the vapour
alarm rings. They should fix that someday.
It’s a warm, still summer’s evening. Not a breath of wind. The solvent, gushing out of the
tank into the bund, evaporates to form a relentlessly expanding cloud of vapour, an invisible
ball of disaster waiting to strike. Spreading outwards, now 50 m, now a hundred metres from
the tank, it silently envelops the site and creeps over the fence to the neighbouring facility.
Next door’s nightwatchman is out on patrol. So many rats around here; what if they chew
the cables, he wonders. An unexpected chemical smell catches his attention. Glue? Paint?
Who could be painting at this time of night? He walks across to a storeroom at the back of
the warehouse, facing the tank storage site. The smell is really strong just here. Maybe the
rats knocked over some can of chemical waste? He pulls his flashlight from his belt and
flips it on. There is a spark.
1.2 Hazard and risk
1.2.1 What is a hazard?
The chairperson pulls her desktop microphone closer and flicks the switch. “Good
morning, everybody. This is day four of the board of enquiry into the explosion and fire at
ABC Solvents on 16th August last year. Today, we have an expert witness from the
National Safety Council, Mr Ben Kim. Welcome, Mr Kim.” The witness nods and settles
in his seat, looking round the room.
“Mr Kim, yesterday, we heard from another witness that the safety features on the tank
were hazardous because they were not in proper working order. Can you tell us, in your
view, what should have been done to keep them working properly?”
“Thank you, Madam Chairperson. May I offer a correction to your question? The term
hazard cannot correctly be applied to the safety features themselves. First, we should
understand what a hazard means: it is some physical aspect of our equipment which has
the potential to cause something we don’t want to happen. In this case, it is best to think
of the solvent, not the equipment, as the hazard. To explain what I mean: suppose the tank
were filled with water instead of solvent. Could the fire have happened? Of course not, as
the hazard arises from the nature of the solvent.
“The failure of the safety features is better defined as an initiating event, because it
initiates a chain of events made possible by the existence of the hazard.
“Actually, the international standard IEC 61511, to which we will refer later in
this enquiry, defines a hazard very concisely as a ‘potential source of harm.’ By harm we
mean, essentially, any consequence that we don’t want to happen.” Mr Kim pauses for a
sip of water.
1.2.2 What is harm?
“Mr Kim, thank you for the clarification. Can you give an example of what we mean by
harm? Is it the overflow of the tank?”
Mr Kim continues, “Harm is the final, undesired outcome at the end of a chain of events.
The various types of unwanted event we should consider can be grouped according to
what suffers the ill effects: for example, people, environment or profits. These are known
as risk receptors. If I may, I’d like to show some relevant examples of harm, classified by
risk receptor, on the screen.”
Mr Kim straightens his tie and continues. “When analysing the harm accruing to risk
receptors, an operating company will generally select a small subset of these types of
harm; usually not more than 5–6 items are relevant and significant for their specific
situation. The selected harm types need to be quantified (for example, in money terms) or
classified (in 3–5 severity categories).” (Readers of this book can find more detail in
Chapter 5.)
1.2.3 What is risk?
“Mr Kim, thank you, I’m much clearer on hazards and harm now. Another term we have
heard from previous witnesses is risk. Now, I understand that risk is the ‘combination of
the frequency of occurrence of harm and the severity of that harm’ (according to IEC
61511). Can you explain why risk is an important concept for our enquiry?”
“With pleasure, Madam Chairman. One of the major advances in safety management in
recent decades has been a shift in focus from hazard to risk. The concept of risk says that
we pay more attention to harmful outcomes that are more serious, more likely to occur, or
both. This means that we can focus our effort, and expenditure, where it will give the
biggest safety return. However, it also means that we need to identify both the frequency
and severity of harm that can arise from an incident.” (Chapters 3 and 5 of this book cover
these points in detail.)
Table 1.1: Typical risk receptors and types of harm considered in the process industry.
Risk receptor | Types of harm that may be considered in functional safety analysis | Types of harm not typically considered in functional safety analysis
People
Injury to personnel
Illness of personnel
Psychological harm such as stress, low morale
Injury to visitors onsite
Injury to persons offsite
Illness of persons offsite (as a direct result of a specific incident)
Surrounding environment (biological effects)
Surrounding environment (chemical and physical effects)
Harm to significant populations of wildlife, especially in the long term, due to release of substances (e.g. harmful gases, hot water effluent)
Short term events with no long term impact, e.g. emergency depressurization venting of hydrocarbons
Illness of persons offsite (as an indirect result of release of harmful substances, e.g. contamination of watercourses)
Long term impacts that are part of normal operations and addressed in other ways, e.g. CO2 emissions
Damage due to release of chemicals (e.g. corrosion or blackening of nearby structures due to acid gases, soot)
One-time planned impacts such as plant construction spoiling the view of local residents
Physical damage (leading to financial loss, injury or complaints), whether onsite or offsite, e.g. noise, earth tremors from mining or fracking
Breach of permit conditions, e.g. excessive flaring
Financial
Equipment damage (direct and indirect, e.g. due to fire)
Costs associated with idle time (e.g. personnel salaries, lease or depreciation of equipment)
Loss of production capacity (may be calculated in gross or net terms)
Long term loss of business due to inability to supply customers
Loss of materials (e.g. destruction of product inventory, damage to catalyst)
Generation of additional waste
Cost of rework
Consequential losses such as demurrage of ships in port waiting for loading or unloading
Legal
Fines and compensation as a result of an incident
Cost of defending legal cases
Jailing of senior staff
Reputation
Adverse publicity
Loss of shareholder value
Requirement for public notification or evacuations
Loss of privilege to operate
Withdrawal of operating licenses
Loss of public confidence or acceptance
Withdrawal of environmental permits
“That leads us to the question of deciding how safe a facility should be; or, to put it
another way, how much risk can be tolerated.”
The chairperson reaches for her microphone. “Indeed, Mr Kim, that is one of the points I
want to ask. How is the level of risk tolerance generally determined in the process
industry?”
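As a brief aside for readers of this book, the definition of risk quoted above (a combination of frequency and severity) can be sketched numerically. The example below assumes the simplest possible combination, a plain product, and uses invented scenario names, frequencies and money values purely for illustration; none of these figures come from IEC 61511 or from the fictional enquiry.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    frequency_per_year: float  # how often the harm is expected to occur
    severity_cost: float       # severity of the harm, expressed in money terms

    @property
    def risk(self) -> float:
        # Assumption: risk is taken as the simple product of frequency and severity.
        return self.frequency_per_year * self.severity_cost


def rank_by_risk(scenarios):
    """Return the scenarios ordered from highest to lowest risk."""
    return sorted(scenarios, key=lambda s: s.risk, reverse=True)


# Hypothetical figures, for illustration only.
scenarios = [
    Scenario("Minor solvent spill inside bund", 0.1, 5_000),
    Scenario("Tank overflow and vapour cloud ignition", 1e-3, 50_000_000),
    Scenario("Cooling pump failure", 0.5, 20_000),
]

ranked = rank_by_risk(scenarios)
for s in ranked:
    print(f"{s.name}: {s.risk:,.0f} per year")
```

In this sketch the rare but severe overflow scenario ranks above the frequent but minor ones, which is the kind of prioritisation a risk-based approach is meant to support.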
1.2.4 What is tolerable risk?
Mr Kim nods. “Good question, Madam Chairperson. If risks exist in our facility, as they
surely will, we must determine whether they are tolerable. At first sight, the idea that any
kind of risk can be tolerated is counter-intuitive, and may even seem inhuman if the risk in
question could lead to fatalities. However, tolerance of risk is a reasonable and, in fact,
entirely necessary part of everyday life. We took a calculated risk, for example, by taking
some form of transport to come here today, determining subconsciously that the benefits of
getting here outweigh the risk of an accident.” (For a detailed discussion of the sociopolitical aspects of the tolerable risk concept, see Ref. [1], p. 29ff.)
“So, by determining the amount of risk the facility is willing to tolerate, we are able to
make reasoned judgments on questions like these:
• Do the benefits of operating the facility outweigh the risks?
• How well controlled are the risks?
• Can I justify the safety case of the facility to the government and the general public?
• Am I using my safety resources optimally?
• Do I need to add more risk control measures, and if so, how well must they perform?
“Deciding on a level of tolerable risk is a critical aspect of risk control strategy. Arguably,
this is more a question of politics than of engineering, as it touches on sensitive questions
like the relative tolerability of human fatalities and lost profits. Fortunately, it is rarely
necessary for an individual organization to go through a traumatic decision-making
process about tolerable risk. There is now widespread consensus on tolerable risk levels,
enshrined in local best practice and, in some places, mandated by law.”
1.2.5 Risk management through functional safety
Mr Kim gathers his thoughts and continues. “An operating company needs to perform
analysis to determine the current risk levels in their process, and then compare them with
the defined tolerable risk levels. If there is a significant gap between the actual and
tolerable risk, this may indicate that the risk is not adequately controlled.
“At this point, the operating company should consider a hierarchy of risk management
measures.” Mr Kim’s assistant displays a slide on the screen, showing the following series
of questions:
• Have we explored feasible options for reducing the inherent risk, such as substituting
  less hazardous materials, reducing inventories or improving the segregation of hazards
  and risk receptors?
• Are the risks already as low as we can reasonably make them? (This is the ALARP
  concept, which is covered further in Chapter 3.)
• Do we need further risk reduction measures? If so, should they be implemented
  through:
  • design upgrades, e.g. increase in design pressure;
  • passive protective systems, e.g. relief valves;
  • improved operating/maintenance procedures and training;
  • alarms, with defined response from operational personnel;
  • mitigation systems to reduce the severity of an incident, e.g. fire protection systems;
  • active protection systems, which automatically detect a dangerous condition and act
    to keep the plant in a safe state?
Mr Kim explains further. “Madam Chairperson, as you can see, a series of protective
measures are available. The last of these, active protection systems, belongs to the realm
of functional safety: that is, risk control measures implemented through an active Safety
Instrumented System or SIS. That’s what I’m here primarily to discuss with your panel
today.”
The chairperson writes the words Functional Safety on her jotter and circles them. “Mr
Kim, are there any generally accepted standards covering the management of such
systems?”
“Indeed there are. Sound management of functional safety is the objective of the
international standards IEC 61508 and IEC 61511, which, with your permission, I’ll
introduce to the panel now.”
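Mr Kim's comparison of actual and tolerable risk can be illustrated with a small calculation. The sketch below assumes that severity is fixed and that the gap is closed by reducing the frequency of the event, so the gap is expressed as a risk reduction factor. The function name and the frequencies are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def required_risk_reduction(actual_frequency: float, tolerable_frequency: float) -> float:
    """Factor by which the event frequency must fall to reach the tolerable level.

    Returns 1.0 when the risk is already tolerable, i.e. there is no gap to close.
    """
    if actual_frequency <= 0 or tolerable_frequency <= 0:
        raise ValueError("frequencies must be positive")
    return max(1.0, actual_frequency / tolerable_frequency)


# Hypothetical figures: an overflow expected once in 10 years without protection,
# against a tolerable target of once in 10,000 years.
gap = required_risk_reduction(actual_frequency=0.1, tolerable_frequency=1e-4)
print(f"Required risk reduction factor: {gap:.0f}")
```

A factor well above 1 indicates that additional risk reduction measures, of the kinds listed on Mr Kim's slide, would need to be considered.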
1.3 Functional safety standards: IEC 61508 and IEC 61511
1.3.1 Purpose of the standards
As Mr Kim explained in the fictional board of enquiry above, functional safety is the task
of achieving risk reduction by means of an automatic system, which is designed to
respond automatically to prevent an incident or to maintain safe operation. It covers a
range of activities: risk analysis, safety system design, construction, commissioning,
testing, operation, maintenance, and modification. A management system is put in place to
ensure everything is done correctly throughout the project lifecycle.
Achieving functional safety is a complex task, requiring cooperation between numerous
parties: design and instrument engineers, equipment designers and vendors, safety
consultants, software specialists, and operations and maintenance personnel, to name a
few. International standards help to clarify expectations between the various parties, and
provide a level playing field throughout the industry and across national boundaries. For
this reason, the IEC released the first complete edition of IEC 61508, its framework
standard on functional safety, in 2000, with a significantly updated second edition issued
in 2010 [2].
IEC 61508 covers the entire spectrum of functional safety in general terms, with particular
emphasis on the development of hardware and software for functional safety applications.
The intent is that specific industry sectors will develop their own flavours of this standard,
couched in terms applicable to their sector, and focusing on the most relevant aspects of
functional safety. The resulting sector-specific standards are listed in Table 1.2. Some
countries have implemented their own national standards, which are essentially identical to
the IEC standards and can be treated as such. An example is ANSI/ISA-61511:2018, the
US implementation of IEC 61511:2016.
Table 1.2: Sector-specific functional safety standards.
Standard | Industry sector | Latest year of issue as of 2021 | Major focus
IEC 61508 | General | 2010 | Hardware and software design, risk analysis
IEC 61511 | Process | 2016 | Risk analysis, SIS design, SRS, FSA (a)
IEC 61513 | Nuclear power | 2011 | I&C architecture using hardwired and/or computer-based systems
IEC 62061 | Machinery | 2021 | Design, integration and validation of safety-related control systems
ISO 26262 | Automotive | 2018 | Development cycle
IEC 62279 | Rail | 2015 | Software for railway control and protection
ISO 13849 | Machinery | Part 1: 2015; Part 2: 2012 | Design and validation. All safety technologies, not just E/E/PE
IEC 62304 | Medical devices | 2006 + 2015 amendment | Software development
EN 50129 | Rail | 2018 | Hardware and software, design and implementation
ISO 25119 | Machinery for agriculture and forestry | 2018–19 + 2020 amendments | Safety lifecycle
(a) Refer to the abbreviations list at the beginning of this book.
1.3.2 Scope of IEC 61511
The standard for the process industry sector covers electrical, electronic and
programmable electronic (often abbreviated to E/E/PE) safety equipment. Purely
mechanical and/or pneumatic systems are, strictly speaking, outside the scope of IEC
61511, but the principles in the standard are often useful in managing such systems. Also
out of scope are conventional process control systems (e.g. PCS, DCS, BPCS) unless they
are required to play a part in high-integrity risk control measures (which, usually, they
should not). In practice, the standard is normally applied to Safety Instrumented Systems
(SISs) implemented using:
• a safety-rated PLC; or
• safety relay logic.
IEC 61511 is intended to protect specific risk receptors: only “protection of personnel,
protection of the general public or protection of the environment” are explicitly within its
scope. However, it can be, and often is, applied to other risk receptors, as listed in
Table 1.1.
1.3.3 Why comply with IEC 61511?
One of the most significant features of the IEC series of functional safety standards is that
they are mostly performance-based, rather than prescriptive. That is, they expect
entities to set their own safety targets, meet those targets, and demonstrate that the targets
are met, without specifying the way in which this is achieved. Older prescriptive
standards laid down rules constraining some quite specific aspects of design such as how
many redundant items of hardware were required, irrespective of the actual safety
performance achieved thereby. The advantages of the performance-based approach
translate directly into benefits for the end user:
• Solutions to risk management problems can be tailored to suit specific situations.
• This often results in better safety performance at less cost.
• Analytical methods can be selected to provide the optimal balance between analysis
  costs and design costs (this point is covered further in Chapter 5).
• Local and best practice can evolve over time, taking advantage of experience gained in
  real-world applications.
Compliance with IEC 61511 is not mandatory under law, but is widely regarded as
representing best practice. As such, stakeholders such as end users, insurers and holding
companies regard IEC 61511 compliance as evidence of “all reasonable measures” being
taken to protect health and safety and avoid losses [3].
1.4 IEC 61511 key concepts
1.4.1 The functional safety lifecycle
Developing and implementing a Safety Instrumented System (SIS) is a stepwise process.
First, we must identify the hazards within the scope of the project, and determine the risks
they generate. Next, risk reduction measures must be developed, and assessed to ensure
they are adequate. If the risk reduction measures require a SIS, we design the SIS and
check the design meets the risk reduction needs. Then the SIS is installed and
commissioned. During its operational lifetime, it may need to be reassessed and modified
according to changing circumstances. Eventually, parts of the SIS will be decommissioned,
and we must make sure this does not compromise the safety performance of the remaining
systems.
Successful execution of each step requires completion of all previous steps. Thus, the
standard requires a plan, detailing the steps required, the actions to be performed in each
one, and how the sequence as a whole will be executed and managed.
The steps of the lifecycle are known as phases. For convenience, in this book we will
sometimes group phases together into 3 periods: the analysis period, the design period and
the operational period. Fig. 1.1 shows the periods of the lifecycle, and Fig. 1.2 shows the
lifecycle phases typically included in each period.
Figure 1.1: Main periods of the functional safety lifecycle.
Figure 1.2: Phases included in each main period of the functional safety lifecycle.
Earlier, we noted that a key aspect of the standard is to demonstrate that safety
performance targets are met. To do this, we must measure performance and compare with
the goals that were set. If targets are not achieved, we should return to earlier steps and
revise the work that was done. This means that looping back within the sequence of steps
is an intrinsic part of managing functional safety. For this reason, the steps are arranged in
a functional safety lifecycle. A recommended scheme for a safety lifecycle is set out in
IEC 61511 (and a slightly different version in its parent standard, IEC 61508). In keeping
with its performance-based philosophy, the standard does not compel us to use its
recommended lifecycle; we are free to substitute one of our own, as long as it achieves all
the same objectives. However, in practice, the lifecycle model set out in the standards is
almost universally adopted, as it is clear, comprehensive and intuitive.
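The looping-back behaviour described above can be pictured as a simple control loop. The following is only a sketch under stated assumptions: the phase names are hypothetical placeholders rather than the phase list of IEC 61511, and the check function stands in for whatever assessment confirms that a phase has met its targets. Stepping back exactly one phase is also a simplification; in practice the return may be to any earlier phase.

```python
def run_lifecycle(phases, targets_met, max_steps=50):
    """Walk the phases in order, stepping back one phase whenever a check fails.

    `targets_met(phase, visit)` is a caller-supplied check (hypothetical here)
    reporting whether the phase achieved its targets on the given visit.
    Returns the phases in the order they were actually executed.
    """
    history = []
    visits = {phase: 0 for phase in phases}
    i = 0
    while i < len(phases):
        if len(history) >= max_steps:
            raise RuntimeError("lifecycle did not converge")
        phase = phases[i]
        visits[phase] += 1
        history.append(phase)
        if targets_met(phase, visits[phase]):
            i += 1             # targets met: proceed to the next phase
        else:
            i = max(i - 1, 0)  # targets missed: return to the earlier phase


    return history


# Hypothetical phase names; the design check fails on its first visit only.
phases = ["risk analysis", "SIS design", "design check", "installation"]
history = run_lifecycle(
    phases,
    targets_met=lambda phase, visit: not (phase == "design check" and visit == 1),
)
print(history)
```

In this run the failed design check sends the work back to the design phase before the sequence continues, so the design and its check each appear twice in the history.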
Another important reason for adopting a cyclic, rather than linear, approach to safety
design is that operational needs change over time. Processes may need to be altered for a
number of reasons, such as:
• Changing process parameters as operational experience is gained (e.g. optimization of
  yield or manpower utilization, maintenance problems, avoidance of unnecessary
  tripping)
• Obsolescence or deterioration of equipment
• Adoption of new technology
• Changing product profile to match customer demands
• Changes in aspects of plant management, such as equipment utilization or manning
• Changes in environmental protection requirements
Any process change should prompt a return to early phases of the safety lifecycle, so that
the impact on the demands and performance of the SIS can be assessed. We’ll come back
to this topic in greater detail in Chapter 11.
1.4.2 Intrinsically safer design
In a typical functional safety project, the hazard identification and risk analysis phases
start with a substantially frozen design already embodied in P&IDs and equipment data
sheets. However, this tends to squeeze out the opportunity to apply ‘intrinsically safer’
design principles: the concept that it is generally better to eliminate, or at least
reduce, hazards in the design, rather than managing the risks generated by those hazards.
During early-stage hazard identification studies such as HAZID and HAZOP, the analysis
team should be given the chance to question whether hazards could be better managed by
design changes rather than relying on layers of protection. Examples of intrinsically safer
design principles include:
• Replacing a hazardous material with a less hazardous one
• Reducing the inventory of hazardous materials
• Applying less hazardous operating conditions (e.g. lower temperatures and pressures)
• Increasing the design pressure of piping and equipment, so that upset conditions are less
  likely to lead to a loss of containment
• Reducing the opportunity for human errors, e.g. eliminating hose changeovers between
  items of equipment in a batch process
1.4.3 The safety requirements specification (SRS)
This crucial component of functional safety is a document (or set of documents) spelling
out exactly what the SIS must do. It lists the design intent of the SIS, every detail of its
design specification, and a slew of information needed during the operational phase, such
as maintenance requirements. The SRS is first drafted when the need for the SIS is
identified; this takes place immediately after risk analysis is completed. Then, after the full
design details of the SIS have been elaborated, the SRS is updated to contain all the
information necessary for complete execution of the lifecycle.
The purpose of having a centralised document of this type is to provide a single point of
reference for all parties responsible for each phase of the lifecycle. Since the people
involved are likely to be spread across many departments and organizations, it is critical to
have an unambiguous definition of the SIS’s function and operation. Indeed, some of them
will be performing their duties many years after the SIS is commissioned.
Another important function of the SRS is to provide a benchmark, against which the SIS
itself can be validated, and its performance assessed. This allows reviewers to confirm or
revise assumptions made during the safety analysis and SIS design periods.
Extensive coverage of the SRS is provided in Chapter 7.
1.4.4 Assuring that functional safety is achieved
A key aspect of the standard is that we must demonstrate successful control of risk. There
are two main aspects to this:
• minimizing the scope for undetected human error, and
• ensuring that each phase of the lifecycle has been completed competently.
The standard identifies four separate activities for assuring this has been achieved, as
outlined briefly in Table 1.3. This is one of the more challenging areas of functional
safety, and often causes confusion. Areas of misunderstanding typically include:
• The differences between the various activities
• What is involved in each activity
• When they should be performed, and how often
• Whether they can be delegated or outsourced to consultants
• Whether the activities need to be undertaken by independent parties
We’ll cover these topics in detail in Chapter 10.
Table 1.3: Activities for assuring that functional safety is achieved.
Activity | Brief description
Verification | The inputs and outputs required for each lifecycle phase should be defined. Verification involves confirming that the required output has been generated.
Validation | During the analysis and design periods of the lifecycle, a document known as the safety requirements specification (SRS) is generated. Validation confirms that the commissioned SIS, including hardware, software and operating and maintenance procedures, meets the stipulations of the SRS.
Functional Safety Assessment (FSA) | FSA is a wide-ranging assessment of how effectively the functional safety lifecycle is followed. It can be executed at up to five stages of the lifecycle, although it is compulsory at only one stage: between commissioning and process startup.
Audit | A review of evidence to demonstrate compliance with site-specific procedures relating to functional safety.
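Of the four activities, verification is the most mechanical and lends itself to a small illustration: for each phase, compare the outputs that were defined as required with those actually produced. The sketch below is an assumption about how such a check might be recorded; the phase name and document names are hypothetical and are not a list prescribed by the standard.

```python
def verify_phase(required_outputs, produced_outputs):
    """Return the required outputs that have not yet been produced for a phase."""
    return sorted(set(required_outputs) - set(produced_outputs))


# Hypothetical required and produced outputs for a single lifecycle phase.
required = {"SIS design": ["SRS", "design calculations", "loop drawings"]}
produced = {"SIS design": ["SRS", "loop drawings"]}

missing = {
    phase: verify_phase(outputs, produced.get(phase, []))
    for phase, outputs in required.items()
}
print(missing)
```

An empty list for a phase would indicate that every defined output exists; it says nothing about the quality of those outputs, which is the concern of the other activities in Table 1.3.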
1.4.5 Random and systematic failures
The safety lifecycle approach recognises that there are two fundamentally different ways
in which the SIS can fail to perform its intended function. These are known as random
and systematic failures. Because this concept underpins every aspect of the safety
lifecycle, a clear understanding of failure types is crucial.
Random failures are hardware failures. Every item of equipment has a finite lifetime,
during which some component within the equipment may break due to natural wear-and-tear processes caused by fatigue. This is true even if the equipment is installed correctly,
operated within specification, and maintained properly.
Random failures can never be eliminated entirely, but they can be handled mathematically.
Although it is impossible to predict when any individual item of equipment will fail, we
can know a great deal about typical failure behaviour, given data from a large enough
population of equipment in service. For example, we can determine the item’s useful
lifetime, and the probability that it will fail during a given period of time. This
information is essential during risk analysis, because it allows us to calculate the extent of
risk reduction that a particular design of SIS can be expected to provide, and hence whether
it is sufficient to meet the tolerable risk target (as we discussed in Section 3.3).
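To make the mathematical treatment of random failures concrete, the sketch below works through a single device under two assumptions that are not stated in the text above: a constant failure rate during the useful lifetime, and the widely used simplified approximation that the average probability of failure on demand is half the product of the dangerous undetected failure rate and the proof test interval. The failure rate and test interval are hypothetical figures.

```python
import math


def probability_of_failure(failure_rate_per_hour: float, hours: float) -> float:
    """Probability of at least one failure within the period (constant-rate assumption)."""
    return 1.0 - math.exp(-failure_rate_per_hour * hours)


def average_pfd(dangerous_undetected_rate_per_hour: float,
                proof_test_interval_hours: float) -> float:
    """Simplified average probability of failure on demand for a single device."""
    return dangerous_undetected_rate_per_hour * proof_test_interval_hours / 2.0


# Hypothetical device: 2 failures per million hours, proof-tested once a year.
rate = 2e-6
one_year = 8760  # hours

p_fail = probability_of_failure(rate, one_year)
pfd = average_pfd(rate, one_year)
risk_reduction_factor = 1.0 / pfd

print(f"Probability of failure within one year: {p_fail:.4f}")
print(f"Average PFD: {pfd:.5f}")
print(f"Risk reduction factor: {risk_reduction_factor:.0f}")
```

The resulting risk reduction factor is the figure that would be compared with the reduction needed to reach the tolerable risk target.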
Systematic failures are device failures ultimately caused by human errors. The lifecycle
presents numerous possibilities for human errors to occur; a few examples are:
• Incorrect risk analysis (failing to identify hazards, underestimating risks)
• Administrative errors (working from out-of-date versions of documents, incorrect
  drafting of documents, miscommunication)
• Incorrect design of SIS
• Software bugs
• Incorrect installation of SIS
• Failure to maintain equipment, or errors during maintenance (such as failing to remove
  overrides after completing the maintenance procedure)
While some of these are under the direct control of the process plant owner or design and
construction contractor, others are not. For example, a safety equipment manufacturer may
make a design error, which could lie hidden for many months or years until a particular
combination of circumstances brings it to light. When the error is finally revealed, severe
consequences could occur without warning; for example, an emergency trip may fail to
operate on demand, leading to a fire or explosion.
Unlike random failures, systematic failures cannot currently be mathematically modelled.
Since it is impossible to test every combination of circumstances and events that could
ever arise, we can never know for sure whether errors exist in our SIS, how many, or how
serious they are. Statistical treatment is of little value, since error rate data collected in one
environment is unlikely to be applicable to another. The only practical way to address
systematic failures is to minimise them. The two main ways of doing this are:
• Reduce the number of errors made in the first place, for example, by ensuring
  individuals are competent, providing clear requirements and procedures, and reducing the
  number of opportunities for error (fewer and simpler operations); and
• Provide opportunities to detect errors, for example, by verification and review, and by
  recording and investigating every unexpected incident involving the SIS.
For this reason, IEC 61511 places great emphasis on software development techniques,
management procedures, cross-checking of work completed (as discussed in Section 7.3)
and competency of individual safety practitioners.
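One of the systematic failures listed earlier, an override left in place after maintenance, shows how a simple cross-check can provide an opportunity to detect errors. The sketch below is a hypothetical illustration only: the record structure, tag names and the seven-day limit are all assumptions, not requirements of IEC 61511.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Bypass:
    tag: str               # identifier of the bypassed trip (hypothetical tags below)
    applied_on: date       # date the bypass was applied
    work_order_open: bool  # whether the associated work order is still open


def overdue_bypasses(bypasses, today, max_age=timedelta(days=7)):
    """Flag bypasses whose work order is closed or that exceed the allowed age."""
    return [
        b.tag
        for b in bypasses
        if not b.work_order_open or today - b.applied_on > max_age
    ]


# Hypothetical register of active bypasses.
register = [
    Bypass("LT-101 high level trip", date(2023, 5, 10), work_order_open=False),
    Bypass("PT-204 low pressure trip", date(2023, 8, 14), work_order_open=True),
]

flagged = overdue_bypasses(register, today=date(2023, 8, 16))
print(flagged)
```

A routine review of such a register would have highlighted the forgotten bypass in the opening story, where the work order had been closed but the bypass was never reset.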
Practical ways of addressing systematic failures are listed in Tables 1.4 and 1.5, while
Table 1.6 and Fig. 1.3 suggest ways to distinguish between random and systematic
failures.
Table 1.4: Practical methods for reducing errors that can cause systematic failures.

| Type of method | Practical steps involved | Chapter in this book |
|---|---|---|
| Ensure competency | Define the competency level required for each lifecycle task, including qualifications, experience and knowledge | 1, 6 |
| | Assign individuals to tasks for which they are competent | |
| | Encourage individuals to query any information or instructions they do not understand or agree with | |
| Information availability | Ensure resources are available, e.g. access to up-to-date versions of standards and codes of practice | 6 |
| | Provide and implement a document control system, to ensure everyone works from the latest version of each document. (This is often part of an ISO 9000 quality management system.) | |
| | Use the SRS and other key lifecycle documents as the sole means of transferring information between individuals | |
| | Use adequate labelling (of equipment and wiring) and commenting (of software code) | |
| | Ensure procedures and manuals are available and fit for purpose: clear, unambiguous, complete, and provided in the local language. | |
| Simplification | Do not use equipment with more features than actually required | 9 |
| | Make unneeded features (especially software features) unavailable | |
| | Use passwords and other means of access control to limit the number of individuals that can change things (such as documents, wiring and software settings) | |
| | Use restrictive languages for the application program | |
| | Avoid unnecessary diversity. Use the same brand or type of equipment and software for all similar applications where practical. (a) | |
| Familiarity | Avoid unnecessary novelty. Use well-established and familiar equipment, procedures and methods | 9 |
| Suitability | Use equipment and software only for its intended function. Pay attention to any restrictions listed in the equipment’s Safety Manual. | 8 |
| | Use SIL-certified equipment and validated tools (software development tools, analytical software, test equipment). Alternatively, use equipment with a good, documented track record of prior use (see Chapter 9 for detailed coverage) | |

(a) However, this can conflict with avoidance of common cause failures. See Chapter 8 for further discussion.
Table 1.5: Practical methods for detecting errors that can cause systematic failures.

| Type of method | Practical steps involved | Chapter in this book |
|---|---|---|
| Review | Follow a properly designated review procedure, especially for software development. Ensure an adequate degree of independence between the executing engineer and the reviewer. Record deviations and errors found, not for disciplinary purposes but to allow an assessment of whether systematic failures are properly under control. | 10 |
| | Compare the expected and actual performance of the SIS, especially in terms of trip rate (real trips and spurious trips). If the actual trip rate is much higher than expected (based on random failure rate calculations), it indicates the presence of systematic failures in the design and/or implementation of the SIS. | 11 |
| Investigation | Record and investigate all incidents of unexpected SIS behaviour, especially unwanted (spurious) trips, diagnostic alarms, test failures, issues found during maintenance, and events when the SIS is found to be in an abnormal state (e.g. unauthorised bypasses, parameters changed). Most of these will indicate the presence of some kind of systematic failure. | 11 |
| Maintenance | When maintaining the SIS, always inspect and test it before carrying out any maintenance works such as cleaning and repair. Record the ‘as-found’ condition of the SIS, since this more accurately represents the ‘real’ state of the SIS during the majority of its working lifetime. Investigate the root cause of issues such as loose connections, corrosion or other physical damage, unauthorised or unexpected alterations from design (compare back with the SRS), and any other finding that could compromise the functioning of the SIS. | 11 |
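The trip rate comparison described in Table 1.5 can be illustrated with a short calculation. The sketch below is only one possible approach, and rests on assumptions that do not come from this chapter or from IEC 61511: that trips caused by random failures follow a Poisson distribution, and that a 5% tail probability is a sensible point at which to start asking questions. The function names are hypothetical.

```python
import math


def poisson_tail(observed: int, expected: float) -> float:
    """Probability of seeing `observed` or more events when `expected` are predicted.

    Assumes the event count follows a Poisson distribution (an assumption
    made for this sketch, not a requirement of the standard).
    """
    if observed <= 0:
        return 1.0
    # Sum P(X = k) for k = 0 .. observed - 1, then take the complement.
    cdf = sum(
        math.exp(-expected) * expected**k / math.factorial(k)
        for k in range(observed)
    )
    return max(0.0, 1.0 - cdf)


def trip_rate_suspicious(
    observed_trips: int,
    expected_trips_per_year: float,
    years_in_service: float,
    threshold: float = 0.05,  # assumed cut-off, chosen for illustration
) -> bool:
    """Flag a trip count that random failures alone are unlikely to explain."""
    expected = expected_trips_per_year * years_in_service
    return poisson_tail(observed_trips, expected) < threshold


if __name__ == "__main__":
    # Random failure rate calculations predict 0.5 trips per year; the SIS
    # has been in service for 2 years and has tripped 6 times.
    print(trip_rate_suspicious(6, 0.5, 2.0))
```

A flag from a check like this does not prove that a systematic failure exists. It simply indicates that the observed trip count is hard to reconcile with the random failure rate calculations, which is a reason to investigate.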
Table 1.6: Guidelines for classifying failures as random or systematic.

| Random failures | Systematic failures |
|---|---|
| Unconnected to any specific causal event | May be associated with a design error |
| Occurs within the design envelope of the SIS | May be associated with exceeding the design envelope of the SIS |
| Not attributable to a specific design or operating error | Attributable to a specific root cause |
| | May be avoidable by a design change |
| | May be controlled by improved training and procedures |
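The guidelines in Table 1.6 can be turned into a rough first-pass screening aid, sketched below in Python. This is an illustration only: the field names are hypothetical, the logic does not reproduce the decision flow of Fig. 1.3, and because most of the systematic indicators in the table are worded "may be", the result should be treated as a prompt for investigation rather than a verdict.

```python
from dataclasses import dataclass


@dataclass
class FailureFindings:
    """Investigation findings for one failure (field names are hypothetical)."""

    design_error: bool = False
    outside_design_envelope: bool = False
    specific_root_cause: bool = False
    avoidable_by_design_change: bool = False
    controllable_by_training_or_procedures: bool = False


def screen_failure(findings: FailureFindings) -> str:
    """Return 'systematic' if any Table 1.6 systematic indicator is present.

    A failure with none of the indicators is provisionally screened as
    'random'. This is a simplification made for this sketch.
    """
    systematic_indicators = (
        findings.design_error,
        findings.outside_design_envelope,
        findings.specific_root_cause,
        findings.avoidable_by_design_change,
        findings.controllable_by_training_or_procedures,
    )
    return "systematic" if any(systematic_indicators) else "random"


if __name__ == "__main__":
    # A valve that sticks while within its design envelope and useful life.
    print(screen_failure(FailureFindings()))
    # The same valve, but operated beyond its useful life.
    print(
        screen_failure(
            FailureFindings(outside_design_envelope=True, specific_root_cause=True)
        )
    )
```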
Figure 1.3
Decision flow diagram: classifying a failure as random or systematic.
Why is this type of failure known as systematic? The term arises from the idea that the
underlying error will systematically lead to a failure when a given set of conditions arises,
step by step, with essentially 100% probability. For example, if there is a division-by-zero
error in a line of computer code that runs as part of a housekeeping procedure once a
month, the program will crash on a specific date. Unfortunately, the term systematic is
prone to confusion, because it can also refer directly to a failure in a system (e.g.
management system). The terms deterministic, causative or induced would be preferable.
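The deterministic character of such an error can be shown with a minimal Python sketch. The scenario and the function names are hypothetical, invented purely for illustration: a housekeeping calculation spreads a remaining monthly quota over the days left in the month, and therefore divides by zero on the last day of every month.

```python
import calendar
from datetime import date


def daily_quota_remaining(today: date, quota_left: float) -> float:
    """Hypothetical housekeeping calculation containing a latent error.

    Spreads the remaining monthly quota evenly over the days left in the
    month. On the last day of the month, days_left is zero.
    """
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    days_left = days_in_month - today.day  # zero on the last day of the month
    return quota_left / days_left  # latent division-by-zero error


def fails_on(today: date) -> bool:
    """Return True if the housekeeping calculation crashes on this date."""
    try:
        daily_quota_remaining(today, quota_left=100.0)
    except ZeroDivisionError:
        return True
    return False


if __name__ == "__main__":
    # The outcome depends only on the date, never on chance.
    for day in (date(2023, 1, 30), date(2023, 1, 31), date(2023, 2, 28)):
        print(day.isoformat(), "fails" if fails_on(day) else "ok")
```

Running the check repeatedly for the same date always gives the same result. No failure rate can be assigned to this error: it is either triggered by the conditions or it is not, which is why it has to be prevented or detected rather than modelled statistically.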
1.4.6 Competency
Competency is a core concept of IEC 61511. It requires us to
• determine the level of competency required to perform each safety lifecycle task, and
• assign individuals only to tasks for which they meet the competency requirements.
The competency requirements should be defined in terms of qualifications, general
experience, directly relevant experience, background knowledge (e.g. of functional safety
concepts and relevant regulations and codes of practice), and specific knowledge (of the
process, equipment and procedures concerned). All this should be documented, to provide
an audit trail for verifying systematic failure controls are effective. Fig. 1.4 shows the core
aspects of competency required by IEC 61511.
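As an illustration of how these two requirements might be recorded and checked, the following Python sketch compares a person's documented competency against the requirements defined for a task. It is a simplified sketch under stated assumptions: the class names, fields and example values are hypothetical, and a real competency scheme would also cover general experience, background knowledge and periodic reassessment.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class TaskRequirement:
    """Competency required for one lifecycle task (hypothetical structure)."""

    task: str
    required_qualifications: FrozenSet[str]
    min_relevant_years: float
    required_knowledge: FrozenSet[str]


@dataclass
class Person:
    """Documented competency of one individual (hypothetical structure)."""

    name: str
    qualifications: Set[str] = field(default_factory=set)
    relevant_years: float = 0.0
    knowledge: Set[str] = field(default_factory=set)


def competency_gaps(person: Person, req: TaskRequirement) -> List[str]:
    """List every requirement of the task that the person does not meet."""
    gaps = []
    for item in sorted(req.required_qualifications - person.qualifications):
        gaps.append(f"missing qualification: {item}")
    if person.relevant_years < req.min_relevant_years:
        gaps.append(
            f"relevant experience {person.relevant_years} years, "
            f"{req.min_relevant_years} required"
        )
    for item in sorted(req.required_knowledge - person.knowledge):
        gaps.append(f"missing knowledge: {item}")
    return gaps


def may_assign(person: Person, req: TaskRequirement) -> bool:
    """A person may be assigned only if no competency gaps remain."""
    return not competency_gaps(person, req)
```

Returning the list of gaps, rather than a bare yes or no, means the reason for each assignment decision can be kept on file as part of the audit trail.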
The standard requires only that each person is competent for the tasks they are
performing. It is not necessary for every engineer in the project to have a full in-depth
knowledge of every aspect of the SIS. The standard does not make any specific stipulation
about what competence actually means in practice: that is up to each individual
organization to decide.
One of the reasons for placing so much emphasis on assuring competency is that a great
many serious incidents in the past have been traced back to competency failures. One
chilling example relates to the collapse of a coal mine spoil heap at Aberfan, Wales, in
1966. According to Trevor Kletz, “responsibility for the siting, management, and
inspection of tips was given to mechanical rather than civil engineers. The mechanical
engineers were unaware that tips on sloping ground above streams can slide and have
often done so.” [4] The official report of the board of inquiry described the Aberfan
Disaster as “a terrifying tale of bungling ineptitude by many men charged with tasks for which they
were totally unfitted, of failure to heed clear warnings, and of total lack of direction from
above. Not villains but decent men, led astray by foolishness or by ignorance or by both
in combination, are responsible for what happened at Aberfan” [4].
The result was 144 fatalities, 116 of whom were children in a nearby junior school.
Figure 1.4
IEC 61511 competency requirements.
Another reason for requiring evidence of competency is that many lifecycle activities are
heavily outsourced. An end user will typically delegate a bundle of safety engineering
activities to an EPC (engineering, procurement and construction) contractor. The EPC will,
in turn, purchase SIS components from manufacturers, and hire consultants to help with
safety analysis and verification activities. In each case, responsibility for ensuring
competency is effectively being transferred from one entity to another, further and further
from the final end user. Unless there are clear definitions of what constitutes competency
or how it is controlled, the end user, who is ultimately responsible for safety, has no
way of assuring the effectiveness of the safety products and services provided.
Chapter 7 explains how competency management can be achieved in practice.
1.5 The structure of IEC 61511
The IEC 61511 standard itself does not make easy reading, especially if English is not
your mother tongue. It is, thankfully, more digestible than its parent standard, IEC 61508,
whose readability suffers from having to be comprehensive and cover every kind of
industry and situation. It is probably unnecessary for the individual safety engineer to read
the standard from cover to cover; however, each user should at least understand the Safety
Lifecycle, the documentation and verification requirements, and the aspects of the standard
applicable to their own responsibilities.
It may be helpful, then, for us to take a quick tour of the standard here. First, the standard
is in three parts. Part 1 is the core of the standard and addresses the whole lifecycle,
explaining the purpose and requirements for each phase. The phases covered in the
greatest detail are SIS design and software development. It also contains a brief discussion
of management and documentation issues. Importantly, it includes a substantial glossary of
abbreviations and definitions. Unlike the similar glossary in Part 4 of IEC 61508, it has the
advantage of being arranged, for the most part, in alphabetical order.
Part 2 is a series of Annexes. Annex A contains clause-by-clause guidance on many of
the clauses in Part 1, although the guidance is of limited practical value for most users.
Annex F is a worked example of the entire functional safety lifecycle. The remaining
annexes cover special topics, mainly around application program development.
Part 3 focuses mainly on risk analysis methods. It can be treated as a textbook of
background knowledge required for the risk analysis phase of the safety lifecycle.
For a first-time reader, it would be most helpful to focus on Part 1, clauses 1 to 7 and 19;
the clauses and annexes of Parts 1 and 2 most relevant to your own role; and, if you are
involved in risk analysis, the relevant clauses and annexes of Part 3.
1.6 The origins of IEC 61511
One important characteristic of IEC 61511 and its parent standard, IEC 61508, is that they
place a strong emphasis on developing reliable software. The need for a focus on
software reliability became apparent during the 1980s, as increasingly sophisticated
control hardware became available. While it was easy to write elaborate software to
provide safety functions, it turned out to be extremely difficult to prove the software was reliable.
The difficulty lay in two separate aspects: getting the specification right, and writing
applications that met the specification under all possible conditions.
At the same time as these software difficulties were becoming obvious, hardware was
rapidly advancing in complexity, to such an extent that it became impossible to
demonstrate hardware integrity by testing alone.
Without being able to demonstrate safety in both hardware and software of instrumented
safety systems, end users could not have confidence that major hazards were adequately
controlled. This problem was further compounded by the ever-growing trend towards
automated plants managed remotely by a small number of operators in a control room.
Since the operational staff were increasingly dependent on self-contained trip systems to
manage major upset conditions, the importance of confirming the dependability of those
systems was clear.
The response of the International Electrotechnical Commission (IEC), an independent
body based in Geneva with member committees representing the interests of 89
countries plus 84 affiliate members, was to set up separate groups to study the issue for
hardware and software. The aim was that each group would develop a standard to assist
developers and end users in claiming safety capability in their respective applications.
The studies were merged in the early 1990s, giving birth eventually to an umbrella
standard IEC 61508 that covered both hardware and software integrity in detail. The
merging of the two aspects of functional safety in a single standard was a recognition
that many of the issues are the same: overall safety management, competency, the
lifecycle approach, and configuration management are just a few of the aspects pertinent
to both. The major differences between hardware and software lie in the methods used
to achieve and demonstrate integrity; this is reflected in the two separate parts (Part 2
and Part 3, respectively) that IEC 61508 dedicates to them.
IEC 61511 was then developed as a specialization of IEC 61508 for the process industry,
as described earlier in Section 1.3.1.
Exercises
1. Consider the fictional incident in Section 1.1. Select two of the equipment failures
described. Are they likely to be random or systematic failures?
2. Traditionally, a cable trailed across the floor in an office environment has been
regarded as a “hazard.” How does this fit with the concepts of “hazard” discussed in
this chapter?
3. Look up the definition of one of this chapter’s safety management concepts (such as
hazard, risk, harm and risk receptor) in Wikipedia. Given that Wikipedia aims at a
broad, non-specialist readership, how does its discussion compare with the one here?
What does this say about society’s attitude to safety?
4. Classify the following failures as random or systematic, according to the discussion in
Section 1.4.5. For each failure, describe how it should best be addressed (to minimize the chance of it causing harm).
(a) A shutdown valve sticks open when required to close. The valve is suitably designed
for its operating environment and process fluid, and is within its usable life.
(b) Same case as (a) except the valve missed its last proof test.
(c) Same case as (a) except the valve has exceeded its useful life.
(d) A pressure transmitter has worked loose on its mountings. As a result, it is
vibrating severely. It fails due to a crack in the PCB (printed circuit board).
(e) A software bug causes a safety function in a SIS to receive an incorrect “override”
signal. As a result, the safety function does not trip when required.
(f) A manual “override” key switch is faulty and overrides a safety function incorrectly. As a result, the safety function does not trip when required.
(g) A forklift truck strikes an instrument air line. As a result, air supply to a shutdown
valve is lost, and the valve closes spuriously.
Answers
Question 1: Answer
The failure of the primary level measurement in the solvent storage tank, and the false
vapour alarms triggered by the wind, could be random failures. All the other failures
mentioned are systematic, as they are associated with specific errors in design, operations
or maintenance.
The high level trip in the solvent tank is not really a failure, as it is not faulty, but
bypassed. However, the root cause (poor management of bypasses) could be addressed by
the same type of solution as systematic failures, so it could usefully be categorized as a
systematic failure.
Question 2: Answer
Categorizing a trailing cable as a hazard is a simple concept, but it rather unhelpfully
focuses attention on the cable itself, leading us towards imperfect risk management
solutions such as “tape the cable to the floor” or “put the cable under the carpet”. If we
trace the causal chain back to the “equipment with potential to cause harm”, it helps us
address our attention to the copying machine attached to the cable. Relocating the machine
could be a better solution, and might also draw our attention to other related issues such as
noise, dust and ozone emanating from the machine. In other words, treating the machine
as the hazard may yield a more effective analysis of the risk. (This can also help turn our
attention towards intrinsically safer risk management solutions.)
Another possible approach is to look for the ‘reservoir of energy’ with the potential to
cause harm. The injury resulting from tripping over the cable derives from the potential
energy of the person’s body. Again, this helps us change our focus to the real issue: the
problem is not the cable, but the person having to cross the cable. Can we find a means to
separate people from cables? If so, we have removed one route by which the stored energy
can cause harm.
Question 4: Answer
Not all practitioners agree on the boundaries between random and systematic failures, so
your answers may differ from those suggested here.
(a) Random
(b) Random. The fault is not related to whether the valve was tested, provided the valve is
still within its useful life.
(c) Systematic. The valve should not be used beyond its useful life. It is very likely to fail
as a result of using it beyond its useful life.
(d) As the transmitter is probably not designed for a high vibration environment, this
would likely count as a systematic failure.
(e) Systematic. All failures arising from software bugs are systematic failures.
(f) Random.
(g) Systematic. However, this does not count as a dangerous failure, because it causes a
spurious trip. So there is little purpose in classifying it as a random or systematic
failure. (See Chapter 2 for discussion of the term “dangerous failure.”)
References
[1] E. Marszal, E. Scharpf, Safety Integrity Level Selection, ISA, Research Triangle Park, 2002.
[2] Anon, An Introduction to Functional Safety and IEC 61508. Application Note AN9025, MTL Instruments
Group, 2002. http://www.mtl-inst.com/images/uploads/datasheets/App_Notes/AN9025.pdf (retrieved on 8
January 2022). An admirably clear and readable starting point for readers who are unfamiliar with the
area of functional safety.
[3] P. Clarke, Setting the Standard, Control Engineering Asia, May 2011, pp. 12-18. Contact the author for a
copy via www.xsericon.world. Focuses on the benefits of compliance for SIS designers and end users.
[4] E. Davies, Report of the Tribunal Appointed to Inquire into the Disaster at Aberfan on October 21st 1966,
HL 316, HC 553, HMSO, 1967. https://www.dmm.org.uk/ukreport/553-04.htm (retrieved on 6 January 2022).
Further reading
The following resources provide wide-ranging coverage of the functional safety lifecycle.
[1] Center for Chemical Process Safety (CCPS), Guidelines for Safe and Reliable Instrumented Protective
Systems, Wiley, New York, 2011 (Chapter 1 provides an especially lucid introduction to the role of
functional safety in risk management).
[2] P. Gruhn, S. Lucchini, Safety Instrumented Systems: A Life-Cycle Approach, ISA, Research Triangle Park,
2019 (Detailed and extensive coverage of SIS design and implementation, for process applications.
Especially detailed on hardware and software design and SIS validation).
[3] K. Kirkcaldy, Exercises in Process Safety, Available from: Amazon, Self-published, Milton Keynes, 2016.
[4] K. Kirkcaldy, D. Chauhan, Functional Safety in the Process Industry: A Handbook of Practical Guidance
in the Application of IEC 61511 and ANSI/ISA-84, Available from: Amazon, Self-published, Milton
Keynes, 2012 (See especially chapters 13-19).
[5] SINTEF, Guidelines for the Application of IEC 61508 and IEC 61511 in the Petroleum Activities on the
Continental Shelf (Guideline GL 070), Offshore Norge, Stavanger, 2018 (Concise coverage of many
aspects of functional safety for the oil & gas industry).
[6] D. Smith, K. Simpson, Safety Critical Systems Handbook: A Straightforward Guide to Functional Safety,
IEC 61508 (2010 Edition) and Related Standards, Including Process IEC 61511, Machinery IEC 62061
and ISO 13849, third ed., Butterworth-Heinemann, Oxford, 2011 (Chapter 4 on software is particularly
useful).