Taking Safe Decisions - Final drive failures Worked Example

Taking Safe Decisions Worked Example
Final drive failures
Summary:
This worked example describes how risk
assessment and monitoring can be used
together to support the iterative design
and implementation of risk reduction
improvements, considering both business
and safety factors. This includes the
replacement of existing controls where they
are assessed to be no longer required and
the implementation of new risk controls for
business reasons. The worked example also
illustrates some of the challenges of decision
making processes involving a variety of
stakeholders.
Key learning points:
This worked example illustrates:
• Decision making can be an iterative process with
risk controls implemented, updated and removed
in response to monitoring and other sources of
new information or changing circumstances.
• Decision making using competent
professional judgement.
• Circumstances where a risk control may be
introduced for business reasons, such as
reputation, even though analysis shows that the
control is not required to meet legal obligations.
• A large number of parties can make decision
making a lengthy process, so this should
be factored into timescales.
• Lessons learned when making a change can be
transferable to the wider business.
1. Origin of review
Historically, monitoring of the final drives on certain
types of rolling stock has indicated a range of issues
which can cause their input bearings to heat up
rapidly, leading to failure of the input shaft.
A Railway Undertaking (RU) had experienced
a series of final drive failure incidents, the most
notable of which was a final drive failure that led
to the detachment of the drive shaft. The RU was
concerned because the incident had damaged
the train and caused minor injuries to a customer
on the station platform, who was struck by ballast
thrown up when the drive shaft detached from the
train, and because the failure also presented a
derailment risk.
[Figure: the nature of the decision is scoped against six factors, each rated on a scale: risk owner (owned by one organisation to shared by many organisations); worst credible case consequences (insignificant to multiple fatalities); operational experience (extensive to none); technology (mature to novel); complexity (very simple to highly complex); ability to monitor and act post change (can identify problems and resolve quickly to difficult to monitor and/or intervene). The further a decision sits towards the latter end of each scale, the more likely it is to be categorised as significant, and the approach for making the decision then involves more senior level decision taking, more consultation, more extensive and detailed analysis, and more time to agree and implement the decision.]
Figure 2: Scoping final drive failures
www.rssb.co.uk
The independent investigation into this final drive
failure focussed on early life issues and led the RU
to introduce several new risk controls, including
additional monitoring of new drives.
A subsequent failure, however, was found to be
caused by a different issue: the reliability of
the final drive oil pump. Additional checks were
therefore also introduced for the pump operation.
A further failure, not related to early life or oil pump
malfunction, led the RU to conclude that there were
indeed a number of potential failure modes. As a
result, the RU conducted a fleet-wide check
to identify the possible options for risk controls and
decided to conduct a risk assessment and review of
the risk controls in place, to try to prevent a repeat
of this type of failure.
2. Analysing and selecting options
The RU proceeds with the risk assessment and
review of the risk controls. Although the decision on
which option to implement is taken within the
RU, it is supported by wider industry consultation
at a national level, including the Office of Rail
Regulation (ORR), the rolling stock operating
companies (ROSCOs) and suppliers. There is
already considerable operational experience of
the technology involved in final drives, but the
understanding of their in-service behaviour
across the industry is initially found to be weak.
This understanding is quickly strengthened by
review of the incidents previously noted. The
conclusions from these investigations provide a
broader knowledge base for the RU and lead to
the introduction of further short term risk controls,
together with revision of existing controls, over a
period of months.
| Option | Description | Costs / Benefits |
| 1 | Do nothing and continue with mitigations already introduced in response to the failures. | Discounted - The mitigations introduced in response to the failures previously noted include a checking regime, oil pump priming check and oil sampling. These are not considered sustainable as they are highly labour intensive and are expected to become less effective as staff become less vigilant with the checks over time. The business case CBA suggests that their gross cost is substantially higher than their safety benefit. |
| 2 | Remove mitigations already introduced in response to the failures. | Discounted - in light of the failures, removal of risk controls without providing alternatives is not considered to be a viable option. |
| 3 | Fit the modification and continue with mitigations already introduced in response to the failures. | Discounted - it is assessed that there is no cost benefit case for this option due to the intensive nature of the mitigations already introduced in response to the failures. |
| 4 | Fit the modification and remove all mitigations already introduced in response to the failures. | Discounted - the business case cost benefit analysis indicates that there is a case for fitting the modification but this option is not considered to give sufficient assurance around failures due to pump priming. |
| 5 | Fit the modification and remove the mitigations already introduced in response to the failures, except oil pump priming and the use of oil sampling. | Selected - the business case cost benefit analysis indicates that there is a case for fitting the modification for this option. Oil pump priming and oil sampling are selected as possible mitigation measures to retain as they are less labour intensive than the other measures. The modification does not monitor pump priming, so retaining the priming check provides additional value, and oil sampling provides a backup for the modification until it is shown to be working as intended. The deciding factor in selecting this option is the potentially severe reputational risk should another failure occur and no further mitigations had been put in place. |
Table 1: Summary of the business case options
The preferred long term solution is initially to fit
alternative drives, but as the RU estimates that
this will take 12-18 months, the timescales are
considered prohibitive and this option is not pursued
further at this point. The fleet-wide check identifies
fitting a Temperature Intervention Modification
(referred to as ‘the modification’) as a possible
method for detecting failures from a range of
causes. The modification is not intended to resolve
design issues or issues of drive performance but
is configured to help identify the symptoms of
catastrophic failure of the drive and potential drive
shaft detachment (the events leading to the failure of
the final drives can occur rapidly). The modification
is designed to detect a rise in temperature and
alert the driver to bring the train to a stop. If the
warning is ignored, the modification is designed to
shut the engine down well before the temperature
reaches the range for likely catastrophic failure. The
RU therefore proceeds to assess the use of this
modification as an element of possible options for
reducing the risk from catastrophic final drive failure.
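The two-stage behaviour described above (warn the driver first, then shut the engine down if the temperature keeps rising) can be sketched as follows. This is a minimal illustration only, not the RU's or the supplier's design: the threshold values, the names and the idea that "warning ignored" shows up as a second, higher temperature threshold are all assumptions, since the source gives no figures.

```python
from enum import Enum


class Action(Enum):
    NONE = "no action"
    ALERT_DRIVER = "alert driver to bring the train to a stop"
    SHUT_DOWN_ENGINE = "shut the engine down"


# Hypothetical thresholds in degrees Celsius (not from the source).
ALERT_TEMP_C = 110.0           # first stage: warn the driver
SHUTDOWN_TEMP_C = 130.0        # second stage: warning was not acted on
FAILURE_RANGE_START_C = 180.0  # assumed start of likely catastrophic failure

# The shutdown point must sit well below the failure range.
assert ALERT_TEMP_C < SHUTDOWN_TEMP_C < FAILURE_RANGE_START_C


def intervention(bearing_temp_c: float) -> Action:
    """Return the action for a single temperature reading."""
    if bearing_temp_c >= SHUTDOWN_TEMP_C:
        return Action.SHUT_DOWN_ENGINE
    if bearing_temp_c >= ALERT_TEMP_C:
        return Action.ALERT_DRIVER
    return Action.NONE


if __name__ == "__main__":
    for temp in (90.0, 115.0, 135.0):
        print(temp, "->", intervention(temp).value)
```

The margin between the shutdown threshold and the assumed failure range is what stands in for "well before" in the description; how large that margin needs to be would depend on how quickly the bearings heat up, which the source only describes as rapid.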
The possible combinations of risk controls that could
be applied as a longer term solution are developed
by a team of operational and technical experts into
five options to evaluate for the business case (Table
1) used to support the final overall design of the
risk controls. These focus on the issues of whether
or not to fit the modification and which other risk
controls should be retained. Safety considerations
are the starting point for assessing the business
case options, but commercial and business risks
also play a significant role in the decision making.
The business case cost-benefit analysis (CBA)
initially indicates that there is justification for fitting
the modification in either option 4 or option
5, based on the avoidance of costs of future
incidents. Also, the checking regime introduced in
response to previous failures is very labour intensive
which makes it more feasible to justify fitting the
modification instead. Given the number of incidents,
it is considered feasible that in the event of a future
incident there may be a call from shareholders to
remove the affected fleet, which accounts for half
of the RU’s rolling stock, to allay fears of further
failures. The RU determined that the loss of
revenue, compensation payable for cancellation of
services and the additional replacement bus costs
for just one day would pay for the modification, even
accounting for the savings on fuel and materials.
Also, the repair costs for a train failure with the drive
shaft detached were similar to the cost of installing
the modification. Therefore, the RU determined
that preventing another single incident would justify
funding the modification as well.
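The break-even reasoning in the paragraph above can be written out as a simple comparison. The sketch below is illustrative only: every monetary figure is invented, since the source gives no costs, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WithdrawalDay:
    """Costs of one day with the affected fleet withdrawn (hypothetical figures)."""
    lost_revenue: float
    cancellation_compensation: float
    replacement_buses: float
    fuel_and_materials_saved: float

    def net_cost(self) -> float:
        # Savings on fuel and materials offset part of the cost of withdrawal.
        return (
            self.lost_revenue
            + self.cancellation_compensation
            + self.replacement_buses
            - self.fuel_and_materials_saved
        )


def avoided_cost_covers(modification_cost: float, avoided_cost: float) -> bool:
    """True if the avoided cost is at least the cost of the modification."""
    return avoided_cost >= modification_cost


if __name__ == "__main__":
    # All numbers below are made up for illustration.
    day = WithdrawalDay(
        lost_revenue=400_000.0,
        cancellation_compensation=150_000.0,
        replacement_buses=100_000.0,
        fuel_and_materials_saved=50_000.0,
    )
    modification_cost = 500_000.0
    print("Net cost of one day's withdrawal:", day.net_cost())
    print("Covers the modification:", avoided_cost_covers(modification_cost, day.net_cost()))
```

The same comparison applies to the second argument in the text, where the avoided cost is the repair bill for a single drive shaft detachment rather than a day's fleet withdrawal.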
Overall, potential reputational damage becomes the
deciding factor for the RU to fit the modification.
Simply, it is concluded that the risks of not fitting the
modification and a further incident then occurring,
are unacceptable:
• Another incident has a small potential
to cause injury or death to customers, which
would bring severe reputational damage
and divert extensive resources to any
associated investigation, with the potential also
for prosecution.
• Speed restrictions could be placed on all affected
vehicles following another incident, impacting
performance, causing delays and leading to
the payment of penalties to the Infrastructure
Manager (IM).
• There could be pressure from
stakeholders requesting that the affected vehicles
are withdrawn as they are seen to be unsafe
and, in the event of an incident, there could also
be pressure from ORR to remedy the failure, which
may result in the fleet being withdrawn from
service.
On the balance of the cost benefit analysis
presented in the business case, and consideration of
the potential reputational risks, it is decided that the
modification should be fitted and option 5 is finally
selected.
3. Making a change
The selected option is assessed using the
company’s internal risk assessment methodology
with a combination of quantitative and qualitative
risk estimation techniques to identify and classify
hazards, evaluate whether the risk is acceptable and
identify any further controls that should be applied
to reduce the risk. The RU also holds discussions
with the relevant ROSCOs and suppliers to identify
additional hazards and controls. Once the business
case is approved, the modification is implemented
fleet-wide.
The risk assessment process also identifies an
updated final drive overhaul regime as a risk control
to address the root cause of the failures, alongside
the fitment of the modification, which detects the
symptom of the problem. The new specification
for the overhaul regime, conducted roughly every 2
years, is subsequently reviewed and strengthened
in light of the information from investigation of
incidents.
A cross-industry stakeholder working group is
formed to contribute to these reviews, including
the ROSCOs and suppliers to make use of their
technical expertise and gain cooperation for making
the change at an early stage. Other RUs who are
using the same final drive are also included to draw
on a wider range of operational experience. Other
key players in the industry, such as the IM and ORR,
are kept informed of progress but are not an active
part of the group.
The RU involves participants from a variety of
groups to provide a range of expertise. Because
co-ordination between the groups makes the decision
making process lengthy, the RU factors this into
the planned timescales. The RU includes time to
manage the supply chain, as each ROSCO holds
different views regarding the long term strategy for
specification of the final drive, so the short
term changes do not fully align with the long term
strategies of all the parties involved. Engaging the
other RUs that use the same type of final drive
also takes time: as they have not had similar
incidents, the review is not a high priority for them
and, despite participating in the process, they
eventually decide not to fit the modification, as it is
primarily a business decision for the RU and not a
legal requirement.
4. Monitoring safety
Subsequent monitoring leads the RU to modify the
actions taken on the basis of the risk assessment.
The RU removes oil sampling from the mitigation
measures after sufficient monitoring shows that
the modification is working as intended. Whilst the
modification picks up failures rapidly, oil sampling
has a five day turnaround which is inconsistent
with the speed at which defects can escalate to a
catastrophic loss of integrity.
The modification picks up a potential incident
shortly after implementation, thus justifying its
cost. The modification also effectively monitors the
performance of the final drives and so is beneficial
for monitoring the effectiveness of the new final drive
overhaul regime. This indicates an improvement
in performance and system reliability: leading up
to the incident involving the detached drive shaft
there were 7 failures over 10 months; following the
changes there are no incidents for over 18 months.
[Figure: selection of risk acceptance principle from three routes: codes of practice (application of codes of practice), similar reference system (similarity analysis with reference system(s)) and explicit risk estimation (identification of scenarios and associated safety measures), assessed against qualitative or quantitative safety criteria.]
Figure 3: Risk acceptance principles selected
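As a rough check on the before-and-after failure counts quoted above, the sketch below computes the earlier failure rate and asks how likely 18 failure-free months would be if that rate had not changed. The use of a Poisson model with a constant rate is an assumption made here for illustration; it is not part of the RU's analysis, and 18 months is treated as a lower bound because the source says "over 18 months".

```python
import math


def failure_rate(failures: int, months: float) -> float:
    """Average number of failures per month over the period."""
    return failures / months


def prob_no_failures(rate_per_month: float, months: float) -> float:
    """Poisson probability of zero failures in `months` at a constant rate."""
    return math.exp(-rate_per_month * months)


if __name__ == "__main__":
    rate_before = failure_rate(failures=7, months=10)  # figures from the text
    p_unchanged = prob_no_failures(rate_before, months=18)
    print(f"Rate before the changes: {rate_before:.2f} failures per month")
    print(f"P(no failures in 18 months at that rate): {p_unchanged:.1e}")
```

Under these assumptions the chance of seeing no failures in 18 months purely by luck is only a few in a million, which is consistent with the text's conclusion that performance and reliability have improved.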
Monitoring further identifies occasional operational
disruption due to the failsafe nature of the
modification, as failures caused by damage to the
sensors bring trains to a stop. However, this is
judged tolerable by the RU given the safety and
reputational benefits of the modification.
The assessment of the final drive failures and
subsequent monitoring of the mitigation measures
post-implementation feeds into the RU’s wider risk
assessment programme. The lessons learned
from the final drive failures are extended to the
management of final drive condition across the RU's
entire fleet.