Failure Prevention and recovery

Chapter 19
Summary
What is failure?
Why do failures happen?
How do we measure failures?
Detection and analysis of failures.
How can operations improve their reliability?
How should operations recover from failures?
What is failure?
At its simplest, ‘failure’ is when something does not work as it should. If the shop assistant who sells you an item of clothing ‘fails’ to tell you that it should be dry cleaned, that is technically a failure. In operations management, however, we usually use the term failure for a more dramatic event: something ceasing to do what it should do. So a piece of material fails, or a process fails.
Why do operations fail?
There are various reasons why operations fail:
A. Design failures
B. Facilities failures
C. People failures
D. Supplier failures
E. Customer failures
F. Environmental disruption-related failures
A. Design failures
A design may look fine on paper, but its limitations only become clear in real circumstances. Design failures arise in two different situations:
Overlooking or miscalculating a characteristic of demand, so that the process cannot cope with demand. For example, a process designed to manufacture 3 televisions per hour cannot meet a demand of 7 televisions per hour.
Unexpected circumstances, for example the product size in the design turning out to differ from the size demanded.
[Figure: Why systems fail, showing failures inside the operation together with supply, design, facilities, staff and customer failures.]
B. Facilities failures
Failures of machines, equipment, buildings and fittings.
C. People failures
People failures come in two types: errors and violations.
Errors are mistakes of judgment (for example, riding a motorbike on reserve petrol).
Violations are acts that go against the defined operating procedure (for example, a driver who avoids changing the engine oil, causing major damage to the engine).
D. Supplier failures
Failures in the delivery or quality of goods and services (for example, the music band booked by a hotel fails to turn up).
E. Customer failures
Customers misusing the products and services the operation produces.
F. Environmental disruption-related failures
All causes of failure that lie outside the operation, for example hurricanes, floods, lightning, extreme temperature, fire, crime, theft and terrorism.
Measuring failures
• Most failures can ultimately be traced back to some kind of human failure. For example:
1. A machine failure may happen because of poor design or poor maintenance.
2. A delivery failure may result from someone's error in managing the supply schedule.
3. A customer may make a mistake because no one instructed them.
So failures can be controlled to an extent, and an organization can learn from them. That is why failures can be treated as opportunities.
There are three main ways of measuring failure:
1. Failure rate – how often a failure occurs.
2. Reliability – the chance of a failure occurring.
3. Availability – the amount of available useful operating time.
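FR (failure rate) measuring
The number of failures occurring over a period of time. For example, the failure of an airport security system can be measured by the number of security breaches.
$$FR = \frac{\text{number of failures}}{\text{total number of products tested}} \times 100$$
or, as a function of time,
$$FR = \frac{\text{number of failures}}{\text{operating time}}$$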
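A minimal Python sketch of the failure-rate calculation. The percentage form follows the formula above; the per-hour form divides failures by operating time, a common time-based variant. The function names and test figures are hypothetical.

```python
def failure_rate_percent(failures: int, products_tested: int) -> float:
    """Failure rate as a percentage of the products tested."""
    if products_tested <= 0:
        raise ValueError("products_tested must be positive")
    # Multiply before dividing so whole-number cases stay exact.
    return failures * 100 / products_tested


def failure_rate_per_hour(failures: int, operating_hours: float) -> float:
    """Failure rate as failures per hour of operation (time-based variant)."""
    if operating_hours <= 0:
        raise ValueError("operating_hours must be positive")
    return failures / operating_hours


# Hypothetical figures: 50 products tested, 2 failed during 1,000 hours of testing.
print(failure_rate_percent(2, 50))     # 4.0
print(failure_rate_per_hour(2, 1000))  # 0.002
```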
Failure over time – the ‘bath-tub’ curve
• Failure is a function of time: the probability of failure differs at different stages of a product's life. The curve that describes failure probability over time is called the ‘bath-tub’ curve. According to this curve, the failure probability is high at the beginning and at the end of the life cycle.
There are three distinct stages:
The ‘infant-mortality’ or ‘early-life’ stage, where early failures are caused by defective parts or improper use.
The ‘normal-life’ stage, when the failure rate is usually low and reasonably constant.
The ‘wear-out’ stage, when the failure rate increases as the part approaches the end of its working life.
[Figure: How failure is measured. The bath-tub curve plots failure rate against time through the ‘infant-mortality’, ‘normal-life’ and ‘wear-out’ stages.]
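The bath-tub shape can be illustrated numerically. This sketch assumes a modelling choice that the original does not make: the failure rate is the sum of a falling Weibull hazard (early life), a constant rate (normal life) and a rising Weibull hazard (wear-out). All parameter values are hypothetical.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull hazard (instantaneous failure rate) at time t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_failure_rate(t: float) -> float:
    """Illustrative bath-tub curve; every parameter here is hypothetical."""
    if t <= 0:
        raise ValueError("t must be positive")
    early_life = weibull_hazard(t, shape=0.5, scale=100.0)  # shape < 1: falls over time
    normal_life = 0.001                                     # constant background rate
    wear_out = weibull_hazard(t, shape=5.0, scale=1000.0)   # shape > 1: rises over time
    return early_life + normal_life + wear_out


for hours in (1, 50, 300, 700, 1200):
    print(f"t = {hours:>4} h  failure rate = {bathtub_failure_rate(hours):.4f}")
```

The printed rates start high, dip to a low point in the middle of the range and climb again near 1,200 hours, which mirrors the three stages described above.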
Reliability measuring
It measures the ability of a system, product or service to perform as expected over time. For a system of n interdependent components:
$$R_s = R_1 \times R_2 \times R_3 \times \cdots \times R_n$$
where Rs is the reliability of the whole system and R1, R2, ..., Rn are the reliabilities of the individual components.
Here we assume that a failure in any single component causes the whole system to fail.
So the more components a system has, the lower its reliability.
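A short Python sketch of the series-reliability formula. It assumes, as the text does, that the failure of any one component stops the whole system. The 99% component reliability is a hypothetical figure.

```python
from math import prod


def series_reliability(reliabilities: list[float]) -> float:
    """Rs = R1 x R2 x ... x Rn: the system works only if every component works."""
    if any(not 0.0 <= r <= 1.0 for r in reliabilities):
        raise ValueError("each reliability must lie between 0 and 1")
    return prod(reliabilities)


# Hypothetical process: every component is 99% reliable.
for n in (1, 5, 10, 20):
    print(n, round(series_reliability([0.99] * n), 4))
```

System reliability falls from 0.99 with one component to roughly 0.82 with twenty, even though each component on its own is highly reliable.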
MTBF (mean time between failures)
$$MTBF = \frac{\text{operating hours}}{\text{number of failures}}$$
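The MTBF formula as a small Python function. The machine log figures are hypothetical. MTBF is the reciprocal of the time-based failure rate, so 4 failures in 2,000 hours (0.002 failures per hour) gives an MTBF of 500 hours.

```python
def mtbf(operating_hours: float, failures: int) -> float:
    """Mean time between failures = operating hours / number of failures."""
    if failures <= 0:
        raise ValueError("MTBF needs at least one recorded failure")
    return operating_hours / failures


# Hypothetical log: a machine ran for 2,000 hours and failed 4 times.
print(mtbf(2000, 4))  # 500.0
```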
Availability measuring
The degree to which the operation is ready to work. An operation is not available if it has failed or is being repaired after a failure.
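Availability is commonly calculated as MTBF / (MTBF + MTTR), where MTTR is the mean time to repair. MTTR does not appear in the original text, so treat this formula as an added assumption. The sketch below uses hypothetical figures.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Share of time the operation is available: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical: MTBF of 500 hours, and each repair takes 20 hours on average.
print(round(availability(500, 20), 4))  # 0.9615
```

Shortening repairs raises availability even when failures happen just as often, which is why recovery matters as well as prevention.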
Failure prevention and recovery
There are three sets of activities that relate to failure:
1. First – understanding what failures are occurring in the operation and why they are occurring.
2. Second – finding ways to reduce the chances of failure and to minimize the consequences of failures.
3. Third – devising plans and procedures that help the organization recover from failures when they occur.
[Figure: The three tasks of failure prevention and recovery: failure detection and analysis (finding out what is going wrong and why); improving system reliability (stopping things going wrong); recovery (coping when things do go wrong).]
Mechanisms to detect failure
There are six techniques for detecting failure:
In-process checks – employees check that the service is acceptable during the process itself (for example, in restaurants).
Machine diagnostic checks – a machine is tested by putting it through a range of activities (for example, computer servicing).
Point-of-departure interviews – at the end of a service, staff may formally or informally check that the service has been satisfactory.
Phone surveys – used to solicit opinions about products or services.
Focus groups – groups of customers are brought together to discover problems or to find out their attitudes towards products or services.
Complaint and feedback cards and questionnaires – many organizations use them to collect views about products or services.
Failure analysis
Understanding why a failure has occurred.
1. Accident investigation – specially trained staff analyze the causes of accidents (for example, aircraft or road accidents).
2. Failure traceability – making sure an operation can trace a failure back to its source (finding proof or evidence).
3. Complaint analysis – analyzing customer complaints.
CIT or critical incident technique
Finding out from customers which factors they found satisfying and which they found non-satisfying.
How failure is detected and analyzed
Failure detection mechanisms include:
– in-process checks
– machine-diagnostic checks
– point-of-departure interview
Failure analysis procedures include:
– accident investigation
– failure mode-and-effect analysis
– fault-tree analysis
Failure mode and effect analysis
The aim is to identify the product, service or process features that are crucial in determining the effects of failure, that is, to identify failures before they happen by means of a checklist procedure.
It is built around three key questions:
What is the likelihood that the failure will occur?
What would the consequences of the failure be?
How likely is the failure to be detected before it affects customers?
Based on the answers to these questions, we calculate a risk priority number (RPN) for each cause of failure and use it to decide which causes to tackle first.
There are seven steps involved in this (page 629).
[Figure: Failure modes effects analysis, relating normal operation and failure to the probability of failure, the severity of consequence (degree of severity, effect on customer), the likelihood of detection and the risk priority number.]
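A minimal Python sketch of the RPN step. It assumes the usual FMEA convention: each of the three questions is scored from 1 to 10, and RPN = occurrence × severity × detection. The detection score is high when a failure is unlikely to be caught. The restaurant failure modes and their scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    occurrence: int  # likelihood of the failure occurring: 1 (remote) .. 10 (almost certain)
    severity: int    # consequence of the failure: 1 (minor) .. 10 (very severe)
    detection: int   # 1 (almost certain to be detected) .. 10 (very unlikely to be detected)

    def __post_init__(self) -> None:
        for score in (self.occurrence, self.severity, self.detection):
            if not 1 <= score <= 10:
                raise ValueError(f"{self.name}: scores must be between 1 and 10")

    @property
    def rpn(self) -> int:
        """Risk priority number = occurrence x severity x detection."""
        return self.occurrence * self.severity * self.detection


def rank_by_rpn(modes: list[FailureMode]) -> list[FailureMode]:
    """Highest RPN first: these causes are tackled first."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical restaurant failure modes and scores.
modes = [
    FailureMode("Wrong dish delivered", occurrence=4, severity=5, detection=3),
    FailureMode("Food served cold", occurrence=3, severity=6, detection=6),
    FailureMode("Allergen not declared", occurrence=2, severity=10, detection=7),
]
for mode in rank_by_rpn(modes):
    print(f"{mode.name}: RPN = {mode.rpn}")
```

With these scores the ranking is "Allergen not declared" (140), "Food served cold" (108) and "Wrong dish delivered" (60). The rare but severe and hard-to-detect failure comes first.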
Fault-tree analysis
It is a logical procedure that starts with a failure or potential failure and works backwards to identify all its possible causes and origins.
[Figure: Fault-tree analysis for below-temperature food being served to customers. Top event: food served to customer is below temperature. Intermediate events: food is cold; plate is cold. Basic causes: plate warmer malfunction, plate taken too early from warmer, cold plate used, oven malfunction, timing error by chef, ingredients not defrosted. Key: AND node, OR node.]
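A fault tree can be evaluated mechanically: an OR node occurs if any input occurs, an AND node only if all inputs occur. The exact gate placement in the original figure cannot be recovered from the text. The tree below is therefore a hypothetical reading that uses the figure's labels. It assumes a plate-warmer malfunction only matters if a cold plate is then actually used.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Gate:
    kind: str     # "AND": all inputs must occur; "OR": any one input is enough
    inputs: list  # child Gates, or basic causes given as strings


Node = Union[Gate, str]  # a string is a basic cause (a leaf of the tree)


def occurs(node: Node, causes: set[str]) -> bool:
    """True if the event at this node happens, given the set of basic causes present."""
    if isinstance(node, str):
        return node in causes
    results = [occurs(child, causes) for child in node.inputs]
    if node.kind == "AND":
        return all(results)
    if node.kind == "OR":
        return any(results)
    raise ValueError(f"unknown gate kind: {node.kind!r}")


# Hypothetical gate structure built from the figure's labels.
food_is_cold = Gate("OR", ["oven malfunction", "timing error by chef", "ingredients not defrosted"])
plate_is_cold = Gate("OR", [
    Gate("AND", ["plate warmer malfunction", "cold plate used"]),
    "plate taken too early from warmer",
])
food_below_temperature = Gate("OR", [food_is_cold, plate_is_cold])

print(occurs(food_below_temperature, {"plate warmer malfunction"}))                     # False
print(occurs(food_below_temperature, {"plate warmer malfunction", "cold plate used"}))  # True
print(occurs(food_below_temperature, {"timing error by chef"}))                         # True
```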
Improving process reliability
At this stage, the responsibility of operations managers is to prevent failures, which they can do in four ways:
1. Design out fail points
2. Build in redundancy
3. Fail-safing
4. Maintenance
a. Design out fail points
This can be done through proper product and service design, quality planning and control, and process control.
b. Redundancy
Building redundancy into an operation means having back-up systems or components in case of failure (for example, aircraft back-up systems, the human body's two kidneys, or the two brake lights on a car).
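Redundancy can be quantified. If back-up components are arranged in parallel, the system fails only when all of them fail, so R = 1 − (1 − R1)(1 − R2)…(1 − Rn). This sketch assumes the components fail independently and that a back-up takes over perfectly when needed. The 0.9 figure is hypothetical.

```python
from math import prod


def parallel_reliability(reliabilities: list[float]) -> float:
    """Redundant components in parallel: the system fails only if all of them fail."""
    if any(not 0.0 <= r <= 1.0 for r in reliabilities):
        raise ValueError("each reliability must lie between 0 and 1")
    return 1 - prod(1 - r for r in reliabilities)


# Hypothetical: one component at 90% reliability, then the same with one back-up.
print(round(parallel_reliability([0.9]), 4))       # 0.9
print(round(parallel_reliability([0.9, 0.9]), 4))  # 0.99
```

Adding a single back-up cuts the chance of failure from 10% to 1%. This is the reverse of the series case, where every extra component lowers reliability.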
c. Fail-safing
• Fail-safing comes from Japanese approaches to operations improvement. In Japan it is known as poka-yoke, which roughly means "mistake-proofing". Poka-yokes are simple devices that prevent failures.
Poka-yoke (fail-safing)
A 3.5-inch diskette cannot be fully inserted unless it is oriented correctly; inserted upside-down, it goes in only part of the way. This feature, together with the fact that the diskette is not square, prevents incorrect orientation. It is a control method.
Warning lights and chimes alert the
driver of potential problems. These
devices employ a control method
and a warning method.
Filing cabinets can fall over if too many drawers are
pulled out. For some filing cabinets, opening one
drawer locks all the rest, reducing the chance of the
filing cabinet tipping. It is a control method.
The window in an envelope is not only a labour-saving device. It also prevents the contents intended for one person from being inserted into an envelope addressed to another. It is a control method.
Examples of poka-yoke techniques (page 633)
d. Maintenance
Maintenance is how organizations try to avoid failure by taking care of their physical facilities.
Benefits of maintenance
1. It enhances safety.
2. It enhances reliability.
3. It enhances quality.
4. It lowers operating costs.
5. It gives facilities a longer life.
6. It gives a higher end value (well-maintained facilities can be sold second-hand).
Three basic approaches to maintenance
Run to breakdown (RTB) – operate the facilities until something fails, and only then carry out maintenance.
Preventive maintenance – eliminate or reduce the chances of failure by servicing the facilities at planned intervals.
Condition-based maintenance – perform maintenance only when the facilities require it, based on monitoring their condition. It is applicable to expensive facilities.
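The three approaches differ mainly in *when* they trigger maintenance. The sketch below encodes that difference as decision rules. The interval, the vibration reading and its alarm limit are hypothetical. Real condition-based schemes use whatever measurement suits the equipment.

```python
from enum import Enum


class Approach(Enum):
    RUN_TO_BREAKDOWN = "RTB"
    PREVENTIVE = "PM"
    CONDITION_BASED = "CBM"


def maintain_now(approach: Approach, *, failed: bool, hours_since_service: float,
                 service_interval: float, condition: float, condition_limit: float) -> bool:
    """Decide whether to carry out maintenance now (illustrative rules only)."""
    if failed:
        return True  # a broken-down facility is repaired under every approach
    if approach is Approach.RUN_TO_BREAKDOWN:
        return False  # RTB: no maintenance until something fails
    if approach is Approach.PREVENTIVE:
        return hours_since_service >= service_interval  # service at planned intervals
    if approach is Approach.CONDITION_BASED:
        return condition >= condition_limit  # monitored condition has reached its limit
    raise ValueError(f"unknown approach: {approach}")


# Hypothetical state: not failed, 450 h since last service on a 500 h interval,
# vibration reading of 7.2 against an alarm limit of 7.0.
state = dict(failed=False, hours_since_service=450, service_interval=500,
             condition=7.2, condition_limit=7.0)
for approach in Approach:
    print(approach.value, maintain_now(approach, **state))
```

Here only the condition-based rule calls for maintenance. It has picked up a developing problem that the fixed service interval would not reach for another 50 hours.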
Mix of maintenance approaches
A mixture of maintenance approaches is often used, for example in a motor car.
[Figure: a motor car with parts labelled "use preventive maintenance", "use run-to-breakdown maintenance" and "use condition-based monitoring maintenance".]
Total productive maintenance (TPM)
TPM means productive maintenance carried out by all employees through small-group activities. In other words, TPM makes maintenance management a responsibility shared by all staff.
Five goals of TPM (page 538, paragraph 2)
Reliability-centered maintenance
• This is another approach to maintenance, in which different parts of a process receive different types of maintenance.
One part in one process can have several different failure modes, each of which requires a different approach.
[Figure: Shredding process. The cutters show a ‘wear out’ failure pattern and a ‘shake loose’ failure pattern, each plotting failures against time.]
Recovery
The activities designed to cope with failures when they occur are known as recovery.
Failure planning
The procedures that allow the operation to recover from failure are called failure planning.
The stages in failure planning
Discover – what's happened; what consequences
Act – inform; contain; follow up
Learn – find root cause; engineer out
Plan – analyze failure; plan recovery
Business continuity procedures
Procedures to avoid or recover from failures and keep the business going (page 643).