Computers in Society
Lecture 7, part 2:
Software Reliability
Assignment for Next Time
Today we will talk about software reliability and
responsibility for software faults. I will focus on a
case study of the Therac-25, a radiation treatment
machine. The Therac-25 suffered from software
faults that caused it to give patients large overdoses
of radiation. Some patients died as a result of the
overdoses. The fault was difficult to find, so the
machine was used even after some patients had
been seriously harmed.
Assignment for Next Time (2)
For Wednesday’s class session, I want each group to
discuss the Therac-25 case, answer some questions,
and come to class prepared to share your answers.
You should address the following questions:
• Many people involved with the Therac-25 made
mistakes or made bad decisions. Who made
mistakes/bad decisions? How much responsibility
does each such person (or group of persons) have
for the injuries and death that were caused?
Assignment for Next Time (3)
• If you were the chairman of AECL, the company
that produced the Therac-25, what would you have
done after the software bug was understood? How
would you have compensated the victims? How
would you have changed your organization as a
result?
• How would you use your knowledge of the Therac-25 case to organize software development to
minimize the chances that such a situation could
happen again?
Assignment for Next Time (4)
On Wednesday each group will have 5 to 10 minutes
to explain their answers to the rest of the class.
Please try to keep your answers and justifications
succinct so we have time for discussion.
The Therac-25
The Therac-25 was a linear accelerator device used
to treat cancer using radiation. It was built by
Atomic Energy of Canada Limited (AECL). It could
use either X-rays or electron radiation.
Therac-25 was a next-generation machine based on
two earlier machines that AECL had built in
cooperation with CGR, a French company. The
earlier machines were the Therac-6 and the Therac-20.
The Therac-25 (2)
The Therac-6 and Therac-20 incorporated a
computer (PDP-11) as a front end. The computer
was a front end only. The linear accelerators could
be operated independently of the computer. All
safety features were built into hardware.
The Therac-25 integrated the computer with the
linear accelerator.
The Therac-25 (3)
Two important changes created the possibility for
problems: First, the Therac-25 reused code from
the earlier machines. Second, some hardware
safety features were replaced by software features.
In machines with both electron and X-ray modes
(dual-mode accelerators), a turntable rotates
needed equipment into position to give proper
doses of radiation. In older machines this was
checked with hardware interlocks. Therac-25
checked this in software.
The Therac-25 (4)
The Therac-25 went into service in 1983. Eleven
systems were delivered.
Problems began occurring in June 1985.
Therac-25 Problem History
Accident 1: Marietta, Georgia, June 1985.
Kennestone Regional Oncology Center (KROC)
• A patient was burned by treatment and suffered
crippling injuries. KROC contacted AECL and asked if
the Therac-25 could have failed to diffuse the
radiation beam. AECL said no.
• The patient sued AECL and the hospital in October
1985.
Therac-25 Problem History
Accident 2: Hamilton, Ontario, July 1985. Ontario
Cancer Foundation.
• A patient was burned during treatment. The
machine shut down during treatment. The display
indicated no treatment had been made. The
operator tried to proceed with treatment multiple
times until the machine suspended treatment. The
patient complained of being burned, and was
hospitalized for radiation overdose three days later.
• The patient died of cancer in November 1985.
Therac-25 Problem History
First AECL Investigation: July-September 1985
• AECL sent an engineer to investigate after the
Ontario overdose.
• The engineer discovered design problems related
to a microswitch.
• AECL introduced hardware and software changes
to fix the microswitch problem.
• A Canadian regulatory board requested a redesign
of the handling of malfunction conditions, but AECL
did not comply.
Therac-25 Problem History
Accident 3: Yakima, Washington, December 1985.
Yakima Valley Memorial Hospital.
• A patient developed a pattern of striped burns as a
result of treatment. The hospital staff suspected
that the pattern was from slots in the accelerator’s
blocking trays.
• AECL claimed that neither the Therac-25 nor
operator error could have produced the damage.
AECL also claimed that no similar accidents had
been reported.
• The patient survived, though she was left with
scarring and a mild disability.
Therac-25 Problem History
Accident 4: Tyler, Texas, March 1986. East Texas
Cancer Center (ETCC).
• During a treatment session the operator noticed
she had entered an “X” (for X-ray) instead of an “E”
(for electron) into the display. She quickly fixed the
problem, moving the cursor to the field in error,
changing it, and moving the cursor back to the
bottom of the screen. The system was designed to
detect that input was complete when the cursor
was in the bottom right position.
Therac-25 Problem History
Accident 4 continued:
• Once the operator was ready, she started
treatment. After a few seconds the system shut
down and gave the error code “Malfunction 54”.
The operator continued treatment.
• The patient, who had had eight previous
treatments, knew something was wrong because of
pain he experienced.
• The patient died five months later of a radiation
overdose of between 80 and 100 times the prescribed
dose.
Therac-25 Problem History
Second AECL Investigation: March 1986.
• ETCC shut down its Therac-25 after the accident
and notified AECL.
• AECL sent two engineers, who were unable to
duplicate the problem. They claimed it was
impossible for the Therac-25 to overdose a patient.
They blamed the problems on the hospital’s
electrical system.
• ETCC found no problems with the electrical
system, and put the Therac-25 back into service.
Therac-25 Problem History
Accident 5: Tyler, Texas, April 1986. East Texas
Cancer Center (ETCC).
• This accident was virtually the same as accident 4.
The same operator was at the controls, and made
the same change. The same behavior and
“Malfunction 54” occurred. ETCC shut down the
machine and contacted AECL.
• The patient received a massive overdose of
radiation to his brain and died three weeks later.
• After this incident, investigators were able to
duplicate it, and the first major software bug was
detected.
Therac-25 Problem History
Therac-25 Declared Defective: May 2, 1986.
• On May 2, 1986, the US Food and Drug
Administration (FDA) declared the Therac-25 to be
defective.
• AECL was required to notify all Therac-25
customers.
• To gain back FDA approval, AECL had to show how
it would make the Therac-25 safe.
Therac-25 Problem History
Accident 6: Yakima, Washington, January 1987.
Yakima Valley Memorial Hospital.
• A second patient developed a pattern of striped
burns as a result of treatment.
• The hospital staff was able to match the burn
marks to the slots in the Therac-25’s blocking tray.
• The patient died three months later.
Therac-25 Problem History
Therac-25 Declared Defective: February 1987.
• On February 10, 1987 the US Food and Drug
Administration (FDA) declared the Therac-25 to be
defective. It recommended that all machines be
shut down.
• To gain back FDA approval, AECL had to show how
it would make the Therac-25 safe.
Therac-25 Problem History
Therac-25 Declared Defective: February 1987.
• It took five months and five plans to receive FDA
approval. The final plan included hardware
interlocks to prevent overdoses or activating the
beam when the turntable was not in the correct
position.
• No accidents have been reported since.
Therac-25 Software Bugs
One bug occurred because the system detected end
of data entry when the cursor moved to the bottom
right of the entry screen. At that point magnets for
directing the beam would be positioned, which took
a few seconds. After the magnets were positioned,
the cursor was checked again. If it was at the
bottom of the screen, the system assumed that no
changes had been made.
Therac-25 Software Bugs (2)
If a fast-typing operator such as the one in Texas
made a change while the magnets were moving and
then restored the cursor to the bottom of the page,
the changes would show on the page. However, the
system would see the cursor at the bottom and not
check for the changes. The previous (mistaken) data
would be used.
This is an example of a race condition.
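The sequence described above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the machine's actual code: the names `Screen`, `run_setup`, `fast_operator_edits`, and `MAGNET_SETUP_TICKS` are hypothetical, and the "safer" check is just one possible alternative for comparison.

```python
from dataclasses import dataclass

# Hypothetical tick count standing in for the few seconds of magnet positioning.
MAGNET_SETUP_TICKS = 8


@dataclass
class Screen:
    mode: str               # "X" for X-ray, "E" for electron
    cursor_at_bottom: bool  # True once data entry looks complete


def fast_operator_edits():
    """Keystrokes of a fast operator, keyed by the tick at which they land."""
    def cursor_up(screen):
        screen.cursor_at_bottom = False

    def fix_mode(screen):
        screen.mode = "E"

    def cursor_down(screen):
        screen.cursor_at_bottom = True

    return {1: cursor_up, 2: fix_mode, 3: cursor_down}


def run_setup(screen, edits, compare_data):
    """Return the mode the machine would use, or None if setup must restart."""
    latched_mode = screen.mode  # value captured when setup begins
    for tick in range(MAGNET_SETUP_TICKS):
        if tick in edits:
            edits[tick](screen)  # operator edits while the magnets move
    if compare_data:
        # Safer check (an assumption, for comparison): look at the data itself.
        return latched_mode if screen.mode == latched_mode else None
    # Flawed check: a cursor at the bottom is taken to mean "nothing changed".
    return latched_mode if screen.cursor_at_bottom else None


flawed_screen = Screen(mode="X", cursor_at_bottom=True)
flawed_result = run_setup(flawed_screen, fast_operator_edits(), compare_data=False)

safer_screen = Screen(mode="X", cursor_at_bottom=True)
safer_result = run_setup(safer_screen, fast_operator_edits(), compare_data=True)

print(flawed_screen.mode, flawed_result)  # screen shows E, machine uses X
print(safer_result)                       # None: the edit was noticed
```

In the flawed run the screen shows the corrected "E" while the setup still holds the earlier "X". The outcome depends only on whether the edit finishes before the second cursor check, which is what makes it a race.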
Therac-25 Software Bugs (3)
A second race condition produced the overdoses at
the Yakima center. It occurred when the machine
was moving the gun into position.
A variable was supposed to be zero if the beam was
in position to fire. Any other value meant that the
beam should not fire.
When the beam was not in position, the variable
would be incremented steadily.
Therac-25 Software Bugs (4)
The incrementing counter only held values from 0 to
255, so every 256th increment wrapped it back to zero.
If the operator pressed the button to fire the beam
when the value reset from 255 to zero, the beam
would fire even if it was not in position. This was a
rare occurrence, but it could occur, and did on two
occasions.
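The wraparound can be shown with a short sketch. It assumes a one-byte counter that starts at zero and is incremented once per pass while the beam is out of position. The function names are hypothetical, and the alternative of writing a fixed nonzero value is offered only as one way to avoid the wrap.

```python
def next_flag(flag):
    """Increment a one-byte flag; 255 + 1 wraps around to 0."""
    return (flag + 1) & 0xFF


def unsafe_passes(total_passes):
    """Passes at which the flag reads zero although the beam is never in position."""
    flag = 0  # hypothetical starting value
    unsafe = []
    for pass_number in range(1, total_passes + 1):
        flag = next_flag(flag)  # "not in position" is signalled by incrementing
        if flag == 0:           # zero is read as "in position, safe to fire"
            unsafe.append(pass_number)
    return unsafe


def unsafe_passes_fixed_value(total_passes):
    """Same loop, but the flag is set to a fixed nonzero value instead."""
    unsafe = []
    for pass_number in range(1, total_passes + 1):
        flag = 1  # never wraps, so it never reads as zero
        if flag == 0:
            unsafe.append(pass_number)
    return unsafe


print(unsafe_passes(1000))              # [256, 512, 768]
print(unsafe_passes_fixed_value(1000))  # []
```

A button press that happened to land on one of the listed passes would see a zero and allow the beam to fire, which is why the failure was rare but possible.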