Traci M. Glass
CSC 540
Ethical Software Development in the Medical Field: The Therac-25 Incident
The Therac-25 incident was one of the worst series of radiation accidents in the 35-year
history of medical accelerators. Massive overdoses of radiation were administered to six known
patients between June 1985 and January 1987, resulting in three deaths and severe injuries to the
other three patients. The Therac-25 was a linear accelerator produced by Atomic Energy of Canada
Limited (AECL). It was designed to generate high-energy electron beams to treat shallow tumors
and X-ray beams to treat deeper tumors. The Therac-25 was infamous for its unreliability and
typically malfunctioned around 40 times each day (Quinn). To give a better understanding of the
Therac-25 incident, I will first discuss the history of its predecessors, the Therac-6 and
the Therac-20.
During the 1970s, AECL collaborated with a French corporation named CGR to build the
Therac-6 and the Therac-20, both of which were modernized CGR accelerators. The modernized
accelerators were distinguished from their older counterparts by the addition of a minicomputer,
the DEC PDP-11, as a front end. The addition of the PDP-11 made the linear accelerators much
simpler to operate (Quinn). The Therac-6, a 6 million electron volt accelerator, was capable of
producing X-rays only. Its successor, the Therac-20, was a 20 million electron volt accelerator
with a dual mode capable of producing either X-rays or electrons. Both the Therac-6 and the
Therac-20 had limited software functionality; the computer was added mainly for convenience
(Leveson and Turner). Since all of the safety features were built into the hardware, both
accelerators were able to work independently of the PDP-11. After completion of the Therac-20,
CGR split from AECL, attributing the split to “competitive pressures” (Leveson and Turner). AECL
then continued with the development and deployment of a new linear accelerator that it later
named the Therac-25 (Quinn).
The Therac-25 was designed to deliver either photons at 25 million electron volts or
electrons at various energy levels, and it used a new double-pass design. This double-pass
electron accelerator required less space to reach high energy levels than previous
installations. The new design was also much more economical to manufacture (Leveson and
Turner). Like its predecessors, the Therac-25 made use of the PDP-11. Unlike the Therac-6 and
the Therac-20, the Therac-25 was designed to be unable to function without the PDP-11. This
decision was made because it allowed AECL to cut costs by removing hardware safety elements
and replacing them with software safety features (Quinn). Overall, the Therac-25 was designed
to be easier to use, more compact, and more versatile than its predecessors (Genesis of
the Therac-25).
The software development of the Therac-25 is an interesting subject. The Therac-25
software was essentially revised Therac-20 software, which was itself revised Therac-6 software.
This makes sense, since each machine was mainly an upgraded version of the last. The software was
developed by a single person over a few years. One critical difference, noted earlier, that could
contribute to major issues is that, unlike the Therac-6 and Therac-20, the Therac-25 was
completely incapable of functioning without the PDP-11. The tasks the Therac-25 software was
responsible for included monitoring the status of the machine, receiving input about the
treatment, setting up the machine, turning the beam on and off, detecting hardware
malfunctions, and delivering diagnostics. The software also ran on its own customized operating
system (Porrello).
In 1976, the first Therac-25 prototype was produced. It was not until late 1982
that the Therac-25 was commercialized. In March 1983, AECL performed a
safety analysis, in this case a fault tree analysis, on the Therac-25. This test of the machine did not
include software. During this analysis numerous assumptions were made. The final report claimed
that programming errors had been greatly reduced by “extensive” testing on a simulator. It also
stated that the software would not degrade over time and that computer execution errors are caused
by faulty hardware (Leveson and Turner). According to reports, this analysis did not seem to
consider computer failure of any kind. The first Therac-25 was shipped in 1983. In total, 11
systems were distributed in Canada and the United States. Five of the Therac-25 installations
were in the U.S., with the remaining six in Canada (Leveson and Turner).
One of the major components in the design of the Therac-25 was the turntable, which was
also a crucial element in the accidents. The turntable had three modes: electron,
photon (or X-ray), and field-light. The electron mode used low-energy electron beams to treat
shallow tumors in the patient. Even at low energy, the raw electron beam was still too strong
to be used on the patient, so scanning magnets were placed in the path of the beam to
spread the electrons, which in turn reduced the strength of the beam. The second mode, the photon
or X-ray mode, used a high-energy beam to treat deeper tumors. The high-energy beam struck a
metal foil, which emitted the photons. From there, the beam was flattened beneath the foil to
achieve the desired dosage. The final mode was the field-light mode, which allowed the
machine to be aligned prior to treatment. It used a mirror and a light to show where the beam
would hit the patient (How Therac-25 worked).
As stated earlier, the operator interface was controlled by the DEC minicomputer. The
interface screen displayed quite a bit of information, including the patient’s name, the treatment
mode, the beam type, the prescribed and actual amounts of radiation, and information
regarding the position of the turntable. The operator’s procedure was to position the patient on
the treatment table, set the treatment field sizes, and attach any necessary accessories to the
Therac-25 unit. After completing these steps, the operator left the treatment room and
returned to the minicomputer. There, the operator entered the patient
identification, all fields of the treatment prescription, and any remaining data. The
system then compared the values set in the treatment room with those entered on the
minicomputer. If the values matched, the treatment could proceed; otherwise it could not
(The Operator Interface). After operators complained about how long it took to enter a
treatment, the manufacturer changed the software before the first unit was
installed. This modification allowed the operator to copy the treatment data that had been set in
the treatment room by using a series of carriage returns. The modification eventually played a
part in several of the accidents (Leveson and Turner).
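To make the comparison step concrete, here is a minimal sketch in Python. It is not the Therac-25 code; the class name, field names, and values are my own assumptions, chosen only to illustrate checking the values set in the treatment room against the values typed by the operator.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TreatmentSetup:
    """Hypothetical record of one treatment setup (field names are assumptions)."""
    mode: str              # "electron" or "xray"
    energy_mev: float      # beam energy
    field_size_cm: tuple   # (width, height) of the treatment field


def mismatched_fields(room, console):
    """Return the names of fields where the console entry differs from the room."""
    return [f.name for f in fields(room)
            if getattr(room, f.name) != getattr(console, f.name)]


def may_proceed(room, console):
    """Treatment may proceed only if every field matches."""
    return not mismatched_fields(room, console)


room = TreatmentSetup("electron", 10.0, (10, 10))
typed = TreatmentSetup("xray", 10.0, (10, 10))
print(may_proceed(room, room))          # identical values
print(mismatched_fields(room, typed))   # the operator typed the wrong mode
```

One reading of the carriage-return shortcut, which is my own interpretation rather than a finding from the sources, is that once the console values are copied from the room values, a comparison like this always passes and no longer acts as an independent check.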
The Therac-25, sadly, had very few safety features. Upon detecting an error, the
Therac-25 could shut down in two ways. One was the treatment suspend, which
required a full system reset to restart treatment. The other,
the treatment pause, required only a single key press to resume: the operator
would press “P” to proceed with treatment using the previous values. This
feature could be invoked up to five times before the machine suspended and required a full
reset (Leveson and Turner). Although the Therac-25 displayed error messages, they were
cryptic and unhelpful. Most consisted simply of the word “malfunction” followed by a
number. These malfunctions were not described in the operator’s manual. It was not known at
the time that these malfunctions could harm a patient (Leveson and Turner).
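The pause and suspend behavior described above can be sketched as a small state machine. This is an illustration only, under my assumption that the limit is a simple counter and that the fifth pause becomes a suspend; the class and method names are hypothetical and do not come from the Therac-25 software.

```python
class TreatmentSession:
    """Toy model of the pause/suspend rule (names and structure are assumptions)."""

    MAX_PAUSES = 5  # assumption: the fifth pause turns into a suspend

    def __init__(self):
        self.pause_count = 0
        self.state = "ready"

    def report_fault(self):
        """Record a fault: pause the treatment, or suspend once the limit is hit."""
        self.pause_count += 1
        if self.pause_count >= self.MAX_PAUSES:
            self.state = "suspended"
        else:
            self.state = "paused"
        return self.state

    def proceed(self):
        """Operator presses 'P'. Only a paused session may resume."""
        if self.state != "paused":
            raise RuntimeError("cannot proceed from state: " + self.state)
        self.state = "ready"

    def reset(self):
        """Full system reset, the only way out of a suspend."""
        self.pause_count = 0
        self.state = "ready"
```

The sketch also shows why the design is risky: a single key press clears a pause without the operator learning anything about the cause of the fault.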
In 1983, the first Therac-25 was shipped, and in June 1985 the first incident occurred. In
Marietta, Georgia, at the Kennestone Regional Oncology Center, a 61-year-old breast cancer patient
was to receive radiation after having a lumpectomy to remove a malignant tumor. This Therac-25
unit had been installed three months prior with no reported incidents, until this day. The
patient was set to receive treatment to the area around her collarbone. After the
treatment, the patient complained of being burned. AECL was contacted and
asked whether the Therac-25 unit could have failed to diffuse the electron beam. A few days
later, AECL called back to explain that this was not possible. The patient
went home but soon developed swelling and reddening in the treatment area. She
later suffered pain in her shoulder, which eventually became so severe that
she could no longer move the shoulder, and she began to have spasms. The reddening soon
spread to her back, and the skin in the swollen areas began to come off in layers. It was obvious
that the patient had suffered a radiation burn, but it was not until much later that the physicist
who performed her treatment estimated that she had received approximately 75-100 times more
radiation than prescribed. In the end, the patient had to have her breast removed because of
the serious burns. She also lost all use of her shoulder and arm. The patient lived in constant
pain, but the manufacturer refused to believe that it was due to the Therac-25 (Leveson and
Turner).
A little over one month later, the second known incident occurred at the Ontario Cancer
Foundation in Hamilton, Ontario. The patient, a 40-year-old woman, came in for her 24th
treatment on the Therac-25 unit. Having had so many treatments on the machine, the
patient knew that something was wrong when the unit shut off after about five
seconds. The operator’s screen displayed that no dose had been administered, so the operator
made a second attempt, which failed with the same message, and then further
attempts, each failing in the same manner. After the fifth pause, as
described earlier, the unit went into suspend mode and a technician was called. The technician
found nothing wrong with the Therac-25 unit. After the treatment, the patient complained that
she had been burned, much like the patient in the first incident. She also described a feeling
like an electric shock. After her treatment, six other patients were treated
without error. The patient returned three days later for further treatment and complained of
burning, hip pain, and swelling in the treatment region. That day, the Therac-25 unit was removed
from service. The patient was hospitalized the next day. AECL was informed and later sent a
technician to investigate. It was estimated that the patient received 65-85 times more radiation
than was prescribed. The patient died that November from cancer, but the autopsy noted
that, had she lived, she would have needed a hip replacement because of the excessive
radiation (Leveson and Turner).
After this incident, AECL began its first investigation of the Therac-25 in July 1985. An
engineer was sent to the Ontario Cancer Foundation in the hope of reproducing the malfunction.
Although the AECL engineer was never able to reproduce it, he did suspect
an issue with the microswitch used to determine the position of the
turntable. During the investigation, other design flaws and probable hardware issues were found.
AECL released both hardware and software updates for the Therac-25 and reported that
“analysis of the hazard rate of the new solution indicates an improvement over the old system by
at least five orders of magnitude” (Leveson and Turner). The investigation concluded in
September 1985.
In December 1985, a woman went to Yakima Valley Memorial Hospital for radiation
treatments. Following one of her treatments, she developed a reddening of the skin in the
treatment area in the form of parallel lines. Since her reaction was not considered
dangerous or unusual at the time, she continued treatments with the Yakima Therac-25 unit. The patient
completed her treatments in January 1986. In late January, the red marks were finally deemed
unusual. The staff then monitored the red stripes and came to believe that the blocking trays
were the cause, but by this time the blocking trays had been removed and discarded, so
the pattern could not be reproduced. On January 31st, the hospital staff sent AECL a
letter regarding the incident. AECL did not respond until February 24th.
The response claimed that it was impossible for the Therac-25 to have produced the red markings
and went on for two pages explaining why the incident was technically impossible. In the
end, the patient survived, but with major scarring and mild disability (Leveson and Turner).
The next incident occurred at the East Texas Cancer Center in Tyler, Texas, in March 1986. A
male patient came in for his ninth radiation treatment for a cancerous tumor on his back. The
operator followed the standard procedure of setting the patient up in the treatment room and then
returning to the operator room to enter the treatment on the minicomputer. While doing so,
she realized that she had accidentally typed “X” for X-ray instead of “E” for
electron. After quickly correcting her error, she began treatment. A few seconds into treatment, the
Therac-25 shut down and displayed a malfunction error, “Malfunction 54,” and a treatment
pause message on the screen. The only description of this malfunction was “dose
input 2.” It was later discovered that this meant the dose administered was either too high or too low.
The display showed that the patient had received only 6 of 202 units. Seeing this, the
operator proceeded with treatment. Since the patient and operator were in separate rooms, audio
and video monitors normally connected the two. Unfortunately, the monitors were down that day. It
was not until the operator heard the patient beating on the door that she stopped treatment. The
patient described the first attempt as feeling like an electric shock, or as if someone had poured
hot coffee on him. After this hit, he began to get up from the treatment table. At this point, the second treatment
attempt hit his arm. He also compared this to an electric shock and stated that it felt as if his
hand were trying to leave his body. He then beat on the door until the operator stopped
treatment and released him. The patient was examined immediately. It was suspected that he had
received an electric shock, and he was sent home. Following the incident, the patient continued to
experience pain and soon became paralyzed in his left arm, both legs, his left vocal cord (leaving
him unable to speak), and his left diaphragm. He also experienced problems with his bowels and
bladder, had a lesion on his left lung, and suffered recurring herpes infections. Five months
later, the patient died of complications from the radiation overdose (Leveson and Turner).
The day after the incident in Tyler, Texas, the second investigation began. One AECL
engineer from Canada and a local engineer spent an entire day testing the Therac-25 unit in an
attempt to reproduce Malfunction 54. After failing to reproduce the error, the engineer
from Canada stated that it was impossible for the unit to administer an overdose. The local
engineer then asked whether there had been any other incidents involving radiation overdoses and
was informed that none had occurred. AECL suggested that the patient had received an electric
shock due to a fault in the hospital’s electrical system. After the electrical system was checked,
the Therac-25 unit in Tyler was put back into service (Leveson and Turner).
On April 11, 1986, only three weeks after the first incident at the East Texas Cancer
Center, a male skin cancer patient came in for an electron treatment. The operator, the same one
as in the earlier incident, set up the machine for treatment. Again, she typed too quickly and
made an error. She swiftly corrected her mistake and began the treatment. As in the first incident,
the machine shut down after a few seconds and the screen displayed the “Malfunction 54” error
message. After hearing the patient moan loudly over the now-working intercom,
she ran into the treatment room. The patient described a feeling of “fire” on his face in the
treatment area. The operator ran to find the hospital physicist to inform him that another patient
had been “burned.” The patient described the incident to the physicist, saying that something had
hit him on the side of the face; he then saw a flash of light, followed by a sound reminiscent of
frying eggs. After the incident, the patient’s condition worsened greatly; he gradually slipped
into a coma, developed a fever of 104 degrees, and suffered neurological damage. Three weeks after
the incident, on May 1, 1986, the patient died from a massive radiation overdose to the right
temporal lobe of the brain and the brain stem (Leveson and Turner).
Following the second incident at the East Texas Cancer Center, the machine was
immediately taken out of service. After contacting AECL, the hospital physicist and the operator
began their own investigation. The pair was eventually able to work out the cause of the
“Malfunction 54” error. They found that if the operator made a mistake and corrected it quickly
enough, the machine did not have time to catch up with the correction, and the overdose would
then occur (Leveson and Turner).
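The kind of guard that this finding calls for can be sketched in a few lines of Python. This is a generic illustration of a stale-state check, not a reconstruction of the actual Therac-25 bug or of AECL's fix; every class, function, and field name here is a hypothetical of my own. The idea is that each edit at the console bumps a version number, and the beam may fire only when the hardware has applied the latest version.

```python
class Console:
    """What the operator has typed (hypothetical model)."""

    def __init__(self):
        self.version = 0
        self.mode = None

    def enter_mode(self, mode):
        """Every edit, including a quick correction, bumps the version."""
        self.mode = mode
        self.version += 1


class Hardware:
    """What the machine has actually been set up to do (hypothetical model)."""

    def __init__(self):
        self.applied_version = 0
        self.mode = None

    def apply(self, mode, version):
        """Finish configuring the machine for one specific console version."""
        self.mode = mode
        self.applied_version = version


def safe_to_fire(console, hardware):
    """Allow the beam only if the hardware reflects the latest console edit."""
    return (hardware.applied_version == console.version
            and hardware.mode == console.mode)


console, hardware = Console(), Hardware()
console.enter_mode("xray")                    # operator types "X" by mistake
hardware.apply(console.mode, console.version) # machine sets up for X-ray
console.enter_mode("electron")                # quick correction to "E"
print(safe_to_fire(console, hardware))        # hardware is now out of date
```

In this sketch the quick correction leaves the hardware one version behind, so the check refuses to fire until the machine has been set up again for the corrected entry.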
On January 17th, 1987, a male patient came to Yakima Valley Memorial Hospital to be
treated for carcinoma. As with other incidents, the machine shut down after a few seconds of
treatment and displayed a message. The operator then proceeded with treatment and again the
machine shut down with a treatment pause. After hearing the patient make a noise, the operator
went into the treatment room to check on the patient. The patient described to the operator a
burning sensation in his chest. After a while, the patient developed a skin burn that a few days
later took the form of the same striped pattern from the earlier incident. In April, the patient died
of complications from radiation overdose (Leveson and Turner).
On February 10th, 1987, the FDA officially declared the Therac-25 defective
under the Radiation Control for Health and Safety Act. AECL was ordered to inform all
purchasers of the Therac-25 of the defect. To regain FDA approval, AECL had to
demonstrate how it would make the system safe. The process to regain approval was to
investigate the issues, develop a solution, and then submit a corrective plan to the FDA. After
five revisions over the course of five months, AECL’s plan finally won FDA approval.
Included in the revisions were a variety of hardware interlocks to prevent the machine from
administering overdoses or activating the beam when the turntable was not in the correct position
(Leveson and Turner).
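The real interlocks were hardware, but the rule they enforce is easy to state in code. The following Python sketch is only a software analogue, with names I have made up for illustration: it refuses to turn the beam on unless the turntable position matches the requested mode.

```python
from enum import Enum


class Position(Enum):
    """The three turntable modes described earlier in this paper."""
    ELECTRON = "electron"
    XRAY = "xray"
    FIELD_LIGHT = "field-light"


class InterlockError(Exception):
    """Raised when the beam must not be turned on (hypothetical name)."""


def beam_on(requested_mode, turntable):
    """Turn the beam on only if the turntable matches the requested mode."""
    if requested_mode is Position.FIELD_LIGHT:
        raise InterlockError("field-light mode is for alignment only")
    if turntable is not requested_mode:
        raise InterlockError(
            "turntable is in " + turntable.value
            + " position, but " + requested_mode.value + " was requested")
    return "beam on in " + requested_mode.value + " mode"


print(beam_on(Position.ELECTRON, Position.ELECTRON))
```

The important design choice is that the check fails closed: any mismatch stops the beam instead of logging a warning and continuing.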
One of the biggest mistakes in the development and production of the Therac-25 was that
there was only one developer. In most development environments there are at least two developers,
and in many cases there are multiple teams of developers. With software as critical as that of
the Therac-25, the utmost care should be taken in development and production. Another mistake
made during the development of the Therac-25 was the limited testing on simulators, which
was due to excessive confidence. The developer also transferred faith in hardware reliability to
the software. It was assumed that there were no design flaws since there had been no issues
with the hardware. The developer assumed that if there was to be an issue, it would be with the
hardware and not the software, because errors in previous versions had always been found in
the hardware. There was also very little documentation produced during development.
Incidents such as the Therac-25 make us ask what we can do to ensure safe software.
How can we encourage our employers to spend the money to implement more safety features?
What can we do to make sure this doesn’t happen to us? To ensure our software is safe, we can
use stricter software development methods. One thing that we should always include in our code
is some form of exception handling, such as try and catch statements. If you are developing for
a mission-critical system, you should use a mission-critical language like Ada. Another thing we
can do is use languages that are strongly typed. We can also test our code frequently on simulators. It is
necessary in the technical field to program with safety and security in mind. These thoughts
apply not only to mission-critical systems, but to all software development. Being hasty or lazy
in this field can result in inefficient software or other software issues, and in some cases
we are developing software for uses where accuracy is of the greatest importance.
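As a small illustration of the exception handling advice above, here is a sketch in Python, where try and catch are spelled try, except, and finally. The Beam class, the deliver function, and the dose-reading callback are all hypothetical names of my own; the point is only that the finally clause turns the beam off no matter how the delivery loop ends.

```python
class Beam:
    """Stand-in for a beam controller (hypothetical)."""

    def __init__(self):
        self.on = False

    def turn_on(self):
        self.on = True

    def turn_off(self):
        self.on = False


def deliver(beam, prescribed_units, read_dose):
    """Deliver until the prescribed dose is reached; always leave the beam off."""
    delivered = 0.0
    beam.turn_on()
    try:
        while delivered < prescribed_units:
            reading = read_dose()
            if reading < 0:
                # An implausible reading is treated as a fault, not ignored.
                raise ValueError("implausible dose reading: " + str(reading))
            delivered += reading
        return delivered
    finally:
        # Runs on normal return and on any exception.
        beam.turn_off()


beam = Beam()
print(deliver(beam, 3, lambda: 1.0), beam.on)
```

Exception handling alone does not make software safe, but a pattern like this at least guarantees that an unexpected error leaves the machine in a known, harmless state.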
Bibliography
Genesis of the Therac-25. 31 March 2011
<http://computingcases.org/case_materials/therac/supporting_docs/levenson/Therac%20History.html>.
How Therac-25 worked. 31 March 2011
<http://computingcases.org/case_materials/therac/supporting_docs/therac_case_narr/Machine_Design.html>.
Leveson, Nancy and Clark S. Turner. An Investigation of the Therac-25 Accidents. July 1993. 31 March
2011 <http://courses.cs.vt.edu/cs3604/lib/Therac_25/Therac_1.html>.
Porrello, Anne Marie. Death and Denial: The Failure of the THERAC-25, A Medical Linear Accelerator. 31
March 2011 <http://users.csc.calpoly.edu/~jdalbey/SWE/Papers/THERAC25.html>.
Quinn, Michael J. Ethics for the Information Age. Addison-Wesley, 2011.
The Operator Interface. 6 April 2011
<http://computingcases.org/case_materials/therac/supporting_docs/levenson/Interface.html>.
Therac-25 Wikipedia. <http://en.wikipedia.org/wiki/Therac-25>.