Patriot Missile Crises
By Benji Boban
Abstract
During the Persian Gulf War (August 2, 1990 to February 28, 1991) the Patriot Missile
Defense system was deployed to provide the US Army and the Israeli defense with a shield
from, among other things, the ballistic missiles used by Iraqi forces. The success of this system
is a highly debated subject; however, one major failure was realized when 28
American soldiers were killed on February 25, 1991 because the system failed to detect an incoming
Scud missile. This paper analyzes the issues of the Patriot missile system as deployed in the
Gulf conflict, explains the adopted resolution, and evaluates
the failures in the requirements of this particular deployment. To better understand
the problem, we first describe the system, then review the
papers detailing the problem, specifically reports from oversight committees to Congress
as well as scholarly essays on the subject, and conclude with a look at the eventual
resolution of the problem and a comparison with other possible resolutions.
Systems 6309 – Spring 2011
Page 1
Introduction
During the Persian Gulf War, also called Operation Desert Storm, a Patriot Missile System failed
to intercept an incoming Scud missile and an army barracks in Dhahran, Saudi Arabia was hit.
28 American soldiers were killed. The purpose of this document is to explain the system,
discuss the failure, understand the reason for failure, and discuss the solution implemented.
The Patriot Missile system is a surface-to-air missile system. Its main purpose is to detect, track
and destroy enemy airplanes and missiles. It was created primarily to operate on the battlefield
and thus has many unique requirements. Raytheon was the primary contractor on this system
and worked with the Patriot Project Office and the Army to create it. Because it was first
produced in 1976, and warfare changed between 1976 and 1991, its requirements changed
as well.
The disaster which occurred at Dhahran was not the result of one major problem. Much of the
literature points to a software issue as the root cause. However, there were many
other problems with this system of systems, as evidenced in the report to Congress and other
sources. The failure, however, was corrected via software, which in essence was the beauty of
the Patriot Missile System: it is a hardware system whose functionality can be changed
primarily through software.
The question then becomes: what was the cause of failure? Was it the domain shift, the
requirements creep, or a change in the software model of the system? Answering
these questions requires a brief look into the domain of the Patriot Missile System, the requirements
for the system, and the software world of the system. However, it is difficult to take a
complete look into the requirements and software world, as much of it is restricted information. Looking
at these three components we can understand the “As-Is” nature of the original Patriot Missile
System versus the “To-Be” nature required in the Persian Gulf War, and propose how the
correct “To-Be” nature differs from the deployed system. This will involve a look into the
solution used in order to better understand the interaction of the enterprises involved in the
system, as well as a comparison of the deployment of the Patriot Missile system in the Persian
Gulf War to a typical systems lifecycle and any extrapolations which can be derived from it.
In conclusion we need to take both the good and the bad lessons learned from this disaster at a
requirements/systems level and understand how to apply them to future requirement
engineering practices. It is my belief that a better appreciation for the domain of a system, a
better feedback loop for any requirements creep, as well as the implementation and adherence
to a requirements lifecycle can give the military and its contractors, as well as any other
systems developers, less of a chance of a catastrophe such as the disaster in Dhahran.
There are many works related to this paper, but the primary source is the “Report to the
Chairman, Subcommittee on Investigations and Oversight, Committee on Science, Space, and
Technology, House of Representatives: Patriot Missile Defense, Software Problem Led to
System Failure at Dhahran, Saudi Arabia.” Due to the military application of the Patriot Missile
System, certain documents, such as user manuals and requirements documentation, were
difficult to obtain. However, there is plenty of information regarding the system itself in news
articles as well as on Raytheon’s website.
1 As-Is
1.1 The Patriot Missile System of Systems
The Patriot Missile System was first produced in 1976 as a surface-to-air missile system. It has
been in service from 1981 onwards and has been involved in various conflicts around the world.
The system is truly a system of systems (SoS). Its subsystems together provide four major functions:
communications, command and control, radar surveillance, and missile guidance. During its
initial use, its major function was as an anti-aircraft system. However, shortly before the
Persian Gulf War the system was modified to include tactical anti-ballistic missile capabilities. The name of
the system comes from its first component, the radar surveillance unit AN/MPQ-53 (the AN/MPQ-65 was
developed for the PAC-3 missiles, which were developed after the Persian Gulf War). The
AN/MPQ-53 is the “Phased Array Tracking Radar to Intercept on Target” or “Patriot”, which soon
became the common name for the whole system.
Figure 1.1 AN/MPQ-53
http://www.ArmyRecognition.com
The AN/MPQ-53 consisted of a scanned array radar with identification capability, the ability to
evade ECM (electronic countermeasures) such as jamming, and the track-via-missile
guidance system. The AN/MPQ-53 is a single unit, as opposed to multiple radars for the system;
it is the only radar used from missile detection through missile engagement and
destruction. The second system is the command and control station, the AN/MSQ-104 Engagement
Control Station. This subsystem of the Patriot SoS consisted of 5 major parts. The first is the
Weapons Control Computer (WCC), which was the main computer of the Patriot missile system.
The second is the Data Link Terminal, which is the interface to the missile launchers. The third is
the UHF communications array, which creates the medium for the network communications
between the Patriot missile systems.
Figure 1.2 AN/MSQ-104
http://www.cafedragoon.net/trip/misawa_air_fes/anti_aircraft/index.html
The fourth component of the AN/MSQ-104 is the Routing Logic Radio Interface Unit, which is
the router for all the data traffic of the Patriot system. The last component is the two computer
manstations, which are where the operators interface with the system. All of these components
are contained in a mobile shelter that can withstand electromagnetic interference and
chemical/biological attacks and also provides protection against the elements. The third system
in the Patriot Missile SoS is the OE-349 Antenna Mast Group (AMG). It consists of a mobile platform
and four antennas arranged in two pairs, with the associated amplifiers and radios. The OE-349 AMG’s
primary purpose is to create the Patriot communications network. The next system is the
M901 launching station. This is where the missiles and the launchers are contained. The launching stations are
remotely controlled and are self-contained systems. The last, but arguably the most
important, component of the Patriot SoS is the Patriot Missile itself. There are three primary
versions of the missile. The first was the MIM-104A, which was created solely for anti-aircraft
purposes. The second, the MIM-104C/D/E (also called PAC-2), was created to extend the use of the
missiles to anti-missile engagements. The MIM-104C was the primary version used in the
Persian Gulf War.
Figure 1.3 M901 Launching Station and Patriot Missile
http://en.wikipedia.org/wiki/MIM104_Patriot#Patriot_in_the_Persian_Gulf_War.2FOperation_Desert_Storm_.28JanuaryFebruary_1991.29
The MIM-104F or PAC-3 is the latest version. Other than improved tracking
and flying capabilities, the PAC-3 is a hit-to-kill type of missile, as opposed to the PAC-2, which
tried to destroy enemy missiles by exploding in their vicinity. All of the missile types
consisted of five major sections. The first is the radome, which is the tip of the missile and
contains the window and the protection for the RF seeker. The second is the guidance section,
which consisted of the auto-guidance systems and also the track-via-missile guidance system (from the
ground control). The third and fourth sections are the warhead section and the propulsion
section. The last section is the control actuator section, which controls the fins of the missile for
stability and steering. The five subsystems of the Patriot Missile System described above (radar,
engagement control station, antenna mast group, launching station, and missile) are systems in their
own right and together are responsible for the four main functions of the system. They
interact with each other and the domain in order to perform their respective functions
within the overall system.
1.2 The Domain and the Problem
In order to properly understand the Patriot Missile system, an understanding of the
domain of the system must be developed. The domain includes the interaction
between the customer and vendor and also the environment for which the system was created.
The environment was the battlefield and the main operators were soldiers who would receive
training. Initially the system was a customer-driven product for the United States Army and the
initial purpose was an anti-aircraft missile. According to one report “The Patriot system was
originally designed to operate in Europe against Soviet medium- to high-altitude aircraft and
cruise missiles…To avoid detection it was designed to be mobile and operate for only a few
hours at one location.” [1] As with a typical customer-driven product, the requirements should have
come from the Army. Since the initial request for the system occurred when the need was for
anti-aircraft missiles on the European front, primarily against the Soviets, the domain
environment and the need had shifted dramatically by the time of the Persian Gulf War.
There were two main players in the creation of the Patriot Missile system:
the Army and the contractor (primarily Raytheon). Even though the contractor and the
users stayed the same, the environment in which the system would be used changed. It
went from high-mobility anti-aircraft defense to
stationary protection of barracks and civilian populations. The changes in the environment were
accounted for; however, the changes were made after the systems were deployed in the war.
“As information from all sources became available, software changes were made from August
1990 to February 1991 by the Patriot Project Office in Huntsville, Alabama, to adapt the system
to the Desert Storm environment.” [1]
The shift in the environment, and in essence in the requirements, produced several problems with
the Patriot Missile System. They can be divided into three major categories: first,
software issues; second, hardware issues; and third, user or enterprise issues. The
software issues dominate any conversation or report with respect to the failure in
Dhahran. Most reports point to the failure of the software to manage the clock drift or account
for the round-off error. However, there were also other issues with the software, such as
software upgrade time, software delivery time, and the reboot time for a system to come up.
For example, a software upgrade required the Patriot systems to be shut down for 1 to 2
hours, leaving them vulnerable to attack during that time. The reboot needed to
reinitialize the clock required 60 to 90 seconds [1]. One other major problem, which is
virtually forgotten in the current day of high-speed global internet connections, was the fact that
upgrades had to be physically shipped to each site. Therefore, each upgrade could take
days to reach the Patriot Missile Battery and then at least an hour to install. On the other
hand, because the system depended on software for many of its functional capabilities, the hardware
concerns of the system had to be taken care of prior to deployment on the warfront. The main
hardware concerns during the Persian Gulf conflict were the absence of a recorder for
system performance data and the different types of enemy missiles which needed to be
accounted for in this war. It seems the recorder was not part of the initial requirements and
therefore was not included in the system. There were external recorders available; however,
“…U.S. commanders decided not to use them because they believed the recorders could cause
an unanticipated system shutdown.” [1] The enemy missiles and aircraft for which the Patriot
Missile System was designed had top speeds of Mach 2. However, in the Persian Gulf War,
Scud missiles had speeds of Mach 5. In a testament to software’s strengths, this hardware
problem was solved via software algorithms that produced a faster response from the system, as
opposed to improving the speed of the missiles. The last major area of issues was the user
issues. The main issues involving the end users, i.e. the soldiers, were the lack of an audible cue
to signal when an enemy target had appeared and the lack of information about the operating doctrine
in the training given to the soldiers. If the soldiers had better understood how the Patriot systems
worked, especially the expectation that the system would be “…relocated twice daily, and its
software reinitialized after each move” [2], and how that affects the underlying components of
the system, then the disaster at the barracks might have been avoided. In short, a high-level
understanding of the operating doctrine of the system should have been made available to the
users of the system.
2 To Be
2.1 Solution to the Problem
In order to understand the solution, the problem must be clearly stated. The problem in the
Dhahran attack was that the system failed to detect a missile after the system had been running
continuously for over 100 hours. Most major articles on this topic concluded the main culprit
behind this problem was the Range-Gate algorithm and clock error. This paper will describe the
problem as discussed by other papers as well as the solution provided by the Patriot Project
Office (the software division in charge of the Patriot missile system). We will also take a higher-level
view of the problem, both from the perspective of the various wares (software, hardware, and users)
and from a requirements lifecycle perspective.
The generally accepted conclusion for the cause of failure is a software-based round-off error.
The Weapons Control Computer in the AN/MSQ-104 was a 24-bit machine. The algorithm used
to predict where a Scud missile, or any ballistic missile, would appear next was called the Range-Gate
algorithm. If the missile appeared inside the area predicted by the range gate algorithm, then it was
identified as an enemy target and engaged. However, if it was not inside that area, it was
ignored. In order to calculate the position of the missile, two variables are required: the
velocity of the missile and the current time. The current time of the system was recorded in
tenths of a second as a whole integer, while the velocity was stored as a real number.
Since the machine was a 24-bit machine, converting the time to a real number lost precision, and
the effect grew as the uptime increased or the velocity became higher. This alone does not account
for the failure to track a missile, as the time calculation used in the algorithm is a subtraction, and
therefore the error should have cancelled out. The actual problem was the loss of precision together
with a routine to convert the time into a 48-bit floating point number. According to Robert Skeel [3],
this routine was not substituted at every point in the calculation, and therefore the combined
effect of the precision loss and the round-off caused the range gate algorithm not to detect
the incoming Scud missile (see Figure 2.1.1 for a graphical representation of the range gate
algorithm and Table 2.1.1 for the calculated shift in the range gate in terms of hours). The solution
therefore was a software change, as this problem was prescribed as strictly a software response.
Figure 2.1.1 Calculated Range Gate ([1] page 7)
Table 2.1.1 Calculated Range Gate ([1] page 17)
Initially the problem was uncovered by the Israeli army due to
their use of data recorders on their systems. They were able to confirm a 20% shift in the
range gate after 8 consecutive hours of uptime for the system. Since the system
specifications provided by the Patriot Project Office stated that a range gate shift of over 50% would
cause the algorithm to fail to identify the target [1], this was a significant finding. This
finding was made known on February 11, 1991. The software was modified and released on
February 16, 1991. While it was in transit, “on February 21, 1991, the Patriot Project Office sent
a message to Patriot users stating that very long run times could cause a shift in the range gate,
resulting in the target being offset. The message also said a software change was being sent
that would improve the system’s targeting. However, the message did not specify what
constitutes very long run times.” [1] Adding to the delay was the fact that Army officials did not
believe users left the systems running for such extended periods of time and therefore did not
provide any additional guidance to the field. [1] The modified software arrived on February
26, 1991, the day after the barracks were attacked.
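The round-off effect described above can be illustrated with a short calculation. The sketch below is not the Patriot code. It assumes, following the commonly cited analysis of the failure, that the constant 0.1 (one clock tick in seconds) was chopped to 23 fractional binary digits in the 24-bit register. The function names are hypothetical, and the default speed of 1,700 m/s is only a rough sea-level equivalent of the Mach 5 figure quoted earlier.

```python
from fractions import Fraction

FRACTION_BITS = 23      # assumption: fractional bits kept in the 24-bit register
TICKS_PER_SECOND = 10   # the system clock counts tenths of a second


def chopped_tenth(bits: int = FRACTION_BITS) -> Fraction:
    """Return 0.1 truncated (not rounded) to `bits` binary fractional digits."""
    scale = 2 ** bits
    return Fraction(int(Fraction(1, 10) * scale), scale)


def clock_drift_seconds(hours_up: float) -> float:
    """Time error accumulated after `hours_up` hours of continuous operation."""
    error_per_tick = Fraction(1, 10) - chopped_tenth()
    ticks = Fraction(hours_up) * 3600 * TICKS_PER_SECOND
    return float(error_per_tick * ticks)


def tracking_offset_meters(hours_up: float, speed_m_per_s: float = 1700.0) -> float:
    """Distance a target travels during the accumulated clock error.

    The default speed is an illustrative assumption, not a figure from the reports.
    """
    return clock_drift_seconds(hours_up) * speed_m_per_s


if __name__ == "__main__":
    for hours in (8, 20, 100):
        print(hours, clock_drift_seconds(hours), tracking_offset_meters(hours))
```

Under these assumptions the clock error after 100 hours of uptime is roughly a third of a second, during which a missile at that speed travels several hundred meters.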
Even though the above was a software problem in the purest sense of the word, a
few other problems were ignored which, had they been addressed, could have prevented
the destruction of the barracks. Those problems include the neglect of data
recorders, the ambiguity in the statement by the Patriot Project Office about the length of run time it takes
to cause a misidentification, the ignorance of the Army officials of the manner in which the
system was being used, and the delay in delivering a mission-critical software upgrade to
the field. Therefore, if the problem is described as above, i.e., “system failed to detect a missile
after the system had been running continuously for over 100 hours”, then fixing the
range gate algorithm was not the only way the problem could have been avoided; addressing any of the four
other factors above could also have averted the disaster.
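To make the ambiguity of “very long run times” concrete, the two figures quoted earlier (a 20% shift after 8 hours, and failure at a shift of 50% or more) can be combined. The sketch below assumes the shift grows linearly with uptime from zero at start-up, which is an assumption of this illustration; the function name is hypothetical.

```python
def hours_until_shift(target_percent: float,
                      observed_percent: float = 20.0,
                      observed_hours: float = 8.0) -> float:
    """Uptime at which the range gate shift reaches `target_percent`.

    Assumes the shift grows linearly with uptime, starting from zero at
    start-up. The defaults are the Israeli observation quoted in the text.
    """
    if observed_percent <= 0 or observed_hours <= 0:
        raise ValueError("observed values must be positive")
    rate_percent_per_hour = observed_percent / observed_hours
    return target_percent / rate_percent_per_hour


if __name__ == "__main__":
    print(hours_until_shift(50.0))
```

With these inputs the 50% threshold is reached after about 20 hours of continuous operation, far short of the more than 100 hours the Dhahran battery had been running. A message stating a number of this kind would have removed the ambiguity.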
2.2 Recommendations
As can be seen in the response to the failure of the Patriot Missile System to protect the men at
the Dhahran barracks and the resulting report to the congressional committee, the root
cause analysis identified the problem as a software bug in the Range-Gate algorithm coupled
with a round-off error in calculations. However, in reality, the problem was actually
requirements creep. The use of the Patriot Missile Defense System on the Persian Gulf front
presented a whole new environment compared with the European theater and thus required the
requirements to be updated.
Figure 2.2.1
http://www.utdallas.edu/~chung/SYSM6309/process.pdf Page 6
As can be seen in Figure 2.2.1, there was a change in the reality, or
the problem domain, and it must be reflected in the requirements. In order to highlight the
problem, the customer, in this case the United States Army, must tell the vendor there has
been a shift in the problem domain. But as also seen above, it seems the Army itself did not realize
there was a shift in the environment until it was too late. This brings about the point that
there was more than one issue with the Patriot missile system. Other than the Range-Gate
algorithm and the environment change, there was other creep in the requirements of the
traditional wares as well as a broken communications process.
The first major area in which the Patriot Missile system could have avoided the
disaster at the barracks was also in the arena of software and time. A software upgrade
took 1-2 hours on average. If the requirements had called for software
upgrades to be done with no system downtime, officers might have been more eager
to understand any faults in their system. Granted, there was a 10-day gap between the
release of the patched software and the time the software arrived at the base, so adding this as a
requirement would not have avoided this particular problem. However, this brings about the
point of the delivery of the software. If there had been requirements for interconnection into a
network with access to the software upgrades, the 10 days would not have been lost and the
system could have been upgraded in time to save lives. The other concern, though minor, is
the amount of time a reboot takes. Rebooting takes about 1 to 1.5 minutes in order to reset the
clock. However, “…Since the data processing needed to detect the launches took about 5
minutes when the war first started, and since from launch to impact a TBM’s flight is about 6-7
minutes, only about 1.5 to 2 minutes were available…”. [2] Therefore, it was dangerous to
reboot a system because of the distinct possibility of the system not being able to catch an
incoming missile. Another requirement at the beginning of the project could therefore have
asked for a reboot time well below the time needed to track a target and fire a Patriot missile
in response. This would have increased confidence in rebooting and might have resulted in
more frequent reboots after the Israeli report came out on February 11.
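The two timing arguments in this paragraph, the delivery gap and the reboot window, are simple arithmetic and can be checked with a minimal sketch. The dates and minute ranges are those given in the text; the constant and function names are hypothetical and introduced only for illustration.

```python
from datetime import date

# Dates as given in the text.
FINDING_REPORTED = date(1991, 2, 11)
PATCH_RELEASED = date(1991, 2, 16)
BARRACKS_HIT = date(1991, 2, 25)
PATCH_ARRIVED = date(1991, 2, 26)


def days_between(start: date, end: date) -> int:
    """Whole days elapsed from `start` to `end`."""
    return (end - start).days


def reboot_margin_minutes(window_min: float, reboot_min: float) -> float:
    """Time left in the engagement window after a reboot.

    Zero or a negative value means the reboot alone consumes the window.
    """
    return window_min - reboot_min


if __name__ == "__main__":
    print("release to arrival:", days_between(PATCH_RELEASED, PATCH_ARRIVED), "days")
    print("release to attack:", days_between(PATCH_RELEASED, BARRACKS_HIT), "days")
    # Quoted window of 1.5 to 2 minutes against a reboot of 1 to 1.5 minutes.
    print("worst case margin:", reboot_margin_minutes(1.5, 1.5), "min")
    print("best case margin:", reboot_margin_minutes(2.0, 1.0), "min")
```

In the worst case a reboot uses the entire available window, and even in the best case only about a minute remains, which supports the reluctance to reboot described above.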
The second and third major areas are in the hardware and user domains respectively. One
hardware capability largely ignored, or not fulfilled, in the Patriot Missile System was the ability to
record the system data. Even though there was an external option, the Army was afraid to use
it due to fear of system failures. If an internal recorder had been built, the ability to study data
and find the range gate error would have significantly increased. The Israelis used an external
recorder and were able to find the error by February 11, 1991. If the US Army had been
analyzing its own data as well, the chances of finding the error and a resolution earlier would have
increased. The other area is the end user. Two major problems existed at the
operator level in the user domain. The first was the absence of an audible alarm when an
enemy target is detected or engaged. There is no way to know for certain whether this
would have helped avoid the fatalities at Dhahran, but it does show there were problems with
the hardware. The second problem is a process issue. Quality processes are highly
related to the quality of the product. Two main communication processes were broken.
The first was from the field end users to the Army leaders. The Army leaders did not know the
system was not being used as a mobile platform, and therefore they were unable to use the
information from the Patriot Project Office to warn the operators of the system. The second
communications process which was broken was between the Israeli users and the Army
officials. The Army assumed the Israeli issue with drift was atypical [1]. A software
patch was still introduced, but the data from the Israeli report was not used. This leads us to a
discussion on the use of a requirements lifecycle for the Patriot missile system.
One familiar requirements engineering model is the spiral model (see Figure 2.2.2). In this
model there are four major parts. They begin with requirements elicitation, then requirements
analysis and negotiation, then requirements documentation, and finally requirements
validation. When the validation is done, the elicitation starts over again. The
purpose of the spiral model is to account for shifts in the requirements as well as to discover
missed requirements.
Figure 2.2.2
http://www.utdallas.edu/~chung/SYSM6309/process.pdf Page 9
When the Patriot Missile System was being brought over from the European theater to the Persian Gulf
front, there were changes in the requirements that were acknowledged and were worked on, as
stated in the following quote, “…As information from all sources became available, software
changes were made from August 1990 to February 1991 by the Patriot Project Office in
Huntsville, Alabama, to adapt the system to the Desert Storm environment.” [1] However, the
question is how many more changes would have resulted from requirements elicitation with information
from the operators in the desert. If there had been a better process for communications
between operators of the system and the Army officials, and also between the Patriot Project
Office or Raytheon and the Army officials, then the drift/round-off error might have been flagged
long before February 25, 1991.
3 Conclusion
The Patriot Missile Incident in Dhahran taught us many invaluable lessons. While the root
cause analysis highlighted the software algorithm error, it was insightful to understand the
effects requirements creep had on the system. The system was not required to be a semi-permanent
setup, and when it was used as one it failed, resulting in deaths. The primary
culprit therefore was not the algorithm, though correcting it was the best fix. It was the change in the
environment, and thereby in the requirements, which caused the disaster. However, the blame
also lies in the processes. If there had been a better communications process between all
the agents involved with the Patriot Missile System, not only would the requirements creep
have been discovered faster, but countermeasures might also have been added as a stopgap
while the new requirements were being fulfilled by the contractor. Last, but not least, is
the adherence to a requirements lifecycle model. If the spiral model had been used, it would
have been plausible to go through another round of elicitation, especially with the operators
of the Patriot Missile System in the Gulf arena or even with the Israeli army, and thereby could
have saved the lives of the soldiers.
4 Bibliography
1) United States Congress. Report to the Chairman, Subcommittee on Investigations and
Oversight, Committee on Science, Space, and Technology, House of Representatives: Patriot
Missile Defense, Software Problem Led to System Failure at Dhahran, Saudi Arabia. Ralph V. Carlone.
GAO/IMTEC-92-26, February 1992.
2) Riezenman, Michael. “Revising the Script after Patriot”, IEEE Spectrum, September 1991,
0018-9235/91/0009-0049, pg 49-51.
3) Skeel, Robert. “Roundoff Error and the Patriot Missile”, SIAM News, July 1992, Volume 25,
Number 4, pg 11.
4) “MIM-104 Patriot”, Wikipedia.org. November 15, 2004. May 15, 2012.
http://en.wikipedia.org/wiki/MIM-104_Patriot