File - Eric Chavez MD MMI

Eric Chavez
Med Inf 404 Care Group Case Study
During five days in November 2002 the entire computer network of a hospital group in Boston
became non-operational. At the time, the information technology (IT) and electronic health record
systems at CareGroup in Boston were award-winning and were considered “the best knowledge
management service in the United States” (McFarlan, 2003, p. 4). However, when a research
programmer accidentally allowed an experimental network program to continue to run when he suddenly
left for paternity leave, the network became overwhelmed and all network-reliant software ceased to
function. This disabled all clinical information systems for the hospital group and forced the entire
organization to revert to paper-based systems.
As is the case with many major accidents, the cause was multifactorial. The experimental
program disabled a main network switch by overloading it with network traffic. As the system attempted
to reroute data packets through alternative pathways, more network switches became overwhelmed and
became stuck in repeating loops. This was partly due to old network topology and outdated equipment
that was malfunctioning. Eventually the entire network was busy with meaningless repeating messages,
and no bandwidth was left for any other traffic.
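The way looping traffic can consume a network can be illustrated with a toy model. The sketch below is not a reconstruction of the CareGroup network; the function name `flood`, the switch names, and both example layouts are hypothetical. It assumes each switch simply forwards a broadcast message out of every link except the one it arrived on, with no loop prevention of any kind, and counts how many copies are in flight at each step.

```python
from collections import defaultdict


def flood(links, start, max_steps=10):
    """Count copies of one broadcast in flight at each forwarding step.

    Assumption (toy model): every switch forwards a received message on
    all of its links except the one it arrived on, with no loop prevention.
    """
    neighbors = defaultdict(set)
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Each in-flight copy is recorded as (sending switch, receiving switch).
    in_flight = [(start, n) for n in sorted(neighbors[start])]
    history = []
    for _ in range(max_steps):
        history.append(len(in_flight))
        next_frames = []
        for sender, receiver in in_flight:
            for n in sorted(neighbors[receiver]):
                if n != sender:
                    next_frames.append((receiver, n))
        in_flight = next_frames
        if not in_flight:
            break  # the broadcast has died out
    return history


# Hypothetical loop-free layout: the broadcast reaches every switch and stops.
tree = [("A", "B"), ("B", "C"), ("B", "D")]

# Hypothetical fully meshed layout with loops: copies multiply every step.
mesh = [("A", "B"), ("A", "C"), ("A", "D"),
        ("B", "C"), ("B", "D"), ("C", "D")]

print(flood(tree, "A"))
print(flood(mesh, "A", max_steps=5))
```

In the loop-free layout the message dies out after two steps, while in the meshed layout the number of copies doubles at every step, which is the kind of runaway traffic that leaves no bandwidth for anything else.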
In the Harvard Business School case study, McFarlan (2003) mentions several process controls that
could help to prevent this sort of event. The network should have been maintained with more current
equipment and with a more modern network topology. More than one network engineer should have
been employed to manage the network, providing checks and balances so that bad
ideas were not accepted and propagated. A process of continuing education for network engineers could
ensure that the best ideas and most current protocols are being used for the system. A process for
replacing the physical equipment at regular intervals would help to prevent equipment failures and ensure
the most modern functionality. A strict change-control process was needed in order to prevent
unauthorized users from making changes to network components and topology. It was clear that more
redundancy in the system was needed. This includes a standby core network that could be deployed in the
event the primary network core failed, as well as modems with dial-up computers to serve as backups. Network
management software could be installed to monitor network function and traffic in order to detect
problems early. Scheduled downtime on the weekends could be used to make network repairs without
being too disruptive to medical center operations.
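As a rough illustration of what early detection by network management software might look like, the sketch below flags the point at which link utilization has stayed above a threshold for several consecutive readings. The function name, the 80% threshold, the three-reading window, and the sample values are all assumptions made for illustration; they do not come from the case study or from any particular monitoring product.

```python
def first_sustained_overload(samples, threshold=0.8, window=3):
    """Return the index of the reading at which utilization has exceeded
    `threshold` for `window` consecutive readings, or None if it never does.

    `samples` is a sequence of utilization readings between 0.0 and 1.0
    (hypothetical values, e.g. one reading per minute).
    """
    run = 0
    for i, utilization in enumerate(samples):
        # Extend the streak on a high reading; reset it on a normal one.
        run = run + 1 if utilization > threshold else 0
        if run >= window:
            return i
    return None


# Hypothetical readings: a brief spike, then a sustained overload.
readings = [0.30, 0.40, 0.85, 0.50, 0.90, 0.95, 0.97, 0.99]
print(first_sustained_overload(readings))
```

Requiring several consecutive high readings, rather than a single one, is a simple way to avoid raising an alarm on a momentary spike while still catching a sustained problem early.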
Although the article states that no significant harm came to patients during the network
downtime, it does mention that some delays in care were experienced (McFarlan, 2003). Delays were
likely the result of the entire hospital staff being forced to revert to paper-based systems. Delays in
processing physician orders, reporting lab results, administering medications, and reporting radiology
results could have caused patient harm. Some orders for lab tests and reports of results were lost while
using the paper-based system. This could certainly cause patient harm since providers would not have the
information needed to make diagnostic and treatment decisions. Radiologists reading film x-rays, when they
were accustomed to digital images and computer tools for reading radiographs, may have missed some
important findings. Information that was recorded on paper could have been lost, which would also put
providers at a disadvantage when caring for patients. Pharmacists had to manually check for drug-drug
and drug-allergy interactions. Human error in this process could cause patient harm. The emergency room
physicians did not have access to the previous medical history of patients in the EHR. They had to rely on
patient report, and this could have led to errors and patient harm. Since the emergency department of the
largest hospital in CareGroup went on bypass during part of the event, I can imagine that some critical
patients were diverted to other hospitals. Such a diversion and delay of care could have resulted in
patient harm but would never have been known to CareGroup because the patients would never have been
admitted there.
Overall, I agree with the timeline of the response. When the IT department became aware of the
network problem, they began to troubleshoot and tried to solve the problem in-house. There was a desire
to make changes to the network immediately to try to fix the problem, but good judgment on the part of
the CIO held off any rash actions. When they realized that the problem was too big to solve in-house, they
contacted their network consultants at Cisco. The Cisco rapid response team swooped in and took over
operations with the authority of the CIO. They worked around the clock for several days to diagnose the
problem, make repairs, and get the network running again. This was a major effort and a necessary one
because one of the largest hospital systems in Boston was being adversely affected and thousands of
patients were at risk of harm every hour that the system was down. I also agree with completely abandoning
the computer system for the paper-based system until it was confirmed that full network functionality was
restored. Switching back and forth between computer and paper systems would certainly have caused more
errors and potentially adverse outcomes for patients.
It is impressive that Cisco was able to devote so much effort to solving the network problem at
CareGroup. The CIO at CareGroup, Dr. Halamka, learned many valuable lessons from the experience.
Many of the lessons have to do with the absolutely critical nature of networks, their infrastructure, and the
need to devote significant human and material resources to maintain them. This event also demonstrates
the need for redundancy in network infrastructure and the importance of having a disaster plan—in this
case backup paper-based systems. I applaud Dr. Halamka for his efforts at transparency in reporting on the
event so that others can learn from it and prevent a similar network collapse at other medical centers.
McFarlan, F. W., & Austin, R. (2003). CareGroup. Harvard Business School Case 303-097.