Eric Chavez Med Inf 404 Care Group Case Study 8/10/14

Over five days in November 2002, the entire computer network of CareGroup, a hospital group in Boston, became inoperable. At the time, CareGroup's information technology (IT) and electronic health record systems were award-winning and were considered “the best knowledge management service in the United States” (McFarlan & Austin, 2003, p. 4). However, after a research programmer accidentally left an experimental network program running when he suddenly departed on paternity leave, the network became overwhelmed and all network-dependent software stopped functioning. This disabled every clinical information system in the hospital group and forced the entire organization to revert to paper-based systems.

As is the case with many major accidents, the cause was multifactorial. The experimental program disabled a main network switch by overloading it with traffic. As the system attempted to reroute data packets through alternative paths, more switches became overwhelmed and were caught in repeating loops. This was partly due to an outdated network topology and aging, malfunctioning equipment. Eventually the entire network was saturated with meaningless, endlessly repeating messages, leaving no bandwidth for any other traffic.

In their Harvard Business School case, McFarlan and Austin (2003) describe several process controls that could help prevent this sort of event. The network should have been maintained with more current equipment and a more modern network topology. More than one network engineer should have been employed, so that the engineers could check one another's work and keep flawed ideas from being accepted and propagated. Continuing education for network engineers could ensure that the system uses current best practices and protocols.
Replacing physical equipment at regular intervals would help prevent equipment failures and keep the network's capabilities current. A strict change-control process was needed to prevent unauthorized changes to network components and topology. The system also clearly needed more redundancy, including a standby core network that could be brought online if the primary network core failed, and computers with dial-up modems to serve as backups. Network management software could be installed to monitor network performance and traffic and detect problems early. Scheduled weekend downtime could be used for network repairs with minimal disruption to medical center operations.

Although the case states that no patients suffered significant harm during the outage, it does note some delays in care (McFarlan & Austin, 2003). These delays were likely the result of the entire hospital staff being forced to revert to paper-based systems. Delays in processing physician orders, reporting lab results, administering medications, and reporting radiology results could all have caused patient harm. Some lab test orders and result reports were lost while the paper-based system was in use. This could easily cause harm, because providers would lack the information needed to make diagnostic and treatment decisions. Radiologists accustomed to reading digital images with computer tools may have missed important findings when forced to read film x-rays. Information recorded on paper could also have been lost, further disadvantaging providers caring for patients. Pharmacists had to check for drug-drug and drug-allergy interactions manually, and human error in this process could cause patient harm. Emergency room physicians did not have access to patients' prior medical histories in the EHR.
They had to rely on patients' own reports, which could have led to errors and harm. Because the emergency department of CareGroup's largest hospital went on bypass during part of the event, I suspect that some critically ill patients were diverted to other hospitals. Such diversions and delays in care could have resulted in patient harm, but CareGroup would never have known, because those patients were never admitted there.

Overall, I agree with the timeline of the response. Once the IT department became aware of the network problem, it began troubleshooting and tried to solve the problem in-house. Staff wanted to make immediate changes to the network to try to fix it, but the CIO's good judgment held off any rash actions. When the team realized that the problem was too big to solve in-house, it contacted its network consultants at Cisco. Cisco's rapid response team arrived and, with the CIO's authorization, took over operations. The team worked around the clock for several days to diagnose the problem, make repairs, and get the network running again. This major effort was necessary because one of the largest hospital systems in Boston was affected, and thousands of patients were at risk of harm for every hour the system was down. I also agree with abandoning the computer system entirely in favor of the paper-based system until full network functionality was confirmed to be restored. Switching back and forth between computer and paper systems would almost certainly have caused more errors and potentially adverse outcomes for patients. It is impressive that Cisco was able to devote so many resources to solving the network problem at CareGroup.

CareGroup's CIO, Dr. Halamka, learned many valuable lessons from the experience.
Many of these lessons concern the critical importance of networks and their infrastructure, and the need to devote significant human and material resources to maintaining them. The event also demonstrates the need for redundancy in network infrastructure and the importance of having a disaster plan, in this case backup paper-based systems. I applaud Dr. Halamka for his transparency in reporting on the event so that other medical centers can learn from it and prevent a similar network collapse.

Reference: McFarlan, F. W., & Austin, R. (2003). CareGroup (Harvard Business School Case 303-097). Harvard Business School.