ISS CRITICAL INCIDENT REPORT

INCIDENT DETAILS

Brief description: Servers in the DMZ which are configured in VMware were inaccessible during the incident. The servers affected were Agresso and the service desk. Physical DMZ servers were not impacted.
Start date and time: 23/7/2013 13:00
End date and time: 23/7/2013
Date of this report: 16 Oct 2013 13:30
Incident Number: CIR-1307-0001

INCIDENT SUMMARY

A vendor was demonstrating a technology which was due for implementation in the new research building cluster. A misconfiguration of a piece of hardware by the vendor corrupted the routing table within the DMZ network. This corruption only affected DMZ servers configured on VMware. As a result, traffic sent to VMware servers was unable to find a return path and the connections failed.

INCIDENT ANALYSIS

How and when did ISS find out about the incident?
The service desk application, which is hosted in the DMZ, stopped working. This alerted ISS to the problem.

Which service(s) were affected?
Service:
- Agresso
- DMZ servers configured in VMware

What locations were affected?
Campus-wide

What was the user impact?
Users were unable to access servers in the DMZ which were configured on VMware.

How was service restored?
Service was initially restored by removing the misconfigured hardware from the network. The misconfiguration was then corrected on the hardware, which was subsequently reconnected to the network without any issues.

If service restoration was a workaround, how and when will permanent solutions be effected?
N/A

What is needed to prevent a recurrence?
Once the misconfiguration was corrected, there was no recurrence.

Are we currently at risk of recurrence or other service disruption following the incident? In what way? What needs to be done to mitigate the risk?
Numerous vendors have trusted access to hardware, and this introduces a risk of disruption whenever any work is being undertaken.

ROOT CAUSE

A piece of hardware was misconfigured by a vendor. The hardware was installed on the network with one end of the connection in the DMZ VRF (core side) and the other end in the default VRF (hardware side). Both sides of the connection should have been in the DMZ VRF. This caused the global routing table to be injected into the DMZ VRF routing table. When this happened, connections into the DMZ servers did not route back from the DMZ correctly.

FOLLOW UP ACTIONS AND RECOMMENDATIONS

Route filters could be configured on the core to minimize the possibility of default routes being injected into the DMZ routing table. External consultancy will be required, at an estimated cost of €5,000. This should be done to minimize the risk of a recurrence.

LESSONS LEARNED

The addition of new buildings and other changes to the campus routinely involve installing hardware that requires configuration of a similar nature. This issue resulted from a misconfiguration by a vendor during a routine hardware installation. It may be necessary for ISS staff to shadow vendors more closely when they are working onsite.
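
The route filter recommended under FOLLOW UP ACTIONS AND RECOMMENDATIONS can be illustrated with a small model. The Python sketch below is not the configuration of any device involved in the incident. It is a simplified illustration, under stated assumptions, of how routes leaked from the global table can change the return path chosen inside the DMZ VRF, and how a prefix filter would discard them. All prefixes (taken from the RFC 5737 documentation ranges), next-hop names and function names are hypothetical.

```python
import ipaddress

# Hypothetical allowed range for the DMZ VRF (RFC 5737 documentation prefix).
DMZ_ALLOWED = [ipaddress.ip_network("192.0.2.0/24")]


def permit(prefix):
    """Return True only if the prefix lies inside an allowed DMZ range."""
    net = ipaddress.ip_network(prefix)
    return any(net.subnet_of(allowed) for allowed in DMZ_ALLOWED)


def filter_routes(routes):
    """Drop every route whose prefix the filter does not permit."""
    return {prefix: hop for prefix, hop in routes.items() if permit(prefix)}


def lookup(table, address):
    """Longest-prefix match: return the next hop for the most specific route."""
    addr = ipaddress.ip_address(address)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in table.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    return max(matches, key=lambda item: item[0].prefixlen)[1]


# Assumed DMZ VRF table before the incident: return traffic leaves via the
# DMZ's own default route.
dmz_table = {
    "0.0.0.0/0": "dmz-firewall",
    "192.0.2.0/24": "dmz-local",
}

# Assumed routes leaked from the global (default VRF) routing table.
leaked = {
    "0.0.0.0/0": "internet-edge",
    "198.51.100.0/24": "campus-core-global",
}

client = "198.51.100.10"  # hypothetical campus client connecting to the DMZ

# Without a filter the leaked routes are merged in and override the DMZ's own.
unfiltered = {**dmz_table, **leaked}
# With the filter, every leaked prefix is rejected and the table is unchanged.
filtered = {**dmz_table, **filter_routes(leaked)}

print(lookup(unfiltered, client))  # campus-core-global
print(lookup(filtered, client))    # dmz-firewall
```

In this model the leaked, more specific route wins the longest-prefix match and sends return traffic out of the DMZ by a path it was never meant to use, while the filtered table keeps the original return path. On real equipment the equivalent control would be a prefix filter applied where routes enter the DMZ VRF, and its exact form depends on the platform in use.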