ISS CRITICAL INCIDENT REPORT

INCIDENT DETAILS
Brief description:
Servers in the DMZ that are hosted on VMware were inaccessible during the incident.
The affected servers were Agresso and the service desk.
Physical DMZ servers were not affected.

Start date and time: 23/7/2013 13:00

End date and time: 23/7/2013 13:30

Date of this report: 16 Oct 2013

Incident Number: CIR-1307-0001
INCIDENT SUMMARY
A vendor was demonstrating a technology due for implementation in the new research building
cluster. A misconfiguration of a piece of hardware by the vendor corrupted the routing table
within the DMZ network. The corruption affected only DMZ servers hosted on VMware: traffic
sent to these servers could not find a return path, so connections failed.
INCIDENT ANALYSIS
How and when did ISS find out about the incident?
The service desk application, which is hosted in the DMZ, stopped working. This alerted ISS
to the problem.
Which service(s) were affected?
Agresso
Service desk
Other DMZ servers hosted on VMware
What locations were affected?
Campus-wide

What was the user impact?
Users were unable to access servers in the DMZ that are hosted on VMware.
How was service restored?
Service was initially restored by removing the misconfigured hardware from the network.
The misconfiguration was then corrected on the hardware, and the hardware was subsequently
reconnected to the network without any issues.

If service restoration was a workaround, how and when will a permanent solution be
implemented?
N/A
What is needed to prevent a recurrence?
Once the misconfiguration was corrected, there was no recurrence.

Are we currently at risk of recurrence or other service disruption following the incident?
In what way? What needs to be done to mitigate the risk?
Numerous vendors have trusted access to hardware, which introduces a risk of disruption
whenever they carry out work.
ROOT CAUSE
A piece of hardware was misconfigured by a vendor.
The vendor installed the hardware on the network with one end of the connection in the
DMZ VRF (core side) and the other end in the default VRF (hardware side). Both sides of the
connection should have been in the DMZ VRF.
This caused the global routing table to be injected into the DMZ VRF routing table. Once this
happened, return traffic for connections into the DMZ servers was not routed back correctly.
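To illustrate why return traffic failed, the following Python sketch models a VRF routing table as a prefix-to-next-hop map with longest-prefix-match lookup. All prefixes, the client address and the next-hop names (`dmz-firewall`, `dmz-servers`, `default-vrf-link`) are hypothetical. The sketch also assumes that a leaked route replaces the DMZ VRF's own route for the same prefix; on real equipment the outcome depends on the routing protocol and its preference rules.

```python
import ipaddress

def lookup(table, destination):
    """Return the next hop for destination using longest-prefix match."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)]

def inject(vrf_table, leaked_routes):
    """Merge leaked routes into a copy of a VRF table.

    Assumption (hypothetical): a leaked route replaces the VRF's own
    route for the same prefix.
    """
    merged = dict(vrf_table)
    merged.update(leaked_routes)
    return merged

# Hypothetical DMZ VRF: replies to clients leave via the DMZ firewall.
dmz_vrf = {
    ipaddress.ip_network("0.0.0.0/0"): "dmz-firewall",
    ipaddress.ip_network("192.0.2.0/24"): "dmz-servers",
}

# Hypothetical default route from the global table, leaked across the
# link whose two ends sat in different VRFs.
global_routes = {
    ipaddress.ip_network("0.0.0.0/0"): "default-vrf-link",
}

client = "10.20.30.40"  # hypothetical campus client
before = lookup(dmz_vrf, client)
after = lookup(inject(dmz_vrf, global_routes), client)
print(before, "->", after)
```

Before the leak, the reply to the client leaves via `dmz-firewall`; afterwards it follows `default-vrf-link`, so it no longer returns along the path the request arrived on and the connection fails.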
FOLLOW UP ACTIONS AND RECOMMENDATIONS
Route filters could be configured on the core to minimize the possibility of default routes
being injected into the DMZ routing table.
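A route filter of the kind proposed can be sketched in Python as a check applied to each advertised route before it is installed in the DMZ VRF. Default routes are always rejected, and other prefixes are accepted only if they fall within an approved list. The approved prefix, the next-hop names and the `permitted`/`filter_routes` helpers are hypothetical; the real filter would be written in the core's own configuration language.

```python
import ipaddress

# Hypothetical list of prefixes allowed into the DMZ VRF.
ALLOWED_PREFIXES = [ipaddress.ip_network("192.0.2.0/24")]

def permitted(prefix):
    """Return True if an advertised prefix may enter the DMZ VRF."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen == 0:
        return False  # always reject a default route (0.0.0.0/0 or ::/0)
    # Accept only prefixes inside an approved range of the same IP version.
    return any(
        net.version == allowed.version and net.subnet_of(allowed)
        for allowed in ALLOWED_PREFIXES
    )

def filter_routes(advertised):
    """Keep only the advertised routes that pass the filter."""
    return {prefix: hop for prefix, hop in advertised.items() if permitted(prefix)}

advertised = {
    "0.0.0.0/0": "default-vrf-link",
    "10.0.0.0/8": "default-vrf-link",
    "192.0.2.0/25": "dmz-servers",
}
print(filter_routes(advertised))
```

An allow-list is the stricter design choice: besides blocking default routes, it also stops any other unexpected prefix leaking into the DMZ VRF, at the cost of updating the list whenever a new DMZ range is added.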
External consultancy will be required, at an estimated cost of €5,000. This work should be
done to minimize the risk of a recurrence.
LESSONS LEARNED
New buildings and other changes on campus routinely involve installing hardware that
requires configuration of a similar nature. This issue resulted from a vendor's
misconfiguration during a routine hardware installation. ISS staff may need to shadow
vendors more closely when they work on site.